C Program To Implement Dictionary Using Hashing Functions

Hashing is an efficient method to store and retrieve elements.

It’s just like the index page of a book. In the index, every topic is associated with a page number. If we want to look up a topic, we can get its page number directly from the index.

Likewise, in hashing every value is associated with a key. Using this key, we can locate the element directly.

One way to build the dictionary is to create a simple hash function and an array of linked lists of structures: the hash of a key decides which linked list a value is inserted into, and the same hash is used to retrieve it later. Hash tables are also commonly known as hash maps, and their basic operations are insertion, deletion and searching.

Step 1: Create a structure with two char arrays, key and value. This structure acts as a dictionary entry that stores one key-value pair.

Hashing (hash function): in a hash table, an index is computed from each key, and the element corresponding to that key is stored at that index. This process is called hashing. Let k be a key and h(x) be a hash function. Then h(k) gives the index at which the element associated with k is stored.
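The approach described above (a structure holding a key and a value, kept in linked lists chosen by a hash function) can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the names struct entry, dict_put and dict_get, the djb2-style hash, and the sizes TABLE_SIZE, KEY_LEN and VALUE_LEN are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 101   /* number of buckets (assumed value) */
#define KEY_LEN 32       /* max key length, including '\0' (assumed) */
#define VALUE_LEN 64     /* max value length, including '\0' (assumed) */

/* One dictionary entry: the two char arrays from Step 1, plus a link
   to the next entry that hashed to the same bucket. */
struct entry {
    char key[KEY_LEN];
    char value[VALUE_LEN];
    struct entry *next;
};

/* One linked list per bucket; static storage starts out all NULL. */
static struct entry *buckets[TABLE_SIZE];

/* Simple string hash (djb2-style, chosen for illustration): every
   character of the key affects the bucket index. */
static unsigned long hash_key(const char *key)
{
    unsigned long h = 5381;
    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char)*key;
    return h % TABLE_SIZE;
}

/* Return the entry for key, or NULL if the key is not stored. */
static struct entry *find(const char *key)
{
    for (struct entry *e = buckets[hash_key(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Insert key/value, or overwrite the value if key is already present.
   Returns 0 on success, -1 if a string is too long or malloc fails. */
int dict_put(const char *key, const char *value)
{
    if (strlen(key) >= KEY_LEN || strlen(value) >= VALUE_LEN)
        return -1;
    struct entry *e = find(key);
    if (e == NULL) {
        e = malloc(sizeof *e);
        if (e == NULL)
            return -1;
        strcpy(e->key, key);
        unsigned long i = hash_key(key);
        e->next = buckets[i];   /* push onto the front of the bucket */
        buckets[i] = e;
    }
    strcpy(e->value, value);
    return 0;
}

/* Return the value stored for key, or NULL if the key is absent. */
const char *dict_get(const char *key)
{
    struct entry *e = find(key);
    return e != NULL ? e->value : NULL;
}
```

For example, after dict_put("apple", "a fruit"), dict_get("apple") returns "a fruit" and dict_get("banana") returns NULL. Rejecting over-long strings instead of silently truncating them guarantees that a stored key is always equal to the key used to look it up.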


Let’s discuss hashing with the modulo (division) method.



How to calculate the hash key?

Let's take the hash table size as 7.

size = 7

arr[size];

The formula to calculate the key (the array index) is:

key = element % size

If we take a non-negative number modulo N, the remainder is always in the range 0 to N - 1.

Array indices also run from 0 to N - 1, so the remainder can be used directly as an array index.
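The modulo formula can be written as a small C function. In this sketch the function name hash and the macro SIZE are assumptions; the extra adjustment handles negative inputs, for which C's % operator returns a negative remainder.

```c
#define SIZE 7   /* table size from the example above */

/* Division (modulo) method: the remainder of element / SIZE is the index.
   In C, % can return a negative remainder for a negative element, so the
   result is shifted back into the range 0..SIZE-1. */
int hash(int element)
{
    int key = element % SIZE;
    return key < 0 ? key + SIZE : key;
}
```

With SIZE = 7, hash(24) is 3, hash(8) is 1 and hash(14) is 0.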



Initialize the Hash Bucket

Before inserting elements into the array, let’s set every slot to the default value -1.

-1 indicates that no element is present, i.e. the index is available for insertion.
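A minimal sketch of the initialization step, assuming a global int array arr of SIZE slots and a hypothetical EMPTY macro for the -1 marker:

```c
#define SIZE 7
#define EMPTY (-1)   /* marks a free slot */

int arr[SIZE];

/* Fill every slot with EMPTY so the table starts with no elements. */
void init(void)
{
    for (int i = 0; i < SIZE; i++)
        arr[i] = EMPTY;
}
```

A global int array is zero-initialized in C, so without this step every slot would appear to hold the element 0. A consequence of the -1 marker is that -1 itself (and, in this scheme, any negative number) cannot be stored.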





Inserting elements in the hash table

i) insert 24: 24 % 7 = 3, so arr[3] = 24

ii) insert 8: 8 % 7 = 1, so arr[1] = 8

iii) insert 14: 14 % 7 = 0, so arr[0] = 14
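The three insertions can be reproduced with a sketch like the following. The function name insert and its return convention (1 for success, 0 when the slot is taken or the element is negative) are assumptions; collisions are only detected here, not resolved.

```c
#define SIZE 7
#define EMPTY (-1)

int arr[SIZE];

void init(void)
{
    for (int i = 0; i < SIZE; i++)
        arr[i] = EMPTY;
}

/* Store element at index element % SIZE. Returns 1 on success and 0 if
   the element is negative (unsupported, since -1 marks an empty slot) or
   its slot is already taken (a collision, not handled here). */
int insert(int element)
{
    if (element < 0)
        return 0;
    int key = element % SIZE;
    if (arr[key] != EMPTY)
        return 0;
    arr[key] = element;
    return 1;
}
```

After init(), calling insert(24), insert(8) and insert(14) leaves arr[3] = 24, arr[1] = 8 and arr[0] = 14, matching the steps above.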



Searching elements from the hash table

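i) search 8: 8 % 7 = 1, and arr[1] holds 8, so 8 is found

ii) search 19: 19 % 7 = 5, and arr[5] is -1, so 19 is not present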
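Searching only needs to look at one slot. The sketch below starts from the table state after the three insertions (arr[0] = 14, arr[1] = 8, arr[3] = 24); the function name search is an assumption.

```c
#define SIZE 7
#define EMPTY (-1)

/* Table state after inserting 24, 8 and 14: indexes 3, 1 and 0. */
int arr[SIZE] = {14, 8, EMPTY, 24, EMPTY, EMPTY, EMPTY};

/* Returns 1 if element is stored at its hash index, 0 otherwise. */
int search(int element)
{
    if (element < 0)
        return 0;   /* negatives are never stored; -1 means "empty" */
    return arr[element % SIZE] == element;
}
```

search(8) returns 1 and search(19) returns 0, as in the walkthrough.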



Deleting an element from the hash table

Here, we do not physically remove the element.

We just mark its index as -1, which indirectly deletes the element from the array.

Example

Delete 24: 24 % 7 = 3, so arr[3] = -1
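A sketch of this lazy deletion, again starting from the table after inserting 24, 8 and 14. The name delete_element is hypothetical (chosen because delete is a C++ keyword).

```c
#define SIZE 7
#define EMPTY (-1)

int arr[SIZE] = {14, 8, EMPTY, 24, EMPTY, EMPTY, EMPTY};

/* "Delete" by marking the element's slot EMPTY again.
   Returns 1 if the element was present, 0 otherwise. */
int delete_element(int element)
{
    if (element < 0 || arr[element % SIZE] != element)
        return 0;
    arr[element % SIZE] = EMPTY;
    return 1;
}
```

delete_element(24) sets arr[3] back to -1 and returns 1; a second call returns 0 because 24 is no longer stored.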



What is collision in hashing?

What if we insert an element, say 15, into the existing hash table? 15 % 7 = 1, so it should go to arr[1].


But arr[1] already holds the element 8!

Here, two or more different elements map to the same index under modulo size. This is called a collision.
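A collision can be detected before inserting by checking whether the target slot already holds a different element. This sketch uses the same table state as the walkthrough (24, 8 and 14 inserted); the function name collides is an assumption.

```c
#define SIZE 7
#define EMPTY (-1)

int arr[SIZE] = {14, 8, EMPTY, 24, EMPTY, EMPTY, EMPTY};

/* Returns 1 if element's slot already holds a different element. */
int collides(int element)
{
    if (element < 0)
        return 0;   /* negative elements are not supported */
    int key = element % SIZE;
    return arr[key] != EMPTY && arr[key] != element;
}
```

collides(15) returns 1 because 15 % 7 = 1 and arr[1] holds 8; 22 collides at the same slot, while 19 (index 5) does not.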



Hash table implementation in C using arrays

Example


We didn't implement any collision avoidance technique in the above code.

We will discuss collision avoidance in the next tutorials.


Collision Avoidance

Collision Avoidance using Linear Probing
Collision Avoidance using Separate Chaining




Idea of a Hash Table

In this problem you will implement a dictionary using a hash table. The idea of a hash table is very simple, and decidedly hackish.

We are going to write a hash table to store strings, but the same idea can be adapted to any other data type. We write a function (the hash function) that takes a string and 'hashes' it, i.e. returns an integer (possibly large) that is obtained by manipulating the bytes of the input string. For a good hash function, all bytes of the input string must affect the output, and the dependence of the output on the string should be non-obvious. Here is an example of a good hash function (actual library code), which uses the fact that a character is interpreted as a small integer:

For the C++ class string, the same function will look like this:
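As a stand-in for the library code mentioned above (which is not reproduced here), the sketch below is a djb2-style string hash (Daniel J. Bernstein's h = h * 33 + c), which likewise treats each character as a small integer so that every byte affects the result. The name hash_string is borrowed from the later text; the body is an assumption.

```c
/* djb2-style string hash (h = h * 33 + c, starting from 5381).
   A stand-in for illustration, not the library code cited in the text. */
unsigned long hash_string(const char *str)
{
    unsigned long h = 5381;
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++)
        h = h * 33 + *p;   /* each character acts as a small integer */
    return h;              /* may wrap around; unsigned overflow is well defined */
}
```

hash_string("ab") and hash_string("ba") differ, so character order matters. The result is not yet an index: it still has to be reduced mod N.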

With some such hash function, we try the following idea. Declare an array of length N and, for any string str, store the string in slot hash_string(str) % N of the array. Since the results of hashing a string appear random (but, of course, the result for any given string is always the same), the slot number computed this way will hopefully be distributed uniformly through the array. If we want to check whether a string is already stored in our array, we hash the string and look at the slot determined by the same formula. The best distribution of indices for the above hash function has been observed with the following values of N:

It may happen that two different strings hash to the same slot number, i.e. they collide. If N is large enough, there won't be many such collisions. This difficulty can be resolved as follows: our array will be an array of lists of strings instead of individual strings. These lists are called buckets in some books. Adding a string means hashing it, taking the result mod N, and inserting it into the list at that index. The algorithm for checking whether a string is in the hash table is:
For a string str:

  1. Apply the hash function to str and take the result mod N. This is the index in the hash array.
  2. Check every element of the linked list (bucket) at that index. If str is found, return true, else return false.
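The bucket algorithm can be sketched in C with an array of singly linked lists. This is a minimal sketch under stated assumptions: the node layout, the djb2-style hash, N = 12289 and the names contains and add are illustrative choices (the exercise itself asks for a C++ class).

```c
#include <stdlib.h>
#include <string.h>

#define N 12289   /* number of buckets; one of the sizes the exercise suggests */

struct node {
    char *word;
    struct node *next;
};

/* Each bucket is a linked list; static storage starts out all NULL. */
static struct node *buckets[N];

static unsigned long hash_string(const char *str)
{
    unsigned long h = 5381;   /* djb2-style hash, assumed for illustration */
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++)
        h = h * 33 + *p;
    return h;
}

/* Steps 1 and 2 above: find the bucket, then compare every word in it. */
int contains(const char *str)
{
    for (const struct node *p = buckets[hash_string(str) % N]; p != NULL; p = p->next)
        if (strcmp(p->word, str) == 0)
            return 1;
    return 0;
}

/* Add str to the front of its bucket unless it is already there.
   Returns 1 on success (or if already present), 0 if malloc fails. */
int add(const char *str)
{
    if (contains(str))
        return 1;
    struct node *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    p->word = malloc(strlen(str) + 1);
    if (p->word == NULL) {
        free(p);
        return 0;
    }
    strcpy(p->word, str);   /* own copy, so callers may reuse their buffers */
    unsigned long i = hash_string(str) % N;
    p->next = buckets[i];
    buckets[i] = p;
    return 1;
}
```

New words go to the front of their bucket, so once the duplicate check is done, insertion takes constant time regardless of bucket length.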

Why use Hash Tables?

Imagine a dictionary implemented as one long linked list (of strings). Finding a word in it may require going all the way to the end of the list, checking every element for equality with the search string. This is very inefficient and slow.

Hashing

Now look at the same process for the hash table. Hashing a string does not take long (see hash_string), and then we know the right bucket right away. Of course, the bucket is a linked list and has to be searched by comparing every element in it to the search string, but if the index obtained by hashing is roughly uniformly distributed across the array of buckets, the buckets should not be long (on average, a bucket will contain TOTAL_WORDS_STORED / N elements).

Thus a hash table is much better than an array or a linked list, because we zero in on the right bucket by hashing, and then have only a few words that collided at that bucket's index to check. All kinds of serious applications use hash tables, including C++ compilers, which keep the names of defined variables, functions and classes in one.

Exercise (Problem D)

On UNIX systems there is a file that contains all frequently used words of the English language, usually /usr/dict/words (/usr/share/dict/words on most current systems). Here is one, zipped: words.zip. Write your hash table for strings, and fill it in by reading that file. Something like this will do:

There are 45402 words in my file, so reasonable choices for hash table size are 12289, 24593, and 49157. You can make your class keep statistics of the largest and the average number of collisions (i.e. sizes of buckets).
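The statistics the exercise mentions can be gathered by walking each bucket. This sketch assumes a bucket array of singly linked struct node lists (a hypothetical layout) and reports the largest bucket size and the average over all buckets, which is the TOTAL_WORDS_STORED / N figure discussed above. The names struct bucket_stats and compute_stats are assumptions.

```c
#include <stddef.h>

struct node {
    const char *word;
    struct node *next;
};

/* Largest bucket size and average bucket size over all n buckets. */
struct bucket_stats {
    size_t largest;
    double average;   /* equals words stored / n */
};

struct bucket_stats compute_stats(struct node *buckets[], size_t n)
{
    struct bucket_stats s = {0, 0.0};
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = 0;   /* length of bucket i */
        for (const struct node *p = buckets[i]; p != NULL; p = p->next)
            len++;
        total += len;
        if (len > s.largest)
            s.largest = len;
    }
    if (n > 0)
        s.average = (double)total / (double)n;
    return s;
}
```

With 45402 words and N = 12289, the average is about 3.69 words per bucket; with N = 49157 it drops to about 0.92.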

Then, in the same manner as above, read in a text file and print out all words that are possible spelling errors. Write a member function bool HashTable::Contains(const string &). There is a Unix program, ispell, that does something like that.


Pesky technical problems

Notice that from the point of view of the program above, any sequence of non-whitespace characters is a word. Whitespace characters (space, newline and tab) are seen as word separators and skipped. Thus you need to remove any punctuation marks (except apostrophes) from the file being spell-checked before you can run the program on your text, so that 'end.', 'end!', 'end?' and 'end' are not different words. This can be done by another small program, or by search-and-replace in any good text editor. There is also the issue of capital vs. lowercase letters: 'And' and 'and' should be the same word. This issue, unfortunately, cannot be resolved with any text editor (other than Emacs) that I know of. I can pre-process any text file for you, as follows:

Both of these preparatory tasks can be carried out with just two commands of the UNIX operating system: The first command changes any character which is not a letter in the ranges A-Z or a-z or a newline (\n) into a space. The second command changes each capital letter to the corresponding lowercase one. See man tr on your UNIX system for more information. Nothing like that comes with MS Windows; macOS, being UNIX-based, does include tr.
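For systems without tr, the same two transformations can be done by a small C function. This is a sketch under assumptions: the name normalize_text is hypothetical, it works in place on a NUL-terminated string, and, like the commands described above, it also turns apostrophes into spaces. It tests explicit ASCII ranges rather than locale-dependent ctype functions, to match the A-Z and a-z ranges.

```c
/* Same effect as the two tr passes described above, in one pass:
   any character that is not an ASCII letter or newline becomes a space,
   and uppercase letters become lowercase. Modifies s in place. */
void normalize_text(char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c >= 'A' && c <= 'Z')
            *s = (char)(c - 'A' + 'a');          /* capital -> lowercase */
        else if (!(c >= 'a' && c <= 'z') && c != '\n')
            *s = ' ';                            /* non-letter -> space */
    }
}
```

For example, the line "And the END." becomes "and the end ", so 'And'/'and' and 'end.'/'end' are no longer counted as different words.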