Data Structures in JavaScript pt 2 — Hash Tables

Written by hackernoon-archives | Published 2017/08/06
Tech Story Tags: computer-science | javascript | hash-table | programming | data-structures

TLDRvia the TL;DR App

Last week, we explored the binary search tree data structure and how to implement them in JavaScript. This week we will look at another data structure known as a hash table, also known as a hash map. Hash tables allow us to look up and insert data in constant time or O(1). In many situations, a hash table turns out to be more efficient than search trees or other table lookup structures. You can find the full repo for my hash table build along with a few other data structures here — https://github.com/mega0319/data-structures

Hash tables implement associative array abstract data types — which basically means it maps keys to values.

How do they work?

A hash table will be initiated with a certain number of ‘buckets’. Suppose we initiated an array with the size of 32. Think of these ‘buckets’ as empty slots in an array. Each of the indexes from 0–31 can be considered a ‘bucket’.

Hash tables use a hashing function in order to figure out where to store data. The hashing function will take in a key from the data we intend on storing, and return an index. Sounds simple, but hashing functions can be extremely complex!

In the case of building out a hashing function for our hash table, we want a function that will return the same index, given the same input. For example, let’s say I want to store a node with a key of “rocky”. Every time I run the key “rocky” through my hashing function, it should return the same index. When we insert the data, we get, let’s say an index of 16. We then look for the bucket with the index of 16, and store out data in that bucket. When we want to look up the data in our hash table we run the same key through our hashing function, we get an index of 16, and search that particular bucket to see if it exists. This is the reasoning behind constant or O(1) lookups.

Ideally, our hashing algorithm is designed so that each key returns a unique index and distributes our data evenly across our array of buckets, but most hashing algorithms are designed with imperfections. What happens if two different keys return the same index? In this case, we would have something called a hash collision.

Not this kind of collision

Collisions can be minimized to a certain extent depending on how complex the hashing function is but they are inevitable. Even if we had a million buckets and wanted to store 2000 different keys, there is still a chance that we will have some sort of collision.

According to the probability theory, and the birthday problem, there is a 95% chance that there will be a collision, even with a perfect hashing algorithm. The birthday problem or birthday paradox states that given n people selected at random, some pair of them will have the same birthday. With just 23 people the probability is around 50% and with 70 people the probability is 99%. The probability actually increases to 100% when you get to 367 people, since there are 366 possible birthdays including February 29th. Makes sense…

There are several approaches to handle this collision issue, and I will show one such way to accommodate collisions, but we won’t go into too much depth in this blog. That being said, let’s start building out our hash table in JavaScript.

So the first thing we need is a couple of constructors.

hash table and hash node constructors

The first constructor is for the hash table itself. It will take in an integer to specify the size or number of buckets we want for our hash table. As you can see in line 11, I have initialized a hash table with 30 buckets. It will also keep track of the number of buckets we currently have in our hash table.

The second constructor is for a hash node. This node will take in a key, and a value. You can ignore the next part for now, as we will get to that later.

Next, we will build out our hash function.

hashing function

Let’s walk through how our hashing function works. Remember that this algorithm’s sole purpose is to take a key, and return an index so that given a particular input(key), it will always return the same output(index). The key is a string. We will iterate through that string and find the character code for each letter using JavaScript’s charCodeAt function. Once we find the character code, we will add it to our total. We end up with some number stored in total. Then, we take that number and mod it by the number of buckets. This step ensures that we return an index between 0 and 29, which is what we want since we have 30 buckets. For this step we will always mod by n number of buckets to obtain an index anywhere from 0 to n-1. And that’s it! Now we have a hashing function that will return an index.

Next stop, insertion!

insert function

Here is where things start to get interesting. Our insert function will take in a key and a value. The key and value arguments will be used to create a new hash node. Before we create the node however, we must first figure out where to put the node…

Our first step is to find the index. On line 32, you can see that we are calling our hashing function, passing it the key and obtaining an index and storing it in a variable. This is the index or bucket location, in which we want to place our node.

Next we check to see if the bucket at that index is empty. If the bucket is empty we can create a new hash node and place it in that position. Easy!

What if the bucket is not empty? Oh man, hash collision…

It’s really not this serious

Not to fret! Before we assume it’s a collision, our insert function will check to see if the key in the bucket is the same as the key passed in. If it is, we can assume we are just updating the value. In that case, line 36 will just set the value in that bucket to the value passed in. If this is not the case, then…

chaining nodes in a collision

We will chain the nodes. This is where the node’s next property comes into play. On line 38, we begin to traverse the chain of nodes in that bucket. First we set the bucket in that index to be our currentNode. Using a while loop, we check to see how deep the rabbit hole goes. Throughout this loop we will continue to check to see if we happen to find a node with the same key, and edit the value of that key. If not, we will traverse downwards until we find a node that has its next pointer set to null. This means that we have hit the final node in the chain. We will then attach a new node to that position in the chain. The above image gives a good visual representation of how this looks.

Now we will build out our get function.

get function

The purpose of this function is to search through our hash table and look up nodes using a key that we will pass in as an argument. Similar to our insert function, the first thing we do is run our key through the hash function to obtain the index. Once we obtain the index we can run a lookup in constant time. I want to emphasize the reason this works.

A quick refresher — an index in an array essentially references where in memory our element lies. Take a look at this example.

let arr = [1,2,3,4,5,6,7,8,9]

arr[3]
// 4

When we create an array, our elements are being stored somewhere in memory. Since the array itself keeps track of the memory address in which our elements start, the index helps to calculate the address for each element in our array. This calculation is done in constant time or O(1).

So because we are able to obtain the particular index using our hash function, we are able to check to see if that key exists quickly. There is a caveat however. The more data we store in a hash table, the slower it gets. You can imagine that we will experience many collisions as we add data. Once this happens, we will have to traverse down the node chains in our buckets more frequently. This changes our average lookup to linear time or O(n).

Very much like our insert function, if the key is not the first key in the bucket, we will traverse down its chain to look for the key. If we still don’t find it, we will return null.

Our final function will be retrieveAll.

retrieveAll

Retrieving every node from our hash table reveals one of its weaknesses — traversal. At the end of the day, our buckets array is an array, and so we must iterate through each of the buckets and then for each bucket iterate down the node chain to touch on every possible node. The runtime for this function is quadratic or O(n²).

Building out a hash table, like all the other fundamental data structures, is a good exercise to understand how they work by testing their strengths, and uncovering their weaknesses.

I hope this helped!

END


Published by HackerNoon on 2017/08/06