Immutability is one of the building blocks of Functional Programming, and immutable objects bring a number of advantages. If you want to learn more about why you should use immutable objects, this is a nice article to read.
But the first question that comes to mind when talking about immutability is performance. For example, say we have an array of integers and we need to change one of them. If we want to stay immutable, instead of changing the array in place, we must keep the original array intact and return a new array with the changed integer. That means allocating a new array and copying over all the old elements, which is much more expensive than changing the array in place.
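To make that cost concrete, here is a naive immutable update in Python (a sketch of the idea; `set_immutable` is my own name, not a library function):

```python
def set_immutable(arr, index, value):
    # Naive immutable update: copy the whole list, then change one slot.
    new_arr = list(arr)      # O(n) copy of every element
    new_arr[index] = value
    return new_arr

colors = ["red", "green", "blue"]
updated = set_immutable(colors, 1, "lime")
# 'colors' is left untouched; 'updated' is a brand-new list.
```

Every update pays the full O(n) copy, which is exactly the overhead structural sharing is designed to avoid.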
In this article I am going to talk about a technique used to optimize immutable data structures, called “Structural Sharing”. To begin, let’s learn what a persistent data structure is.
This is how Wikipedia introduces a persistent data structure,
In computing, a persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure.
These implementations are heavily optimized for performance, and structural sharing is one of the main techniques used.
Before we go further, we need to understand what a Trie is. A Trie is a special kind of tree data structure. Take a look at the following image from Wikipedia.
This is different from a binary search tree in that no node stores the key associated with it; instead, a node’s position in the tree defines its key. All the nodes under a given node share a common prefix. Values tend to be associated only with the leaves (and with some inner nodes, if they are significant). Tries are commonly used to store dictionaries: as this example shows, we can easily validate words and obtain suggestions for partial words. Please refer here to understand Tries in detail.
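As a quick sketch of such a dictionary Trie (the class and function names are my own, for illustration only):

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child TrieNode
        self.is_word = False  # marks nodes that end a complete word

def insert(root, word):
    # Walk down the tree one character at a time, creating nodes as needed.
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word

root = TrieNode()
for w in ("tea", "ted", "ten", "to"):
    insert(root, w)
```

Notice that “tea”, “ted”, and “ten” all share the nodes for the prefix “te” — the same sharing idea we will exploit below.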
Now that we know what a Trie is, let’s take a look at how to represent an array using one. Take the following array as an example,
[“red”, “green”, “blue”, “yellow”, “pink”, “purple”, “black”, “white”]
Let’s see how we can try and represent this using a Trie.
Now, how do we get the element at index 1? If you follow the path “001” from the root node (0 means go to the left child, 1 means go to the right child; see the diagram), you reach the element at index 1 (“green”). In other words, the binary representation of an index spells out the path to its leaf: each leaf has a unique address, and we use those addresses to index elements.
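The path-following lookup can be sketched in Python using a hypothetical nested-list layout (my own representation for illustration), where each internal node is a two-element list and leaves hold the values:

```python
# The 8-element array from above, stored as a depth-3 binary Trie.
tree = [[["red", "green"], ["blue", "yellow"]],
        [["pink", "purple"], ["black", "white"]]]

def get(node, index, depth):
    # Walk the bits of the index from most significant to least:
    # a 0 bit goes left, a 1 bit goes right.
    for level in range(depth - 1, -1, -1):
        node = node[(index >> level) & 1]
    return node

get(tree, 1, 3)  # follows the path 0-0-1 -> "green"
```

Each lookup touches only `depth` nodes, independent of where the element sits in the array.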
Now let’s see why we represented the array as a Trie. Say we need to change the last element of this array from “White” to “Brown”. We want this to be done in a persistent manner. Therefore changing the original data structure is not a solution. Let’s take a look at the following diagram.
You can see that the old root is still there, and the old array can still be accessed through it. The array after the element is changed is the structure with the new root. We create the new array by reusing most of the old structure, copying only the nodes on the path to the changed leaf. With a traditional array, we would have had to copy all the elements.
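This “path copying” update can be sketched on the same nested-list layout as before (a sketch of the technique, not any particular library’s implementation):

```python
def assoc(node, index, value, depth):
    # Persistent update: rebuild only the nodes on the path to the leaf,
    # sharing every untouched subtree with the old version.
    if depth == 0:
        return value                 # reached the leaf: place the new value
    bit = (index >> (depth - 1)) & 1
    new_node = list(node)            # copy just this one small node
    new_node[bit] = assoc(node[bit], index, value, depth - 1)
    return new_node

old = [[["red", "green"], ["blue", "yellow"]],
       [["pink", "purple"], ["black", "white"]]]

# Change index 7 ("white") to "brown", persistently.
new = assoc(old, 7, "brown", 3)
```

Only the three nodes along the path 1-1-1 are copied; the entire left half of the tree is shared between `old` and `new`.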
We have seen how to represent a numerically indexed data structure as a Trie and how to use structural sharing to make it an efficient persistent data structure. Next, let’s take a look at how to represent an object with non-numeric keys.
It is pretty simple. In a hash map we compute a numerical hash for each key and use it to place the entry in the underlying data structure. We can use the same idea here.
hash('a') = 97
Let’s take the hash value of a string to be the sum of its characters’ ASCII values. Then the hash of ‘a’ is 97. We can use this value, along with the modulo operation, to build a Trie that represents the data structure. If you want to know more about how hash maps work, please refer here. With the Trie representation, we can handle modifications to the data structure persistently and efficiently.
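Putting the pieces together, here is a minimal sketch of a persistent hash Trie built on this toy hash (the names `hash_assoc` and `hash_lookup` are my own; real implementations such as Clojure’s hash array mapped tries also handle hash collisions, which this sketch ignores):

```python
BITS = 2                      # 4-way branching in this sketch
MASK = (1 << BITS) - 1
DEPTH = 4                     # fixed depth: 4 levels * 2 bits = 8 hash bits

def simple_hash(key):
    # Toy hash from the text: sum of character codes ('a' -> 97).
    return sum(ord(ch) for ch in key)

def hash_assoc(node, key, value, shift=0):
    # Persistent insert via path copying, indexing by 2 hash bits per level.
    if shift >= BITS * DEPTH:
        return (key, value)                      # leaf holds the entry
    idx = (simple_hash(key) >> shift) & MASK
    children = list(node) if node is not None else [None] * (1 << BITS)
    children[idx] = hash_assoc(children[idx], key, value, shift + BITS)
    return children

def hash_lookup(node, key, shift=0):
    if node is None:
        return None
    if shift >= BITS * DEPTH:
        return node[1] if node[0] == key else None
    idx = (simple_hash(key) >> shift) & MASK
    return hash_lookup(node[idx], key, shift + BITS)

m1 = hash_assoc(None, "a", 1)
m2 = hash_assoc(m1, "b", 2)   # m2 shares untouched subtrees with m1
```

As with the array, each insert copies only the handful of nodes along one hash path, so old versions of the map remain fully usable.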
A note on the branching factor: we used two-way branching in the example for simplicity, but for a big array this would mean a very deep tree. Deep trees offer a lot of sharing and thus reduce memory usage, but as the tree gets deeper, the time for lookups and modifications increases as well, so there is a balance to strike. Clojure (a functional programming language) uses 32-way branching, which provides a good balance: a Clojure vector of one billion elements is only 6 levels deep (32^6 ≈ 1.07 billion), and you would need about 35 billion elements before it reaches 8 levels.
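These back-of-the-envelope numbers are easy to check with a small helper (my own function, just for the arithmetic):

```python
import math

def depth(n, branching=32):
    # Number of Trie levels needed to address n elements.
    return max(1, math.ceil(math.log(n, branching)))

depth(10**9)        # one billion elements -> 6 levels
depth(35 * 10**9)   # ~35 billion elements -> 8 levels
```

With two-way branching, by contrast, a billion elements would need about 30 levels, which is why real implementations prefer wide, shallow trees.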
Traditionally, creating a new array with one element changed takes O(n) time. With the Trie representation and structural sharing, it comes down to O(log_b n), where b is the branching factor. Since the branching factor is a constant, this means O(log n).
In this article we have learned how to implement efficient persistent data structures using structural sharing with Tries. For more information please look at the references.
If you have enjoyed this article, claps are welcome!