When I faced a search problem in Ruby the first thing that came to my head was a binary tree (yes, I’m a weirdo). After some search about it, I decided to create an open-source tree of my own so anyone can just download and use it in the future.
Here I try to explain what makes a person use his free time to code a data structure just because.
Binary trees can be dangerous
Trees are data structures that characterize a hierarchical relationship between its data — let’s call it nodes. It means that each data set is hierarchically subordinate to the one before it.
I want to point three important characteristics in binary trees: 1) The first node, that we are going to call root, is the only node that is not subordinated to any other one; 2) each of the other nodes is a binary tree — called sub-tree; 3) the relation between nodes is a “before-after” relation.
So, you create the root with your first value, being it a number or a string. To add the second value you must compare it with the first one: if it comes before (being a string you compare alphabetically, being a number if it is less than the other) you create a new node to the left (that I called west), otherwise, you create a new node to the right (guess what? I called it east), creating the second level in the tree. The number of levels in your tree, or sub-tree, is called depth. In the example above the sub-tree at the left of the root have a depth of 2 while the one at the right has a depth of 5. The depth of the tree is 6.
Example of a not balanced tree created bu inserting the numbers 3, 6, 10, 9, 5, 8, 1, 4, 7, 0 and 2 in this order.
But, why should I use trees? Simple: for the oxygen, the fruits and it’s beautiful in your garden. Oh, you are asking about the data struct? Well, the answer is also simple: the search time is exponentially smaller than a simple array. If a tree is well balanced (we will talk more about it later), the number of comparisons that you have to do to find your data is, at maximum, log n in base 2 where n is the number of elements in the tree. In a simple array, this number is n.
First things first: I’m not a computer scientist. I’m a newbie trying to do a complex struct and show the way so some people can do it later. So some things can be different from academic concepts.
With that clear, the first thing I’ve done was to code a class that will create the nodes. I called it Node and it has only the initialize method. This class set the node’s value, in this case, x, all the data that I want to store in the tree, in the args array and a nil value for the east and west pointers. These last two will be pointing for the next nodes in the tree.
The next step was to create a second class, that I called BinaryTree and initialize the tree’s root as nil.
The first method in this class is the one to insert the values in the tree. It is not complicated. You always start to look at the root. If it is nil you create it with your value and your data (x and args in this code). If not you compare both values. In case of a “before value”, you go for the node pointed by the root’s west, else go for the east. You repeat it until finding a nil node.
The second needed method is the search method. It works the same as to find the nil node in the insert method, compare your value with the node’s one. If equal return true, else go to east or west depending on the comparison. If the method finds nil it returns false.
The last needed method is the balance method, but this a topic for another time.
Is it another time already? Ok, let’s talk about the balance method. The search method in a binary tree is as efficient as its depth. In the best scenario, the number of nodes in the tree is between 2 ^ (depth — 1) and (2 ^ depth) — 1. So, if you have a depth of 6, like in our example, the number of elements should be between 32 and 63. In the worst case, it became as inefficient as a list and we don’t want this.
How worrying is an unbalanced tree.
A balanced tree is the one where the difference between the depth of both subtrees is less than two for all nodes in the tree. So, if you have a tree of depth 6 there is no node in levels 4 and below that aren’t filled.
One way to balance a tree is node by node. You check a node. If this subtree is not balanced you “pull” it for the side with smaller depth. The image below exemplifies this. Of course, it is more complex than just “pull” the nodes, but this is the gist of it.
But that is not what I did, I choose the long and easy way. I created an array with all the values in the tree in the crescent order. Then I created another tree inserting the value in the middle, dividing the array in two and repeating the process in the sub-arrays until all values were inserted in the new tree. Doing this the example presented in the beginning becomes this:
A well-balanced tree with the same values as the example used before.
Create, search and balance were not the only methods created. There are methods to:
- delete a node;
- edit the data inside a node;
- load a file creating a tree with its data;
- save a file with the created tree as simple data;
- print the tree in crescent order;
- return the tree’s depth;
- return the number of nodes in the tree.
But an idea that I already started to implement is what I’m calling a bi-dimensional tree: why not a tree with two search parameters instead of only one? Come and see it.