Arrays in Ruby: Benefits and Costs to Use Them

Data structures are eagerly introduced to new programmers, since they will be using them pretty much through their whole career. Knowing them, how they work, what are they good for, which one to use, are things that can change the performance of your solutions quite vastly.

On this article, as the title suggest, I will be talking arrays with ruby, and going into a bit of detail here and there about how this language manages them.

Article is intended to beginners with a general understanding of basic concepts.

Some ruby array knowledge is necessary to follow some examples, such as adding elements, concatenation, and looping.

Arrays. What, how and where.

Arrays are, as the word semantic suggest, a consecutive collection of objects. This definition is quite literal in most languages, meaning that objects stored in said structure would be placed sequentially in memory as well as in the abstraction that the developer will be using at a high level.

Usually, we would find two different 'types' of Arrays. Static, and Dynamic. Static arrays are those which we define the size, and that is set in stone. Dynamic arrays, in the other hand, grow or shrink, typically used when we do not know the final size of this structure.

Check this next example of an implementation (using C not ruby, but bear with me)


#include <stdio.h>

int main()
{
    int intArray [10];
    printf("Size of int: %ld bytes\n", sizeof(int));
    for (int i=0; i < 10; ++i ) {
        printf("\nElement\t%p",&intArray[i]);
    }
    return 0;
}

Now, this trivial piece of code, when executed will output a very literal representation of what an array in a programming a computer usually is.

Size of int: 4 bytes

Element	0x7ffe9a634b50
Element	0x7ffe9a634b54
Element	0x7ffe9a634b58
Element	0x7ffe9a634b5c
Element	0x7ffe9a634b60
Element	0x7ffe9a634b64
Element	0x7ffe9a634b68
Element	0x7ffe9a634b6c
Element	0x7ffe9a634b70
Element	0x7ffe9a634b74

We defined a static array of integers. More specifically, a static array that will hold up to 10 integers. We can see that every integer, will have up to 4 bytes of data and even thought our array is still empty, memory is already reserved an ready to use.

Not only that, but their address are separated by these 4 bytes in particular, which gives you an idea of how it could work. You have the first memory cell address, if you go 4 bytes from there, you will find the second element. So, we can assume that something like this :

array[4]

could work by starting on the first direction and adding 4*[size of object] to that address, to get to that direction and get the value in it.

Why is it important that memory is already reserved ? Basically, because the first concept of "everything is next to each other in memory". Therefore, you first need to find a place in memory with [size] consecutive free spaces before start placing them.

"But wait, I always use .push(element) in Ruby and never had an issue with reserving memory beforehand ". That is correct! And that is because you were using a dynamic array.

Dynamic Arrays

A dynamic array is a particular structure that, although still being an array, is a bit more complex and has some more functionality added to make your life easier by making the everyday array feel extremely flexible. And don't worry, no more C. For now on I can use Ruby.

Let's start by writing a few plain examples:

array = []
5.times do
 |x| array.push(x) 
end
p array
p "First array length: "+array.length().to_s
prefixSumArray = []
array.each do
  |x| prefixSumArray.push(x+5)  
end
arrayCopy = array
array.concat(arrayCopy.concat(prefixSumArray))
p array
p "First array length after... playing: "+array.length().to_s

All right all right. That's a lot of action. Lets break it down.

array = []

Defining a new array with 0 elements, therefore, from i = 0 to... well 0.

On the first loop, we are adding 5 elements with .push method. And let's stop there. We know that array.push adds an element, and if this array did not have that original length then, it would make it bigger.

Now that you have 1 element, on an originally 0 sized array, your memory reserved from this array is for one element. Right? Well, no. Since we are working with dynamics array, Ruby (and usually every other approach) have some extra space reserved for every element that you explicitly say that your array will have.

So right now, your memory block for this array would look something like this: [0][]. Adding one more, would be [0][1]. Next step, [0][1][2][][][]. And by this point you might have already noticed, that when we end our new dynamic array after all this push and concat and adding, we will have quite more memory reserved than we wanted to use, or that we ended up using.

On average can waste up to O(n) . But you might use all of it after all !

"Oh no! But I did not wanted to waste space ! Why would they do that?" Good question, and basically is just a trade. You are trading space complexity for time complexity. You are making your code faster, by using a bit more space to help you. What goes into every push for a new element, outside the original range of the array, it can be quite expensive ;

At the very least you will be going through the memory and look for a place where your array will fit. And this involves looking and reserving again, plus some cleaning of memory that you wont be using any more.

With all this in mind, every time you are using a built-in Ruby method that involves expanding an array, what is happening under the hood might be actually that. So that one-liner may not be the best for solving problems where time is crucial or you have to manage big chunks of data.

Arrays & Hash.

I noticed, learning about the subject, that every time you get a new tool you overuse it a bit. Kind of like a new toy. So a lot of people tend to use Hash when there is really no need for it. (I've been there. And I've seen a lot of people there).

For instance, if you need to map objects like { name: string, telephone: integer } ; with 0..25, you can easily solve that in... well, infinite ways. But lets check a few obvious ones just for the sake of benchmarking.

# Your code here!
require 'benchmark'

def doByHash
  hash_test = Hash.new()
  100001.times { |x|
      hash_test[x] = 1000000+x
  }
  hash_test
end

def doByPush
  array_test = []
    100001.times { |x|
      array_test.push(x+100000)
    }
    array_test
end

def pseudoAlloc
  array_test = Array.new(100001)
    100001.times { |x|
      array_test[x] = x+100000
    }
  array_test
end

Benchmark.bm do |x|
  x.report { doByHash }
  x.report { doByPush }
  x.report { pseudoAlloc }
end

You can run an implementation here. But I highly recommend to benchmark it on your local computer so server connection does not affect the result. Also run it several times to get an average and beware of the numbers that you try to test !!

The average total time results that I got for n = 100k was :

The amazing hash-man with : 0.026240 # hash
Your friendly neighbour the dynamic array that will eventually identify himself as a linked list : 0.014025 # array with push
The edgy teen who still is a dynamic array: 0.012692 # array with init(size)

RAW_SINGLE_OUTPUT

        user     system      total        real

   0.031000   0.000000   0.031000 (  0.025804)

   0.015000   0.000000   0.015000 (  0.015524)

   0.000000   0.000000   0.000000 (  0.012109)

Code is just stating the obvious in terms of speed in this particular situation. Hash are specially different in behaviour when it comes to this, because theoretically each node can be anywhere in memory. And, as we can see, given that Ruby, as far as my lectures goes, do not have static arrays, even if you define size by default the speed difference that you will get is not a lot that if you did not bother.

Keep in mind that this will grow over data size. Sometimes, you will be manipulating data in a time-sensitive code. Reducing complexity of your algorithm is out of the table since you are already doing the best it can. What is left? Hardcode optimizing. And this little trick is one of a few that I've been encountering in my journey with Ruby learning.

Closing remarks on Arrays.

A bit on Statics:

To sum up this bit of data. When we are thinking static arrays, we have that cute advantage of "no wasted space". And, that is great ! Only if you know you will be using that. Because once you defined that array, that's it. No deleting, no inserting. Will not get bigger, and will not get smaller. Unless you delete it, and do it all over again.

A byte on dynamics:

When we are using dynamic arrays, we will be potentially wasting space and a bit of time here an there. But also you will gain so much flexibility, that I am sure you will be jealous of people who wrote that source code for having so much fun. Inserts & deletes, are now way easier for you to use. Same index complexity than his static array brother and O(n) of wasted space here and there is your only price for those extra functions.

Since you are using Ruby, I am assuming, you will not think about the edgy static array too much and all that nonsense about memory. But now you know that from time to time, that last push(number), might have caused quite a bit of work on that harmless array of yours.

Conclusion

In this article we talked a little about what dynamic & static arrays are. And a quick pick on what are they doing. Now, please keep in mind that I've been bashing on hashes, on THIS particular situation. And if you paid attention, I said that three times. (this particular situation. 4)

There is a reason why hashes are EXTREMELY useful. That of course is out of the scope of this article, but the point remain. Make sure you always have more than hammers to use, or everything will look like a nail !

For more information about Ruby there is no better source than themselves. Don't be afraid to glance at the source code of every function to know what is going on just a glance thought... people who go in there, usually don't come back.

Thank you for stopping by!