After my introduction to the world of open source, I had to work with various programming languages, depending on the requirements of the project I was working on. I have worked with Python, Ruby, Java and C++. Most Python programmers would agree that learning how to code in Python (without going into the internal details) is pretty simple. With a little practice you can write powerful code in just a few lines, and carry out certain large computations for which C does not even offer a suitable data type. It is much the same with Ruby, except that Ruby does not complain about inconsistent indentation. But when it comes to performance, C stands way ahead of its interpreter-based counterparts.
Recently, I came across a new programming language: Julia. It is still in its infancy, but what interested me most was that it promised an easy syntax (like Python's) along with performance close to that of C, while remaining as interactive and dynamic as an interpreted language. I thought, "This is exactly what we are looking for!" So I decided to test it out on my own machine.
After glancing through the syntax of the commonly used constructs in Julia, I coded out merge sort. I chose it because its time complexity is O(n log n) in every case, so it gives consistent results on randomly generated input. I quickly programmed merge sort in Python and C too, and wrote another small piece of code in Julia (since I had started loving it!) to generate random integers. Using a shell script, I ran each of the programs (C, Python and Julia) and measured the time taken for input sizes from 10 to 10,000,000 (10⁷) integers. The programs performed no I/O. The outcome was interesting; the pie charts below illustrate the results.
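For reference, here is a minimal sketch of the kind of merge sort I used, in Julia. The exact Julia, Python and C programs, along with the random-integer generator and the timing script, live in the repository linked at the end of this post; helper names such as merge_sorted below are mine and may differ from the repository code.

```julia
# Top-down merge sort on a vector.
function mergesort(a)
    length(a) <= 1 && return a
    mid = div(length(a), 2)
    return merge_sorted(mergesort(a[1:mid]), mergesort(a[mid+1:end]))
end

# Merge two already-sorted vectors into a single sorted vector.
function merge_sorted(left, right)
    result = eltype(left)[]
    i, j = 1, 1
    while i <= length(left) && j <= length(right)
        if left[i] <= right[j]
            push!(result, left[i]); i += 1
        else
            push!(result, right[j]); j += 1
        end
    end
    append!(result, left[i:end])    # at most one of these tails is non-empty
    append!(result, right[j:end])
    return result
end

# Sort randomly generated integers and time it, mirroring the benchmark setup.
data = rand(1:10^6, 10^5)
@time sorted = mergesort(data)
@assert issorted(sorted)
```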
The programs were run on a system with a 2.30 GHz 64-bit processor, 1.8 GiB of memory, Linux 4.4.0, GCC 4.0 (20160609), Python 2.7.12 (2016-06-25) and Julia 0.4.5 (2016-03-18 00:58 UTC).
[Pie charts comparing the running times of the C, Python and Julia programs for input sizes of 10¹, 10², 10³, 10⁴, 10⁵, 10⁶ and 10⁷ integers]
From 10⁵ onwards, Julia proves to be faster than Python, steadily closing in on the speed of C. From 10⁶ onwards, where the C program fails with a segmentation fault, Julia puts up a pretty impressive performance and maintains its lead over Python by a huge margin.
Digging under the hood, I noticed a difference in how much memory Python and Julia allocate when appending to a linear data structure. CPython over-allocates by a factor of about 1.125 plus a small constant, 6 (see the CPython reference below), so the allocated capacities follow the pattern 8, 16, 25, 35, 46, 58, 72, 88, … Julia, on the other hand, grows its buffer by a factor of 2 (see the Julia reference below). For large data this means far fewer reallocations and copies, which is one reason Julia outperforms Python by such a huge margin, as the sketch below illustrates.
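As a rough illustration, the snippet below simulates both growth rules as I read them from the linked sources: CPython's list_resize requests newsize plus newsize/8 plus a small constant (3 for small lists, 6 otherwise), while Julia's array growth roughly doubles the buffer. The starting capacity on the Julia side is an assumption chosen just to make the sequence visible.

```julia
# CPython list_resize: capacity ≈ newsize + newsize >> 3 + (3 or 6).
cpython_capacity(newsize) = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6)

# Julia array growth: the buffer roughly doubles once it is exhausted.
julia_capacity(cap) = 2 * cap

# Append `n` elements one by one and record every capacity change.
function growth_sequences(n)
    cap_py, cap_jl = 0, 4       # Julia's starting capacity of 4 is an assumption
    py_caps, jl_caps = Int[], Int[]
    for size in 1:n
        if size > cap_py
            cap_py = cpython_capacity(size)
            push!(py_caps, cap_py)
        end
        if size > cap_jl
            cap_jl = julia_capacity(cap_jl)
            push!(jl_caps, cap_jl)
        end
    end
    return py_caps, jl_caps
end

py_caps, jl_caps = growth_sequences(100)
println("CPython capacities: ", py_caps)   # 4, 8, 16, 25, 35, 46, 58, 72, 88, 106
println("Julia capacities:   ", jl_caps)   # 8, 16, 32, 64, 128
```

Doubling means the number of reallocations grows only logarithmically with the final size, and since each reallocation copies the existing elements, fewer of them translates directly into less copying for large arrays.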
In this article we looked at only one aspect, memory reallocation, while there are several other factors that make Julia more awesome and faster when compared to Python. I will discuss more of them in my following posts.
Repository of the code for merge sort: https://github.com/americast/Performance_analysis
Reference to memory reallocation in Julia: https://github.com/JuliaLang/julia/blob/f74cb71f5aa14835069d495db4380eefad5662a2/src/array.c#L737
Reference to memory reallocation in Python: https://github.com/python/cpython/blob/202fda55c2dffe27125703225e5af92254602dc6/Objects/listobject.c#L32
Pie chart credits: www.meta-chart.com