Oleksandr Kaleniuk

@okaleniuk

Reasoning about performance Linux style. Part 1: preparing the experiment

This is not at all trivial. We all love it when our software runs fast, we love arguing about ways to make it run fast, and we love to write highly performant code and brag about it. Sadly, much of this is a fun social activity rather than engineering.

The thing is, performance is not an inherent property of code. It’s all much more complex than that. Consider radix trees. The larger the radix, the shorter the key, hence fewer operations per read, right? Right. But enlarging the radix also makes the whole structure grow immensely. At some point, it can no longer exploit the cache efficiently, and those fewer operations each take more time because of that.

It gets worse when you consider that different hardware has different cache sizes and even different cache levels. All your justification, backed by elaborate measurements, for using “the best” radix for a particular algorithm falls apart in a couple of years simply because your customers upgrade their hardware regularly.

But it doesn’t mean you shouldn’t measure. You should. In fact, you just don’t have any better alternatives.

You may of course just assume that some piece of code should run faster than the other, but most of the time this would be no more than self-deception. Yes, it is the easiest way, but that doesn’t mean it’s the cheapest. Bad decisions, made solely on assumptions, cost the industry about 1.5 billion dollars per year* in maintenance costs alone.

What are we measuring then?

I get questions deserving a trial every now and then. The last one was about C++ standard function objects. Let’s say we decided to use one as a callback that may or may not be defined at runtime. Every time we want to call this function, we have to check whether it’s there. But wouldn’t it be more efficient to predefine some empty lambda as a default and run the callback every time, regardless of whether it was deliberately redefined or not?

Well, let’s find out.

The default C++ compiler for my Linux distribution is g++ 5.4.0. It is a rather sophisticated thing, made of millions of lines of source code itself. It is designed to make my code as efficient as it can, and we want to take that into account.

The flip side of this optimization capability is that we have to design our experiment very carefully. If we allow the input to be static, the compiler might just precompute pieces of our algorithm at compile time, and that part will not show up in the measurement. And if we make our output irrelevant, it might not do the computation at all. If we don’t care about the result, then not doing anything is the best way to get everything done.

I came up with the following pieces of code. One for runtime checks:

#include <cstdlib>
#include <functional>
#include <iostream>
#include <limits>

int main(int argc, char** argv) {
    std::function<void(int)> check;
    if (argc == 2) {
        int check_against = atoi(argv[1]);
        check = [=](int number) -> void {
            if (check_against == number)
                std::cout << number;
        };
    }
    for (int i = 0; i < std::numeric_limits<int>::max(); ++i)
        if (check)   // test on every iteration whether the callback is set
            check(i);
}

And one for the predefined function object:

#include <cstdlib>
#include <functional>
#include <iostream>
#include <limits>

int main(int argc, char** argv) {
    std::function<void(int)> check = [](int){};   // empty lambda as a default
    if (argc == 2) {
        int check_against = atoi(argv[1]);
        check = [=](int number) -> void {
            if (check_against == number)
                std::cout << number;
        };
    }
    for (int i = 0; i < std::numeric_limits<int>::max(); ++i)
        check(i);   // call unconditionally, no check needed
}

We get to measure each of them against two scenarios: one where the check is set and the test program is supplied with an argument, and the other where it’s not.
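
The article doesn’t show the build commands. A plausible way to produce the two binaries, assuming the source file names (which are my own) and default compiler settings, would be:

```shell
# Build both variants; file names and the absence of optimization
# flags are assumptions, not taken from the article.
g++ -o fun_check fun_check.cpp
g++ -o fun_default fun_default.cpp
```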

C++ has plenty of options for benchmarking our code. But since we want to explore Linux, we shall start with the time command.

It simply runs the program and then reports how long the run took. It accounts for system time as well as user time. In Linux, system time is the time our program spends in the kernel, and user time is the time it spends in user space, which is where all the task-specific computation happens. Since we don’t really call anything special from the kernel, the system time for our program is close to 0.
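
To see the split in isolation, consider a program that does no computation at all. sleep spends its second waiting on a kernel timer rather than computing, so wall-clock (real) time is about one second while user and sys both stay near zero:

```shell
# 'sleep' waits rather than computes: real is ~1s, user and sys are ~0.
time sleep 1
```

Our benchmark programs are the opposite case: a pure user-space loop, so user time nearly equals real time.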

Now, of course, to evaluate the results properly we would have to do some statistical analysis to account for random error. But this particular measurement is very crude anyway. It doesn’t give us a reliable benchmark, but it can still surface some interesting information.

So that’s what I got on my PC.

okaleniuk@bbn:~/performance_experiments$ time ./fun_check
real 0m2.092s
user 0m2.088s
sys 0m0.000s
okaleniuk@bbn:~/performance_experiments$ time ./fun_default
real 0m7.034s
user 0m7.032s
sys 0m0.000s
okaleniuk@bbn:~/performance_experiments$ time ./fun_check -1
real 0m7.023s
user 0m7.020s
sys 0m0.004s
okaleniuk@bbn:~/performance_experiments$ time ./fun_default -1
real 0m6.992s
user 0m6.988s
sys 0m0.004s

In conclusion

We may conclude that when the function is not set, the check version wins hands down. And when it is set, the default version seems marginally better. We may be tempted to draw that second conclusion, but we shouldn’t! We only measured it on my machine, with my compiler, and with a crude tool. The best conclusion we can pull out of this experiment is that we need more data.

And we’ll get some in part 2.

* — I totally made this number up. You see how easy, and how unhelpful, that is.
