The Rules of Optimization: Why So Many Performance Efforts Fail

The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet. — Michael Jackson

In our era of modern, speedy machines with oodles of memory, performance is something that few coders ever need to think about; but we think about it anyway.

We think about performance even when we don’t need to, and it’s much to our detriment. It complicates our lives and the lives of other coders in our codebases. Thus the first rule of optimization: Don’t do it.

In this post, I’ll illustrate the reasoning behind the first and second rules of optimization, and I’ll provide some tips for the times when you really truly do need to undertake performance improvements.

Note: When we talk about performance we’re often referring to the speed at which a program runs, but the same rules apply to optimizing memory usage, battery usage, or whatever other resource might be important to your program.

Part I: The Bottleneck

My first important real-world performance problem happened during a summer of research in college. A program for helping biologists identify genes was too slow. My advisor suggested that there was a part of the graph coloring algorithm that took a long time, and I should start with optimizing that.

So I did. I had a great time squeezing CPU cycles out of C++. It’s a fun form of puzzle solving. I improved the performance of the graph coloring algorithm by 70%.

After a week of intricate optimization, I ran the whole program on the full dataset. It was only 15% faster — hardly noticeable. What was going on?

I dug around a little and discovered that when a process finished, it wrote long results to a file. The program caused many processes to write at once. As the operating system’s scheduler switched among the processes, the disk had to do tons of seeks to write each file. Disk seeks are expensive.

I added a few lines of code that created a file system lock so that only one process could write to disk at once. Sequential writes are much faster than seeks, so with that change the program was about 60% faster.
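For illustration, here is roughly what that kind of fix can look like in Python. This is only a sketch, not the original code from that project: the function name, the lock-file and output paths, and the choice of fcntl.flock (which is Unix-only) are all assumptions made for the example.

```python
import fcntl
import os
import tempfile


def write_results_serialized(lock_path, out_path, data):
    """Write data to out_path while holding an exclusive lock on lock_path.

    Every process that calls this with the same lock_path takes its turn,
    so only one of them is writing to disk at any moment.
    """
    # "a" creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            with open(out_path, "w") as out:
                out.write(data)
                out.flush()
                os.fsync(out.fileno())  # push the data to disk before unlocking
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Hypothetical usage: paths and contents are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "results.lock")
    out_path = os.path.join(tmp, "results.txt")
    write_results_serialized(lock_path, out_path, "gene_1\t0.93\n")
    with open(out_path) as f:
        print(f.read())
```

The lock does nothing to speed up any single write. It only stops the writes from interleaving, which is the part that was forcing the disk to seek.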

With a few lines of code, I’d more than cut the runtime in half. My week of C++ optimization had not. My advisor had pointed me in the C++ direction because he’d spent the most time on it. He’d probably only spent a couple minutes writing the code that writes to files.

I can’t tell you how many times in my years of performance work across many platforms I’ve seen people waste time on optimizations that do almost nothing. The most common failed performance effort is optimizing something that’s not the bottleneck. You often think you’re optimizing the bottleneck but you’re not looking at the whole picture.

Ironically, it’s often the most experienced engineers who make this mistake first. It’s easy to get caught up in our own systems and forget there’s a whole other world out there. I once saw a group of senior backend engineers spend an entire hack week rewriting a complex web endpoint to use Go instead of Python. At the end, they found out that the bottleneck for the page was on the browser side of the app (and not the server side), making their performance improvement completely irrelevant.

These were experienced engineers who knew extremely well what made the server fast and slow, but their detailed knowledge of one part of the system prevented them from looking at the bigger picture.

Always ask yourself, what performance does the end-user care about? How does your code impact that?

Part II: Complexity

Performance improvements that don’t get at the bottleneck can actively harm your codebase. Almost all performance improvements increase the complexity of the code. Unneeded complexity creates a lot of problems that are not performance-related by making code harder to understand.

Let’s take a look at another example (slightly simplified) from a friend who used to use Python for graphics:

for i in xrange(1000):
    my_mesh.draw()

Imagine that before pushing this, the author noticed that draw() is being called 1000 times. Since the Python interpreter has to look up methods on each invocation, time could be saved by caching the reference to the draw method in a local variable:

draw_method = my_mesh.draw
for i in xrange(1000):
    draw_method()

This seems simple enough but it’s liable to cause a lot of problems down the line. While right now it might not seem like it will cause bugs, imagine that it’s in a large code base. Over time, many new lines of code might be added to the loop by many different engineers. Eventually draw_method might be far away from draw_method = my_mesh.draw, and the engineer will have to skip around and lose context to figure out what that means as they read through the code.

It will definitely slow people down and it could cause bugs. For instance, somebody might add live updating:

draw_method = my_mesh.draw

for i in xrange(1000):
    …
    if my_mesh.outdated():
        my_mesh = updated_mesh
    …
    draw_method()

Now there is a bug. draw_method still calls draw on the first mesh, not the updated one, because the cached reference to my_mesh.draw is bound to the original object and was never refreshed when my_mesh was reassigned.
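The stale reference is easy to reproduce in isolation. In this minimal sketch (Python 3), the Mesh class and its name attribute are hypothetical stand-ins for the real mesh objects. The only point is that a cached bound method keeps pointing at the object it was taken from.

```python
class Mesh:
    """Hypothetical stand-in for a drawable mesh."""

    def __init__(self, name):
        self.name = name

    def draw(self):
        # Return the name instead of drawing, so the result is easy to inspect.
        return self.name


my_mesh = Mesh("original")
draw_method = my_mesh.draw      # bound method: remembers the "original" object

my_mesh = Mesh("updated")       # rebinding the name does not touch draw_method

print(draw_method())            # still draws the original mesh
print(my_mesh.draw())           # an uncached call draws the updated mesh
```

An uncached call looks the method up on whatever my_mesh currently refers to, which is why the unoptimized loop never had this problem.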

It’s quite common to see small performance “improvements” thrown in as part of larger changes, when those improvements haven’t been evaluated for how much they help. Even what seem like simple performance improvements add a cost. The cost is complexity. Complexity means bugs and more time reading, testing, and maintaining code. Particularly in large or long-lived code bases, complexity is the biggest impediment to progress.

The change above is consistently faster on my machine by about 0.05 ms on average. Is 0.05 ms enough of an improvement to warrant the complexity? Most of the time, no.

You are simply not in a position to evaluate whether the complexity cost is worth the improvement in performance unless you have a big picture view of the performance of the system. Maybe you can optimize, but you certainly shouldn’t do it yet.

Part III: How to approach performance problems

But let’s say you think your system is really too slow, and you really do need to improve its performance for empirical reasons. Here are a few rules to get you started in a disciplined way.

Get the right metric

A lot of the problems I covered above come from measuring the wrong things. In my project in college, I measured the time for the graph algorithm to finish and not the time for the results to be written out. The hack week team was measuring server response time and not the time for the web page to appear to a user. Measuring the wrong thing leads to solving the wrong problem.

Making sure you’re measuring the right number is the single most important thing to do when you’re tackling a performance problem. Without that, you have no idea how much you’re helping. Similarly, don’t necessarily trust those more experienced than you to tell you the bottleneck. Make sure you evaluate it yourself.
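One way to avoid measuring only the part you already suspect is to time the whole run and each stage inside it, then compare. The sketch below assumes a program with two stages. The function names, the workload, and the sleep that stands in for slow output are all made up for illustration.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


def compute():
    # Hypothetical "interesting" algorithm that everyone wants to optimize.
    return sum(i * i for i in range(100_000))


def write_results(result):
    # Hypothetical stand-in for slow output (disk, network, rendering...).
    time.sleep(0.05)


with timed("total"):
    with timed("compute"):
        result = compute()
    with timed("write"):
        write_results(result)

for label, seconds in timings.items():
    share = seconds / timings["total"]
    print(f"{label:8s} {seconds * 1000:8.2f} ms  {share:6.1%} of total")
```

If the stage you planned to optimize turns out to be a small share of the total, that is the signal to look elsewhere before writing any clever code.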

How important are marginal gains in this system?

Once you’re measuring the right thing, it will be a lot easier to tell which improvements are worthwhile. Nonetheless, it’s worth it to spend some time thinking about what your goals are.

You need to know not only how fast (or memory-intensive, etc) the system is, but also how much marginal gain you’ll get from improvements. Do you save your company money? Do you save your users time? If it’s a script that runs once a week that nobody is dependent on, even savings of an entire minute (basically forever in computer time) might not be worth adding complexity. But if it’s a function run a million times per second across a fleet of thousands of servers, savings of microseconds could save a lot of money.
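A quick back-of-the-envelope calculation makes the contrast concrete. All the numbers below are assumptions chosen to match the two scenarios above (a weekly script that gets 60 seconds faster, and a function called a million times per second on each of 1,000 servers that gets 1 microsecond faster). Plug in your own.

```python
def seconds_saved_per_year(runs_per_year, seconds_saved_per_run):
    """Total time saved in a year by a script that runs a fixed number of times."""
    return runs_per_year * seconds_saved_per_run


def cores_freed(calls_per_second_per_server, microseconds_saved_per_call, servers):
    """CPU cores' worth of work removed from a fleet by shaving a hot function.

    One core delivers one CPU-second per second, so CPU-seconds saved per
    second is the same number as cores freed.
    """
    saved_us_per_second = (
        calls_per_second_per_server * microseconds_saved_per_call * servers
    )
    return saved_us_per_second / 1_000_000


weekly_script = seconds_saved_per_year(runs_per_year=52, seconds_saved_per_run=60)
hot_function = cores_freed(
    calls_per_second_per_server=1_000_000,
    microseconds_saved_per_call=1,
    servers=1_000,
)

print(f"Weekly script: {weekly_script / 60:.0f} minutes saved per year")
print(f"Hot function:  {hot_function:.0f} cores freed, continuously")
```

Under these assumptions the minute-long saving adds up to under an hour a year, while the microsecond saving frees roughly a thousand cores around the clock.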

If you understand what your performance goals are before beginning your work, you can make the right call on performance/complexity tradeoffs later on. If you’re being honest with yourself, you’ll often see that you should scrap marginal gains and focus on major wins.

Measure right

Lots of things can affect your performance measurement. You might be a mobile developer with a slick test device that has only one program running on it. If your goal is to produce a response in < 10ms on average, that’ll be a lot easier on your test device than on your user’s four-year-old phone with low battery and a hundred apps running in the background.

It could also cause you to misunderstand your program’s characteristics. You might be I/O bound on a test device, when on most devices you’re actually CPU bound.
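One rough way to tell the two apart on a given machine is to compare CPU time with wall-clock time for the same piece of work. The sketch below uses time.process_time and time.perf_counter from the standard library. The two workloads are hypothetical, with a sleep standing in for waiting on I/O.

```python
import time


def cpu_fraction(func):
    """Return CPU time divided by wall-clock time for one call of func.

    Near 1.0 suggests the work is CPU bound; near 0.0 suggests it mostly
    waits (on disk, network, locks...). Multi-threaded code can exceed 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def waits_on_io():
    time.sleep(0.1)  # stand-in for a disk or network wait


def burns_cpu():
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


io_ratio = cpu_fraction(waits_on_io)
cpu_ratio = cpu_fraction(burns_cpu)
print(f"waits_on_io: {io_ratio:.2f} of wall time spent on CPU")
print(f"burns_cpu:   {cpu_ratio:.2f} of wall time spent on CPU")
```

Running the same check on a test device and on a more typical device is a cheap way to find out whether they share a bottleneck.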

All sorts of external factors can affect performance including:

Memory usage
Server load
Network bandwidth and latency
Battery level
CPU usage
Disk usage
Various caches at all layers

When you’re measuring, try to make sure as many factors as possible are held constant so you can accurately compare different approaches. But also make an effort to understand what these factors usually look like, so you can make sure your appraisal of bottlenecks is realistic.

One simple strategy for cutting out the natural ebbs and flows of resources on a machine is to run your code many times. Python’s easy-to-use timeit library defaults to running your sample a million times. This can help average out some fluctuations in system resource availability.
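As a sketch of what that looks like in practice, here is the mesh example from Part II measured with timeit.repeat. The Mesh class is a hypothetical stand-in with an empty draw method, the code is Python 3 (so range rather than xrange), and number is passed explicitly because each call already loops 1,000 times.

```python
import timeit


class Mesh:
    """Hypothetical stand-in whose draw() does nothing."""

    def draw(self):
        pass


def direct(mesh, n=1000):
    for _ in range(n):
        mesh.draw()


def cached(mesh, n=1000):
    draw_method = mesh.draw
    for _ in range(n):
        draw_method()


mesh = Mesh()
RUNS = 200  # calls per measurement

# Five measurements of RUNS calls each. The minimum is the run that was
# least disturbed by whatever else the machine was doing.
direct_times = timeit.repeat(lambda: direct(mesh), repeat=5, number=RUNS)
cached_times = timeit.repeat(lambda: cached(mesh), repeat=5, number=RUNS)

direct_ms = min(direct_times) / RUNS * 1000
cached_ms = min(cached_times) / RUNS * 1000
print(f"direct: {direct_ms:.4f} ms per 1000 draws")
print(f"cached: {cached_ms:.4f} ms per 1000 draws")
```

The absolute numbers will differ from machine to machine and between Python versions, so treat them as a comparison on your own hardware rather than a general result.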

Try to integrate performance tests into your build

Some software projects fail their builds if a commit causes a performance regression. If performance is important for your project, consider adding performance as part of your continuous integration. It can let you prevent performance regressions before they get shipped. The sooner you are aware of regressions, the easier they are to fix. In large codebases, small regressions can build up over time unless they’re aggressively kept in check. Tests can keep them out of your system.
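A performance check in a build can be as plain as a test that fails when a measured time exceeds a budget. The sketch below is one possible shape, not a recommendation of specific numbers: build_report is a hypothetical function under test, and the 0.5 second budget is an assumption picked to leave plenty of headroom on a noisy shared build machine.

```python
import timeit

BUDGET_SECONDS = 0.5  # hypothetical budget; tune it to your own project


def build_report(n=10_000):
    # Hypothetical function whose speed matters to the project.
    return sorted(str(i) for i in range(n))


def test_build_report_stays_within_budget():
    # Take the best of several runs to reduce noise from other build jobs.
    best = min(timeit.repeat(build_report, repeat=5, number=1))
    assert best < BUDGET_SECONDS, (
        f"build_report took {best:.3f}s, budget is {BUDGET_SECONDS}s"
    )


test_build_report_stays_within_budget()
print("performance budget respected")
```

A test runner such as pytest would pick up the test function on its own. The direct call at the bottom is only there so the sketch runs as a script.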

Finally, the optimization

Armed with the right philosophy and information about your system, you’re ready to begin performance optimization. Don’t do it until you’ve profiled and analyzed and figured out a coherent strategy — then go forth and code!
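If you have never profiled before, Python’s standard library is enough to get started. The following is a minimal sketch using cProfile and pstats. The three functions are hypothetical placeholders for a real program, and a real profile would cover a realistic workload rather than a toy one.

```python
import cProfile
import io
import pstats


def load_data():
    # Hypothetical cheap stage.
    return [i % 7 for i in range(1_000)]


def crunch(values):
    # Hypothetical expensive stage.
    total = 0
    for _ in range(300):
        for v in values:
            total += v
    return total


def main():
    return crunch(load_data())


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time so the functions that dominate the run come first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The functions at the top of the cumulative listing are where optimization has a chance of mattering. Anything far down the list is probably not your bottleneck.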

Once you understand the basics of how to tackle general performance problems, you can start to delve deeper into your system. This is where things can get really fun and exciting. Small systems are elegant logic puzzles. Large systems are universes of slowness whose complex interlocking parts can take months to unravel. Improving performance in both environments can be beautiful.

It’s just not often the right thing to do.

For my guidance on web performance, see “So your website is slow? Let’s fix that.”
