Alex Landau


Why do we cache?

Caching is a fundamental and omnipresent concept in computer science. From memory caches used by your computer’s CPU to the browser cache which seems to be responsible for every mysterious “issue” with a random website (“have you tried clearing your cache?”), caching is everywhere.

Most developers understand the basic gist: caching makes things faster. When you want to grab some data that is expensive to look up (in terms of time or other resources), you cache it so that next time you want to look up that same data, it’s much less expensive. But what does this look like in the real world?

Big wins from low-hanging fruit

Caching is extremely important because, in the right situations, you can get dramatic performance improvements without much effort. Imagine you’re IMDB and you have a lot of data that rarely changes, like movie metadata (think summaries, actor information, location details). That data is probably stored somewhere — perhaps a SQL database. When a user visits the web page for a particular movie, they are presented with all of this essentially static information.

Because you don’t expect the basic metadata to change, it’s wasteful to look it up in the database on every single page visit. It would be far more performant to fetch it from a cache (local to the web server) than from the SQL database, which is remote from the web server — at a minimum, another network hop away. Hitting the database on every visit also puts a lot of unnecessary load on it.

That much is easy enough to understand. In what situations does it really matter though? After all, modern SQL databases are pretty powerful.

With great scale comes great responsibility

Think of a really popular movie. Maybe Netflix just released Goodfellas worldwide (we can only hope!). Lots of people from around the world are visiting the Goodfellas page on IMDB. Whoa, it’s getting 10,000 page views every second. Your SQL database is getting hammered. In addition to the extensive regular traffic the database receives for various other IMDB functionality (comments, image uploads, reviews…), it’s now having to handle a huge amount of extra read traffic.


In a desperate attempt to keep the entire website from keeling over, you add a cache to the web server (my personal favorite is Redis). Whenever a page needs the Goodfellas movie metadata, it gets the data directly from the cache, a process running locally on the web server that returns the data in a few milliseconds at most. Not only have you dramatically cut down the response time, saving a network hop and an expensive query on your database, you’ve also significantly cut the load on that poor, overworked database, saving other features in your application from failing.
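As a sketch, this lookup path is the classic cache-aside pattern: check the cache, fall back to the database on a miss, and store the result for next time. Everything here is illustrative (the function names and the movie data are made up, not IMDB’s actual code); the `cache` argument can be any client exposing `get`/`setex`, such as a redis-py `redis.Redis()` instance.

```python
import json

def fetch_from_database(movie_id):
    # Stand-in for the real (slow, remote) SQL lookup -- hypothetical data.
    return {"id": movie_id, "title": "Goodfellas", "year": 1990}

def get_movie_metadata(cache, movie_id, ttl_seconds=3600):
    """Cache-aside lookup: try the cache first, fall back to the database.

    `cache` is any client exposing get/setex (e.g. a redis-py Redis instance).
    """
    key = f"movie:{movie_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no database trip
    data = fetch_from_database(movie_id)   # cache miss: hit the database
    cache.setex(key, ttl_seconds, json.dumps(data))  # store with an expiry
    return data
```

The TTL on `setex` matters even for “static” data: it bounds how stale the cache can get if the metadata ever does change.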

It isn’t as easy as it looks

Obviously I’ve oversimplified a lot of the nitty-gritty details that go into building a robust caching system. Challenges include:

  1. Dealing with dynamic data. If you expect data to change often, you don’t want to cache it for too long, otherwise your clients will have an inaccurate/outdated view of the world. After all, cache invalidation is one of only two hard things in computer science.
  2. Failures. What happens when your cache goes down? Can your backend systems handle the increased load? One solution is to build multiple levels of caching. For example, a local cache for each application server as well as a remote cache fleet shared among all application servers.
  3. Deployment. Local caches will likely be cleared when the web servers are re-deployed, or at least will be empty when new application servers are spun up. How do you prime the cache to avoid a spike in backend traffic each deployment? How do you deploy updates to your shared cache fleet without knocking over your backend systems when the caches get cleared?
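To make the multi-level idea from point 2 concrete, here is a minimal sketch of a two-level cache: a per-process dict in front of a shared remote tier. The class and method names are hypothetical, and the remote tier stands in for a shared fleet like Redis.

```python
class TwoLevelCache:
    """Sketch of a two-level cache: a local in-process dict backed by a
    shared remote cache. `remote` is any client exposing get/set."""

    def __init__(self, remote):
        self.local = {}     # level 1: fast, per-application-server
        self.remote = remote  # level 2: shared among all application servers

    def get(self, key, loader):
        if key in self.local:
            return self.local[key]          # local hit
        value = self.remote.get(key)        # local miss: try the shared tier
        if value is None:
            value = loader(key)             # both missed: hit the backend
            self.remote.set(key, value)
        self.local[key] = value
        return value
```

If the local cache is wiped (say, by a deployment), the shared tier still absorbs most of the traffic that would otherwise hit the backend.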

But sometimes it is

Despite these complexities, I wanted to include a bit of data from a real-world example to show that you can, indeed, get big wins with little effort. In my spare time I created a little web app called Lineup, which lets you create Spotify playlists from a list of artists.

Lineup includes an autocomplete feature to search for artists from Spotify as you type. Each time it fires, a request is made to Spotify’s search API to get back the top 5 matching artists for the query.

That’s a lot of requests to Spotify’s API! Imagine a few thousand people using Lineup at once (only in my wildest dreams!). What if all of the users are trying to search for the same few popular artists at once? For example, say they’re trying to put together a playlist with all the artists playing at Coachella this year.


We don’t expect Spotify’s search results to change all that often. How frequently does a new artist that users are likely to search for get added? Probably less than once per hour. That means a lot of those search requests to Spotify’s API are redundant, and we should be able to safely cache the search results for a given query for at least a short amount of time.
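One way to sketch this is a small TTL cache wrapped around the search call. Everything below (the names, the one-hour TTL, the placeholder search function) is illustrative, not Lineup’s actual implementation:

```python
import time

def ttl_cached(ttl_seconds):
    """Decorator that caches a one-argument function's results for a while."""
    store = {}  # query -> (expires_at, results)
    def decorate(fn):
        def wrapper(query):
            key = query.lower()                    # normalize the query
            hit = store.get(key)
            if hit is not None and hit[0] > time.time():
                return hit[1]                      # still fresh: cache hit
            results = fn(query)                    # miss or expired: call through
            store[key] = (time.time() + ttl_seconds, results)
            return results
        return wrapper
    return decorate

@ttl_cached(ttl_seconds=3600)  # assume results stay valid for an hour
def search_artists(query):
    # Placeholder for the real call to Spotify's search API.
    return [f"{query} result {i}" for i in range(5)]
```

Normalizing the query before using it as a cache key means “Beyonce” and “beyonce” share one entry, which matters a lot for autocomplete traffic.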

To investigate the kind of savings an artist-search cache yields for Lineup, I wrote a script that queries Lineup for all the artists playing at Coachella this year. The script randomly searches all the prefixes of each artist’s name (e.g. “B”, “Be”, “Bey”, … for “Beyonce”), as well as lowercase variations, to simulate users typing the names into the Lineup web interface. The script ran for a total of 36,002 searches, at which point the cache was completely full (and all subsequent requests would be cache hits).
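The prefix generation the script uses can be sketched like this (my reconstruction for illustration, not the script’s exact code):

```python
def typing_prefixes(name):
    """All prefixes a user would type for `name`, plus lowercase variants."""
    prefixes = [name[:i] for i in range(1, len(name) + 1)]
    lowered = [p.lower() for p in prefixes if p.lower() != p]
    return prefixes + lowered
```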

Delicious, juicy low-hanging fruit

We’re interested in seeing how many cache hits occur, how much faster the cache hits are than misses (which result in slow remote calls to Spotify’s API), and what the impact of the cache is on our error rate. After all, Spotify’s API will rate limit us at some point, resulting in server errors for Lineup and a poor user experience. With five threads running the requests in parallel, here are the results:

  • 33,016 cache hits, 2,986 cache misses
  • The mean response time for cache hits was 3.22 ms; the 90th percentile (the value that 90% of requests were faster than) was 7.19 ms. By comparison, the mean response time for cache misses (served by Spotify’s API) was 146 ms, with a 90th percentile of 163 ms. Obviously cache hits will be faster, but the numbers show just how much time you can save when you recognize places in your application where a cache applies.
  • Starting with an empty cache, all requests went to Spotify’s API, which resulted in throttling. As you can see in the graph below, as the cache filled up, the server errors dropped to 0 (and the mean response time across all requests plummeted):

While Lineup might be a niche use case, the idea is that even a simple cache can profoundly improve the performance of your application, especially as it starts to scale.


I hope it’s clear why caching is important and what amazing wins you can achieve even with a simple caching scheme. I think caching is so crucial that every developer should be able to use it with ease, so I wrote a Python library, Cachual, that aims to do exactly that: it lets you cache a function’s return value with a very simple decorator. My next post will go into my motivation and the details of Cachual.
