Cache Stories

Written by terrycrowley | Published 2018/06/28

During the course of Windows 8 development, I was in a meeting with a number of Windows engineers talking about async design patterns and our Office experience. The particular discussion was about how to think about a resource exposed to higher application levels, where the application might make multiple async requests to the same resource. This led to a discussion of caches, and a certain high-level Windows engineer commented that the caching should be at the lowest possible level in the system, while I responded that it should be at the highest.

Rather than devolving into an argument, we both laughed. This was just a perfect example of why it’s so hard to get caching right in the first place. We were both right and both wrong.

Caching is the universal performance strategy. You have something expensive to compute or fetch. By holding on to the result for a while so you can cheaply return it to another request, you amortize the cost of producing it. Return it one extra time and the amortized cost is cut in half, and it keeps falling with every additional “cache hit.” In many cases, proper caching can be the key strategy for achieving good performance.
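
As a minimal sketch of that amortization (the memoize helper here is illustrative, not code from any system described in this post):

```typescript
// Minimal memoizing cache: the first call for a key pays the full
// cost of compute(); every later call is a cheap map lookup.
function memoize<K, V>(compute: (key: K) => V): (key: K) => V {
  const cache = new Map<K, V>();
  return (key: K): V => {
    if (cache.has(key)) return cache.get(key)!; // hit: cheap lookup
    const value = compute(key); // miss: pay the full cost once
    cache.set(key, value);
    return value;
  };
}

// With n hits after the initial miss, the amortized cost approaches
// (cost of compute) / (n + 1), plus the lookup overhead.
```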

Of course, there’s no such thing as a free lunch. If you’re caching something, you are allocating memory to hold the cache and spending computation to both fill and search it. You separately need strategies, and their attendant overhead, for invalidating or otherwise managing the cache. All of this adds complexity to the system. Since a cache’s behavior and effectiveness are inherently tied to the dynamic characteristics of the system, tuning the cache can be exceptionally hard.

A familiar dynamic is that an engineer gets assigned a performance bug for some specific scenario. They find that introducing a cache solves their local problem. But they don’t do the full analysis to really understand how the cache changes the overall characteristics of the system, including how it behaves across a wider range of scenarios and its general impact on the system. We sometimes talk about space/time as a trade-off, but there are many cases where significant overall performance wins come from just keeping memory requirements down and doing computation on the fly instead. A word processor is an example: we carefully discard information about line and word breaks and recreate it on demand rather than caching and holding on to it, in order to reduce overall memory use. In those cases, time improvement comes with space improvement, because touching memory with poor locality is an incredibly expensive operation on modern processors.
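
A rough sketch of that recompute-on-demand shape, using a deliberately simplified greedy line breaker that is in no way Word’s actual layout code:

```typescript
// Hypothetical sketch: line breaks are derived data, recomputed when
// needed rather than stored, keeping per-paragraph memory small.
class Paragraph {
  constructor(readonly text: string) {}

  // Recompute break positions on demand (e.g., when the paragraph
  // scrolls into view) instead of caching them per paragraph.
  lineBreaks(maxChars: number): number[] {
    const breaks: number[] = [];
    let lineStart = 0;
    let lastSpace = -1;
    for (let i = 0; i < this.text.length; i++) {
      if (this.text[i] === " ") lastSpace = i;
      if (i - lineStart >= maxChars && lastSpace > lineStart) {
        breaks.push(lastSpace + 1); // break after the last space that fits
        lineStart = lastSpace + 1;
      }
    }
    return breaks;
  }
}
```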

Getting back to the original discussion, the argument for placing the cache at the lowest level of a subsystem is that that level provides the most opportunity to share the benefits of the cache. It also sees the full dynamic behavior of all the clients of the subsystem, so it can integrate all their requirements into one caching strategy. This can also prevent the pernicious behavior where multiple clients or multiple levels in the overall system cache duplicate copies of the same data, possibly with different and poorly interacting invalidation strategies.

However, there are disadvantages to this approach. The most common is that higher levels of the system might have a much clearer model of the dynamic characteristics of their own usage and can therefore implement a caching strategy that is both more accurate and less expensive. For example, they might know they will request a series of resources in fully sorted order, so a simple one-element cache is both cheap and “perfect.” This same kind of dynamic often plays out when we achieve performance wins by using custom memory allocators in cases where the allocation/free pattern is well understood.
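
Here is a hypothetical sketch of such a one-element cache. For a strictly sorted request stream, repeated requests for the same key are always adjacent, so remembering just the previous result never misses a repeat:

```typescript
// One-element ("last value") cache: perfect and nearly free when the
// caller's access pattern guarantees repeats are adjacent.
class LastValueCache<K, V> {
  private lastKey: K | undefined;
  private lastValue: V | undefined;

  constructor(private fetch: (key: K) => V) {}

  get(key: K): V {
    if (this.lastKey === key) return this.lastValue!; // hit
    this.lastValue = this.fetch(key); // miss: fetch and remember
    this.lastKey = key;
    return this.lastValue;
  }
}
```

A general-purpose cache at the bottom of the stack would need far more memory and bookkeeping to achieve the same hit rate that this trivial cache gets from knowledge of the access pattern.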

A more subtle issue with caching at low levels is that you end up with an API whose performance characteristics vary widely. If you’re caching a resource obtained across the network, a single API can range from almost instantaneous to taking an arbitrary amount of time to complete. One thing I like about an explicitly async pattern is that it makes the potential latency explicit in the API design and forces the consumer of the API to acknowledge it. Mark Russinovich had an interesting post about a hang that arose from misunderstanding the performance characteristics of an API, principally due to its internal caching strategy. In that case, the problem was further complicated by a cache that invalidated after 30 minutes, making the behavior even less predictable and harder to diagnose.
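
A hedged sketch of what that explicitness buys you; loadResource and its cache are illustrative, not any particular API:

```typescript
// The Promise in the signature makes the potential latency explicit:
// the caller must await, so "sometimes instant, sometimes a network
// round trip" is visible in the contract instead of hidden behind a
// synchronous call that usually happens to be fast.
const resourceCache = new Map<string, Uint8Array>();

async function loadResource(url: string): Promise<Uint8Array> {
  const cached = resourceCache.get(url);
  if (cached !== undefined) return cached; // near-instant path
  const response = await fetch(url);       // arbitrarily slow path
  const bytes = new Uint8Array(await response.arrayBuffer());
  resourceCache.set(url, bytes);
  return bytes;
}
```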

Caches can also introduce end-to-end issues, since the cache inherently presents information that may be out of date. While this kind of staleness is always present in a distributed system, it is often an end-to-end issue, and adding caching “transparently” at a low level can make it impossible to address effectively.

Anyone who’s been programming for a while has a trove of war stories around caches. There was an interesting case in Word involving the cache used to optimize access to paragraph properties. Because many paragraphs have identical properties, Word implements a cache that allows the storage for identical property sets to be shared across paragraphs. When Word added independent resource IDs to each paragraph to support higher-fidelity merge and compare, the ID became an intentionally unique field in each property set and made the cache useless. It was a couple of releases before the problem was even identified and fixed.
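
The Word cache was essentially interning identical property sets. A hypothetical sketch (these types are illustrative, not Word’s internals) shows how a single per-paragraph-unique field defeats the sharing:

```typescript
// Sketch of property-set interning: paragraphs with identical
// properties share one stored instance, keyed by a content hash.
interface ParaProps {
  styleId: number;
  indent: number;
  resourceId?: number; // per-paragraph unique ID added later
}

const interned = new Map<string, ParaProps>();

function intern(props: ParaProps): ParaProps {
  const key = JSON.stringify(props); // stand-in for a real hash
  const existing = interned.get(key);
  if (existing !== undefined) return existing; // shared instance
  interned.set(key, props);
  return props;
}

// Once resourceId is unique per paragraph, no two property sets ever
// compare equal, every lookup misses, and the "cache" only adds cost.
```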

Patrick Dussud (private communication) related looking at Windows Presentation Foundation performance and identifying numerous internal caches that could be eliminated, including several that were carefully populated and searched but in fact never returned a single cache hit. Undoubtedly each cache was introduced at some point in development when it was having an impact, but other changes later made it ineffective.

When Outlook was originally written to run against MAPI, the synchronous design of the API led the team to assume that MAPI was internally caching data from the server in order to optimize client-server interactions. It was not. The team needed to go back and introduce an entirely new caching layer in order to get decent performance. When I joined Microsoft, my interview loop included a discussion with Jon DeVaan (head of Office at the time) about MAPI caching strategies — I had been dealing with these issues when adding MAPI support to BeyondMail, a PC email client that supported multiple types of backend servers.

These types of stories lead to the most obvious and uncontroversial guidance on caches: make sure they are fully instrumented so that their costs and benefits are well understood and tracked over time. This is important when a cache is initially designed, but it is absolutely critical when “programming in the large” with a big team over multiple releases, which of course is characteristic of most of the code I was working on and responsible for in Office.
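
A minimal sketch of what that instrumentation might look like, assuming a simple bounded, map-backed cache (the names and the eviction policy here are illustrative):

```typescript
// Instrumented cache: hits and misses are counted so the cache's
// actual benefit can be tracked across releases. A cache whose hit
// count stays at zero (as in the WPF case above) is pure overhead.
class InstrumentedCache<K, V> {
  private store = new Map<K, V>();
  hits = 0;
  misses = 0;

  constructor(private fetch: (key: K) => V, private maxEntries = 1024) {}

  get(key: K): V {
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key)!;
    }
    this.misses++;
    const value = this.fetch(key);
    if (this.store.size >= this.maxEntries) {
      // Evict the oldest entry (Map preserves insertion order).
      this.store.delete(this.store.keys().next().value!);
    }
    this.store.set(key, value);
    return value;
  }

  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}
```

Even a hit-rate counter this crude, reported over time, would have immediately flagged the never-hit caches in the WPF story.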

