This article covers one of the secrets of high scalability and performance. A blog post about the Flickr Architecture, which serves more than 5 billion photos, puts it this way: Caching and RAM are the answer to everything.
Different pages of a Website commonly share the same assets, and users should reuse those assets as they navigate. Images, scripts, and styles can be cached for months, and even the document page itself can be cached for minutes on the client Browser.
HTTP headers define whether a response can be cached and for how long. The following
Cache-Control header example indicates that the response may be cached for 7 days. The Browser will request it again only if the cache expires or if the user force-refreshes the page:
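Such a header might look like this (a minimal example; 604800 seconds is 7 days):

```http
Cache-Control: public, max-age=604800
```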
A response can also include a
Last-Modified header or an
ETag header. These headers are used to verify whether an expired response can be reused. The response status 304 indicates that the content didn't change and doesn't need to be downloaded again. Pay attention to the
Last-Modified and
If-Modified-Since header pair and the dates below:
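A hypothetical exchange (the path and dates are invented for illustration):

```http
HTTP/1.1 200 OK
Last-Modified: Mon, 01 Apr 2019 10:30:00 GMT
Cache-Control: public, max-age=604800

GET /images/logo.png HTTP/1.1
If-Modified-Since: Mon, 01 Apr 2019 10:30:00 GMT

HTTP/1.1 304 Not Modified
```

The Browser sends the date it got in Last-Modified back in If-Modified-Since; since nothing changed, the server answers 304 with an empty body.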
The ETag header is used with
If-None-Match in a similar way, exchanging validation tokens to determine whether the content changed.
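A similar hypothetical exchange with validation tokens (the tag value is invented):

```http
HTTP/1.1 200 OK
ETag: "33a64df551425fcc"

GET /images/logo.png HTTP/1.1
If-None-Match: "33a64df551425fcc"

HTTP/1.1 304 Not Modified
```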
A Website with wisely defined HTTP headers provides a better experience for its users: the Browser saves time and network bandwidth based on them.
Wikipedia defines a Content Delivery Network (CDN) as a globally distributed network of proxy servers. CDNs are about caching — shared caching.
The Cache-Control: public HTTP header directive allows different parts of the network to cache a response. It is common to find assets with
Cache-Control: public, max-age=31536000, meaning that they can be cached anywhere for a year.
Finally, the control is all in your hands, developer! Aside from setting the right response headers and handling the request headers correctly, there are many things you can improve on the server and application side.
The first approach to speed up responses and save resources is to set up a cache server between the application and its clients.
Tools like Varnish, Squid, and nginx can cache images, scripts, and other content shared by users. The following sets up an nginx proxy server that caches content relying only on the application's HTTP headers:
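A minimal sketch (the cache path, zone name, and upstream address are assumptions; by default nginx honors the Cache-Control headers sent by the upstream application):

```nginx
# Where cached responses live, and a shared-memory zone for cache keys.
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m max_size=1g;

server {
  listen 80;

  location / {
    # Cache responses according to the application's own HTTP headers.
    proxy_cache app_cache;
    proxy_pass http://localhost:3000;
  }
}
```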
There is also a directive called
proxy_cache_lock that makes the proxy forward only the first of several identical client requests to the application. When it is enabled, the remaining clients receive the response as soon as that first request returns.
It is a simple but powerful mechanism that avoids chaos on the application side when a piece of content expires and many clients request it at once.
The proxy server can also deliver expired content for subsequent identical requests using the directive
proxy_cache_use_stale updating;. This speeds up response time and reduces the number of clients waiting for a server response.
Last but not least, the proxy can improve the fault tolerance of the application. There are flags for the
proxy_cache_use_stale directive that deliver expired content when the application returns error statuses or when the communication between the proxy server and the application is not working as expected.
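Putting these directives together, a location block might look like this (zone name and upstream address are assumptions):

```nginx
location / {
  proxy_cache app_cache;

  # Forward only the first of identical concurrent requests upstream.
  proxy_cache_lock on;

  # Serve stale content while a fresh copy is being fetched (updating),
  # and also when the upstream errors out or times out.
  proxy_cache_use_stale updating error timeout http_500 http_502 http_503 http_504;

  proxy_pass http://localhost:3000;
}
```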
The article A Guide to Caching with NGINX and NGINX Plus has more details and configuration options.
Application caching reduces the time of specific operations. Complex computations, data requests to other services, or common data shared across requests are some examples.
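A minimal sketch of this technique in Ruby (the Product class and its price rule are invented for illustration):

```ruby
class Product
  def initialize(base_price)
    @base_price = base_price
  end

  # Memoization: compute the price once, then reuse the stored value
  # for the lifetime of this object instance.
  def price
    @price ||= expensive_price_calculation
  end

  private

  # Stand-in for a costly operation (taxes, discounts, ...).
  def expensive_price_calculation
    @base_price * 1.2
  end
end
```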
The Ruby code above uses the simple memoization caching technique. It stores the product price to avoid future calculations. In this case, the data lives on an object instance, so it only saves resources during a single request.
This technique can be applied anywhere in the code, but using it brings some concerns. Keep in mind that the data will never expire, for example: a value memoized in global scope stays in memory for the application's entire execution cycle.
4.2. Smart In-Memory Caching
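A sketch, assuming a Rails application (the service name and cache key format are invented for illustration):

```ruby
def category_tax(category_id)
  # Store the external service response for one minute, keyed by category,
  # so concurrent requests within that window reuse the cached value.
  Rails.cache.fetch("category_tax/#{category_id}", expires_in: 1.minute) do
    CategoryTaxService.fetch_tax(category_id) # hypothetical external call
  end
end
```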
The code above uses the Rails Caching API to store and reuse the category tax for one minute across requests. The cache key uses the
category_id to identify the data. This technique reduces the number of requests to the external Category Tax Service, saving resources and time.
There are many libraries that provide this pattern, but it is important to remember that the application memory is a finite resource. The node-cache module, for example, doesn't manage the amount of memory consumed. That can be a problem if your application caches data massively, consuming all the available memory.
The Rails Memory Caching wisely prunes the cached data when it exceeds the allotted memory size by removing the least recently used entries. This allows caching immutable data without defining an expiration.
4.3. Cache Storages — Shared Caching
Handling a growing number of users and requests is an important subject in Web development. One way to scale an application is to add more application instances (horizontal scaling). And as you might imagine, a simple in-memory cache can't be shared between instances.
The Twelve-factor App, a methodology for building Software as a Service (SaaS), points that an application should never assume that anything cached in memory or on disk will be available on a future request or job — with many processes of each type running, chances are high that a future request will be served by a different process.
A key-value storage like Memcached or Redis can be used to share cached data between application instances. These tools have different policies to prune the amount of cached data. Cache storages can also be made fault tolerant with replication and persistence. Requirements can differ so much that Netflix built its own tool.
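A sketch of the shared-cache idea, assuming the Ruby redis gem and a Redis server reachable at the given address (helper name and TTL handling are invented; this is not a production cache client):

```ruby
require "redis" # assumes the redis gem is installed

redis = Redis.new(host: "localhost", port: 6379)

# Fetch a cached value, or compute and store it with an expiration.
# Because the entry lives in Redis rather than process memory, every
# application instance pointed at this server shares the same cache.
def cached_fetch(redis, key, ttl_seconds)
  cached = redis.get(key)
  return cached if cached

  value = yield
  redis.setex(key, ttl_seconds, value) # SETEX stores value with a TTL
  value
end
```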
Another important aspect to consider when using cache storages is the race condition that happens when different application instances fetch not-yet-cached data at the same time. The Rails Cache fetch API has the
race_condition_ttl option to minimize this effect.
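A sketch, assuming a Rails application (service name and key format are invented): when an entry expires, the first process to miss extends the stale entry by race_condition_ttl and recomputes, while the other processes keep serving the slightly stale value instead of all hitting the service at once.

```ruby
Rails.cache.fetch("category_tax/#{category_id}",
                  expires_in: 1.minute,
                  race_condition_ttl: 10.seconds) do
  CategoryTaxService.fetch_tax(category_id) # hypothetical external call
end
```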
It is difficult to completely eliminate cache-update race conditions with multiple application instances. One solution is to update the cached data outside the application flow and only consume cached data in the application. In a microservices architecture, it is also possible to protect the communication between application and service with an nginx proxy server, as explained above.
I hope this article helps you understand and choose the best strategy for your application. HTTP caching headers are an easy win that you should set up as soon as possible. Adopt the other caching strategies when you feel some performance pain, and keep in mind that “premature optimization is the root of all evil.”