After 8 years of working on high-volume web sites, I’ve seen a lot of interesting scaling techniques. While it’s true that very specific strategies can eke out enormous efficiency gains, complication doesn’t come without a cost. In my experience, scaling problems that a moderately-sized site will encounter can be solved with:
The examples below are written in the context of Ruby on Rails, but should translate well to other languages and frameworks.
As a web site grows, controllers can easily become bloated and slow down with all the additional tasks they might have to perform, such as:
For example, imagine a user wants to delete his profile and thousands of records associated with it — this can take a while. And it really doesn’t have happen immediately; it could instead acknowledge that the request has received and they’ll be sent email when it’s finished. You could then perform the actual delete in an asynchronous work queue. An async work queue consists of:
Basically, anything that can be deferred from the controller should be:
The only reason not to defer is if the result is important to the response; e.g. a payment request might block so that the customer is immediately notified that payment has failed.
Outside of the web request cycle, async work queues are very useful to parallelize large work loads. For example, imagine a daily scheduled script that announces new products to a 10k email distribution list. If this script sends all the emails in sequence, this could take upwards of an hour. With a dozen async workers this might take closer to 5 minutes. It’s also simple to temporarily scale the worker count if you need to handle job/traffic spikes.
Asynchronous workers are tremendously effective for a variety of problems and have a low implementation cost, making them a pragmatic solution in a variety of situations.
The programming language your web site is written is probably very general purpose — it’s optimized for flexibility at the cost of efficiency. A database is a highly optimized computation engine for relational algebra. Literally hundreds (if not thousands) of books have been written about databases, so I’m only going to mention a few common scenarios I run across.
N+1 Query Problems
One of the biggest complaints I’ve heard of Object Relational Mappers (ORMs) — especially ActiveRecord in Rails — is that they make it really easy to write bad queries/sets of queries without knowing. Consider for example:
User.each { |u| puts u.address }
Seems pretty innocuous. But what’s not obvious is that Rails is doing one query to load the User
relation, and then in each iteration of the loop querying the Address
table:
SELECT * FROM users;
SELECT * FROM addresses WHERE user_id = 1;
SELECT * FROM addresses WHERE user_id = 2;
...
This class of performance problem is called an N+1 query problem and is simple to identify by a simple analysis of your SQL log. The fix is as straightforward too: instead of querying in a loop, collect the IDs of your users and query all of them in one statement:
SELECT * FROM addresses WHERE user_id in (SELECT id FROM users);
SELECT * FROM users INNER JOIN addresses ON (users.id = addresses.user_id);
Ruby on Rails provides a few “eager loading” mechanisms to make it easy to avoid N+1 query patterns:
User.includes(:address).each {|u| puts u.address}
Under the hood, this instructs Rails uses SELECT..FROM..WHERE id IN (?)
pattern to select against addresses with the user_ids from user.
Missing Indexes
When querying a table by any column the database engine has two options: look at each record in the table or use an index — a highly-optimized data structure that can be used to select or eliminate records before looking at the table.
On tables smaller than 10k records, it can be easy to not realize you’re missing an index because it’s fast enough. Beyond 10k though, queries will become noticeably slow. Foreign key relationships (other_table_id
) are an obvious place to put an index. Covering indexes, where all information you need to return is in the index, can speed up queries by obviating the need to access the table itself at all. If you’re not sure if you’re using an index, you can use your database engine’s EXPLAIN
command:
EXPLAIN SELECT * FROM foo;
QUERY PLAN
---------------------------------------------------------Seq Scan on foo (cost=0.00..155.00 rows=10000 width=4)(1 row)
EXPLAIN SELECT sum(i) FROM foo WHERE i < 10;
QUERY PLAN
--------------------------------------------------------------------Aggregate (cost=23.93..23.93 rows=1 width=4)-> Index Scan using fi on foo (cost=0.00..23.92 rows=6 width=4)Index Cond: (i < 10)(3 rows)
This tool will quickly show you the strategy your database engine will use. The documentation for Postgres’ EXPLAIN should be followable even if you’re using another database. The excellent database administrator tool PgHero will also automatically surface index add/drop recommendations based on usage statistics.
Serialized Data
Relational algebra works on structured data: pre-defined tables, columns, and relationships. Semi-structured data such as XML and JSON are not always predefined though. The workaround has traditionally been to serialize your XML or JSON and store them as a text record. Recent database releases have added native column support for JSON (original text stored) and JSONB (binary structure).
API responses are popular JSON structures to store in a database. Using a native JSON format, API response details can be easily be queried in the database instead of in your programming language after deserialization:
SELECT * FROM stripe_charges WHERE (transfer->'amount')::int > 1000;
Or you can pluck individual values from a large JSON object:
SELECT (transfer->'created'), (transfer->'amount')::intFROM stripe_charges;
Aggregate in the Database
Imagine you wanted to select all users who have purchased more than $100:
users = User.includes(:payments).select do |u|u.payments.map(&:amount).sum > 100end
This will instantiate an ActiveRecord object for every user in the table, every payment in the table, and then throw away records it doesn’t need. Instead, we can instruct the database to do the filtering and only return User records that meet the criteria:
User.joins(:payments).group('users.id').select("users.*, SUM(payments.amount) AS total_amount").having("SUM(payments.amount) > 100")
This while generate the appropriate JOIN
, HAVING
, and GROUP BY
:
SELECT users.id, users.name, SUM(payments.amount) AS total_amountFROM usersINNER JOIN payments ON payments.user_id = users.idGROUP BY users.idHAVING SUM(payments.amount) > 100
Database Wrap-up
Databases are mature, highly-optimized data stores. Whenever possible, let databases do what they’re good at instead of doing it more slowly in your web programming language.
A common saying in the development community is that there are only two hard things in computer science: naming, caching, and off-by-1 errors. Caching can be hard to get right, leading to hard-to-track-down stale value bugs. I’m definitely not claiming that caching is a silver bullet; but it is certainly a lead bullet that should be in your arsenal.
Often, entire responses can be cached, such as:
sitemap.xml
These cacheable responses are some of my favorite low-hanging fruit for performance improvements. In the case of a cache hit, no template engines needs to be invoked and the web server immediately begins serving content to the client. It’s hard to beat a .1ms response. If 50% of traffic enters via the logged out homepage, caching the entire page can be a huge win. Especially because Google considers page response time as a signal when they rank search results. Beyond user and ranking benefits, the cached responses will also lighten server load.
If at first it doesn’t seem like the entire response is cacheable, think again. A logged-in homepage might just be a logged out homepage with a logged-in menu bar; in which case you can quickly serve the logged-out homepage, and then use AJAX to replace the logged-out menu with the logged-in menu.
Over 50% of requests to www.cameralends.com were served in under 10ms
Even if an entire response isn’t cacheable, expensive bits can be:
SELECT COUNT(*)
, complicated joins)A common problem with caching is the cache stampede problem:
Cache stampede_A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching…_en.wikipedia.org
The problem deals with cache expiration; imagine a JSON endpoint that takes 10s to generate and is about to expire:
If requests come in every .1s, 100 expensive requests will have begun before the original 10s request finishes. If it ever finishes! If it’s an expensive database query, the load increases drastically. It could take 20s (or more!) for the first query to finish, and that could be enough to cause downtime. If there are only 100 web workers, then it will cause downtime.
A common solution is to use a scheduled job to cache the value before it expires (or maybe even never expire the value, so there’s always a fast response). This strategy can be a bit of boilerplate to add, which can discourage common use. That’s why I favor an asynchronous cache recompute strategy.
Something like a product catalog might only change a few times a day or week. And when it does, it’s acceptable for tens-hundreds of requests respond with a slightly-stale catalog. The asynchronous caching strategy I’ve used works like that:
This strategy will let you always return quickly unless it’s the initial access and the value needs to be computed. Although, you could also have it return nil
and degrade gracefully, and queue an async job to fill the cached value for future responses.
Rails provides a simple solution to cache stampedes viarace_condition_ttl
in [ActiveSupport::Cache::Store#fetch](http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch)
. When this parameter is set, the first process that accesses the stale value will:
race_condition_ttl
The second and further process that access that cache key will be returned the slightly stale value. If the recomputed value hasn’t been cached by race_condition_ttl
, the current process will bump the expiration and recompute. It’s worth noting that some unlucky requests will be stuck with the task of recomputing, so this strategy will have an effect on user experience.
Static caching
With heavy amounts of unchanging content, static caching can be a good option. Static site generators like Jekyll can be used to build the entire file structure to serve through Nginx or Apache. On Heroku, you can simply use Rack to serve a static directory.
This collection of easy wins is far from a comprehensive optimization guide, but can take the scalability of your web site surprisingly far. Before adding a complicated performance solution, it’s worth checking to see if dumber, more straightforward solution will work.