Control online traffic and treat your visitors fairly with a virtual waiting room.
You've spent your budget on social media, ads, and preparations for what should be the year's biggest campaign... only to watch your website buckle under the stress of thousands of simultaneous users.
They try to access your site but are instead met with lengthy lag times and dreadful HTTP error codes.
Sound familiar? If it hasn’t happened to your own site or app, you’ve likely read about many examples in the media.
It’s a terrible feeling to be the victim of your own success, to have your infrastructure fail just when you are poised to reap the biggest rewards.
The more load you add to a system, the more likely it will fail. That’s just a fact of computing. At a certain point, you might have to add servers, refactor your application architecture, or move to a new application framework that is easier to scale.
That said, there are steps you can take right now to tweak more performance out of your existing setup. Some you might already have, but others might be overlooked.
At Queue-it, we’ve seen our fair share of websites that could not handle the pressure from thousands of concurrent visitors. Some of these websites were poorly designed and built. Others still cannot handle the demand despite spending millions on performance.
Here are 13 tips based on years of experience from Queue-it's Director of Product Martin Larsen, an expert in web performance.
In 2021, using a CDN should be a given. If you don’t have a Content Delivery Network (CDN) in front of your site, it’s a clear first step.
Moving all your static resources to a CDN is by far the easiest way to tweak a bit more performance out of your server.
A CDN improves performance by caching—or saving—static content like images at the CDN level, so these assets can be loaded just once and then served to hundreds or thousands of visitors. Serving static files from your web server takes clock cycles, bandwidth, and threads away from what your web server should be doing: serving dynamic content.
Caching is one of the most effective and least expensive ways to improve scalability and performance. It can be implemented on your infrastructure, in your application, or at the data level. Each has its advantages, but the infrastructure level is likely where you’ll see the greatest rewards for the least amount of effort.
Depending on your setup, you might load CDN content from another domain (e.g. assets.website.com), which requires configuration in your application. But these days, most CDNs can sit in front of your entire website and let you configure rules that determine what loads from the CDN and what loads from your web servers.
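To make the idea concrete, here is a minimal sketch of such a rule set. The paths and cache policies are hypothetical examples, not a specific CDN's configuration syntax; real CDNs express these rules in their own dashboards or config files, but the logic is the same.

```python
# Hypothetical rule table: which paths a CDN may cache (and for how long),
# versus which must always be fetched from the origin web servers.
CACHE_RULES = [
    ("/static/", "public, max-age=31536000, immutable"),  # fingerprinted assets
    ("/images/", "public, max-age=86400"),                # refreshed daily
    ("/api/",    "no-store"),                             # dynamic, origin only
]

def cache_policy(path: str) -> str:
    """Return the Cache-Control header a response for `path` should carry."""
    for prefix, policy in CACHE_RULES:
        if path.startswith(prefix):
            return policy
    return "no-store"  # default: don't let the CDN cache unknown paths
```

Long-lived caching for fingerprinted assets is safe because a content change produces a new filename, so stale copies are never served.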
Even if you already have a CDN in place, there’s a good chance you’re not getting the most out of it.
Originally, CDNs just served static content. But today they do so much more, from caching dynamic content to routing to DDoS protection. Some, like Akamai EdgeWorkers or Cloudflare Workers, even let you run your own code.
Taken together, this means that it’s possible to offload increasingly more of your website onto CDN providers’ infrastructure, resulting in lower strain on your servers.
Contrary to a belief popular in past decades, the relational database is not a Swiss Army knife.
Steep prices for data storage and software licenses have led developers and architects to misuse the relational database to save images or temporary data like session state.
Relational databases are the most common reason why we see websites fail to scale. Not many developers or architects seem to realize that there are other ways to persist your data – NoSQL databases, blob storage, message queues, push notifications, and browser storage, just to name a few.
A relational database is a powerful tool. But it’s only one of many.
To achieve a scalable system, you need to use many tools, not just one. You’ll end up with a set of tools and your data may be distributed and replicated between them.
We realize this sounds a bit scary.
You’ll need to have a broad knowledge of and experience with different tools and technologies to pick the right ones. Your application will need to support the ones you have picked. Finally, you need to run it in a production environment.
But it sounds scarier than it is. Scaling a relational database is harder.
Web performance is not just about hardware, tools, and algorithms. It is also about how you choose to design and build your application.
When it comes to elasticity in scaling your servers, horizontal scaling is generally preferred over vertical. That means you should favor adding more small servers when you scale, rather than replacing servers with larger ones.
This is vital if your application is running in the cloud. When you scale horizontally, your application may need to support things like bootstrapping newly added servers and serving the same user from multiple servers, just to name a few aspects. Even with auto scaling, server scaling is complex.
Load balancing helps maximize your server resources by distributing requests across multiple servers. A cloud-based load balancer makes the decision at the network edge, closer to the users — allowing you to boost response time and effectively optimize your infrastructure while minimizing the risk of server failure. Even if a single server fails, the load balancer can redirect and redistribute traffic among the remaining servers, ensuring that customers don’t experience significant latency or see a site outage.
With horizontal scaling and load balancing, you get an application that is easy and inexpensive to scale.
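The load-balancing idea can be sketched in a few lines. This is a toy round-robin balancer with illustrative backend addresses; a production load balancer would also health-check backends and drop failed ones from rotation, which is what lets it route around a dead server.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy sketch: hand out backends in rotation so load spreads evenly."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._pool = cycle(self.backends)  # endless round-robin iterator

    def next_backend(self) -> str:
        return next(self._pool)

# Hypothetical pool of three small, identical servers.
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

Each incoming request asks `lb.next_backend()` where to go, so no single server absorbs the whole traffic spike.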
To help underscore the savings, you can think of two cars, one fancy sports car and one modest sedan. Let’s say the Ferrari 812 Superfast and the Ford Focus.
It’s much easier to buy a Ford Focus than the Ferrari. The supply is high, delivery time is short, and if you’re based in the U.S. it doesn’t need to be imported from abroad.
What’s more, it’s easier to repair (almost any auto shop can fix it) and it can easily be replaced, even with another model or brand with similar specs. And it’s cheaper.
Similarly, scaling horizontally with many smaller servers means greater flexibility and lower costs, both in purchasing, repair, and replacement.
A lot of these tips involve ways to reduce the workload on your servers, and this one is no different. Serverless technology that lets you execute code on the network edge is an effective way of removing the task burden from your web servers.
Edge computing leverages the many edge servers of a CDN. These servers are distributed around the globe, originally with the purpose of reducing latency and bandwidth use.
The genius of edge computing lies in adapting this network of servers into another platform upon which you can run your code—without necessarily involving your origin web servers.
So what type of assets or code do you want to run on the edge? Ideally, you’ll want to move code that doesn’t require state. In other words, code that doesn’t need to check back in with your web server to execute correctly.
To take an example from ecommerce, let’s say that each time visitors log in to their accounts on your site, you show them a personalized list of items that might interest them. To build this list, all you need is their purchase history, perhaps refreshed once per day. That’s it. It doesn’t need to be updated in real time.
But creating the list still requires CPU power and an algorithm. This would be a great candidate for offloading to run on the edge.
The more you can minimize requests to your web servers, the more it frees up those resources to perform the critical tasks you really need them to—and the more performance you’re able to squeeze out of your web application.
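The recommendation example above can be sketched as a stateless, edge-friendly handler. Everything here is illustrative (the names, the data shape): the key property is that the function works purely from a daily snapshot and never calls back to the origin server.

```python
# Hypothetical daily snapshot of pre-computed purchase counts, pushed to
# the edge once per day. No live call to the origin is needed.
DAILY_SNAPSHOT = {
    "alice": {"shoes": 3, "socks": 1},
}

def recommend(user: str, top_n: int = 2) -> list[str]:
    """Stateless: output depends only on the snapshot, so it can run on any edge node."""
    history = DAILY_SNAPSHOT.get(user, {})
    return sorted(history, key=history.get, reverse=True)[:top_n]
```

Because the function carries no server-side state, any edge node can serve it, and the origin servers never see the request.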
Despite your efforts to maximize performance, you are bound to end up in a situation where there is a shortage of resources. In this case, it is better to have a responsive, basic website than an unavailable, feature-rich one.
You’ve likely spent years tinkering with and tailoring your site to make it just how you like. There’s always something to optimize, and you’ve added some nice bells and whistles along the way. These can be performance intensive, and they can add up.
Think about your search feature. When someone searches for a product, an advanced search function must find all items matching the search terms, accounting for misspellings, product categories and features, and so on. This is very CPU-intensive.
Build your application with ops toggles so that performance-intensive features are disabled when resources are low. Prime targets are fancy “nice-to-have” features like advanced search or “Recommended for you” panels.
Yes, the user experience will be degraded. But even giants like Amazon do this, as when the site was overwhelmed by demand on Prime Day in 2018 and the company implemented a scaled-down “fallback” home page. If Amazon uses feature toggling, you can too.
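An ops toggle can be as simple as a flag that expensive code paths check before running. This is a minimal sketch with made-up names, not a specific feature-flag library; in practice the flag would live in shared config so operators (or an automated resource monitor) can flip it at runtime.

```python
# Hypothetical toggle store; real systems keep this in shared, hot-reloadable config.
TOGGLES = {"advanced_search": True, "recommendations": True}

def search(query: str) -> str:
    if not TOGGLES["advanced_search"]:
        # Cheap fallback path: exact matching only, no fuzzy/faceted work.
        return f"basic results for {query!r}"
    # CPU-intensive path: misspellings, facets, ranking, etc.
    return f"fuzzy, faceted results for {query!r}"

# e.g. flipped by an operator when CPU load crosses a threshold:
TOGGLES["advanced_search"] = False
```

The feature degrades gracefully instead of dragging the whole site down with it.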
Sharding is the process of splitting up your resources and IT infrastructure, so end users do not all access the same servers, databases, and other components the application consists of. The purpose is to minimize the scaling requirements for each shard and to reduce the blast radius if one shard goes down.
One common example is to shard by geography. If you are serving users worldwide, you could provision three identical systems to serve users in America, Europe, and Asia/Pacific rather than having one system to serve them all. Each of the three systems is easier to scale on its own compared to one huge system. And if one system is down or unreliable, it will not affect the other two. Assuming the load is divided equally, your overall reliability improves by roughly a factor of three.
Depending on your application, another approach might be to shard by some key data attribute, like a user id or an event id if you are selling something like tickets. This way you can more evenly distribute load throughout your shards. You might even take advantage of advanced techniques like shuffle sharding to further increase scalability and reliability.
If your bottleneck is in the data layer, you can apply the same techniques to databases so that data resides in different tables in different physical databases. Distributing data in this way means that a shortage of resources on one database server will not affect users running on other database servers.
There are several data models for accomplishing this, so you can choose a way that best matches the needs of your application.
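Sharding by a key attribute, as described above, often comes down to a deterministic mapping from the key to a shard number. A minimal sketch (function name and key format are illustrative): hashing the id first, rather than using the raw value, spreads sequential ids evenly across shards.

```python
import hashlib

def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user id deterministically onto one of `num_shards` shards."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

Every component that touches the data applies the same function, so a given user always lands on the same shard. Note that modulo-based mapping reshuffles most keys when `num_shards` changes; consistent hashing is the usual refinement when shards are added and removed often.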
Many web servers are limited by the number of concurrent threads they can run. Long-running requests reduce the number of requests the web server can handle in a given timeframe, and performance drains quickly once the thread pool is exhausted and the server starts blocking requests.
Make sure you monitor the latency of requests and take action to reduce it. High latency is usually caused by blocking code in your application, such as database requests, transactions, poorly written algorithms, or network latency. Latency is likely to increase over time as more data accumulates and the number of concurrent requests grows.
Modern programming languages allow the application to reuse threads when they are waiting for IO. Utilizing this asynchronous programming model is key in those cases where the application makes requests to dependencies like databases.
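In Python, this asynchronous model is what `asyncio` provides. A small sketch with simulated database calls: while one request awaits IO, the event loop serves the others, so three overlapping "queries" take roughly the time of one.

```python
import asyncio

async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stands in for a non-blocking database query
    return {"id": user_id}

async def handle_requests() -> list[dict]:
    # All three "requests" overlap: total wait is ~0.01s, not ~0.03s.
    return await asyncio.gather(*(fetch_user(i) for i in range(3)))

users = asyncio.run(handle_requests())
```

The same thread that would otherwise sit blocked on a database socket is reused to make progress on other requests.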
Use caching as much as possible – in many cases the application logic does not need fully updated data, and for those cases latency can be reduced considerably when adding a cache layer.
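A cache layer with a time-to-live captures this trade-off directly: slightly stale data is acceptable in exchange for skipping the expensive lookup. A minimal in-process sketch (class and key names are illustrative; in production this role is typically played by something like Redis or Memcached):

```python
import time

class TTLCache:
    """Minimal cache: entries expire `ttl_seconds` after being set."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None  # missing or expired: caller falls back to the slow path

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=60)
cache.set("product:42", {"name": "sneaker"})
```

On a cache hit, the request never touches the database; on a miss, the caller does the expensive lookup once and repopulates the cache for everyone else.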
Make sure you only create transactions and use locking when you absolutely must. In most cases there is a way around using them and doing so will reduce latency considerably.
Finally, you can change the underlying tool to one with a faster response time. If you’re working with transient data, use an in-memory database rather than a SQL database.
Locating a performance bottleneck is a bit like finding a needle in a haystack. Yet most developers will start looking at their code to find it.
What they should be doing is looking at data.
Most applications are black boxes. You put something in, you wait, and you get a result back. When applications start to perform poorly, it’s difficult to figure out why.
Is it thread starvation? An unoptimized SQL query? Inefficient code or an algorithm with exponential complexity? Network latency? The possible causes are almost infinite – so where do you start?
A good performance profiler will collect data and visualize it in a way that makes it easier to locate bottlenecks. You can typically explore CPU and memory usage, and the profiler will point you to the exact piece of code that is causing the issue.
Application Performance Monitoring (APM) is becoming increasingly important with modern distributed applications and microservices architectures. APM tools let you explore the relationships and latency between services and data stores, making it possible to pinpoint exactly which component is causing the issue and why.
There is a bit of a learning curve and it takes some time to set up, but that effort is paid back many times over as performance issues are found and solved.
Even with superb technical architecture, brilliant developers, and exceptional infrastructure, your application will still have limitations. You doubtless have some type of distributed system. Network, latency, multithreading, and the like introduce a series of new sources of error that will restrict the scalability of your application.
It is essential that you’re aware of these limitations and prepare for them before errors happen in production.
Yet when we ask customers how many visitors their website can handle, they usually have no idea. Or they expect to be able to handle far more than they actually can. It’s an important question, and the inability to answer it often reveals that they don’t run any kind of load testing.
If you’re planning for a high traffic sale or registration, it’s crucial to start the load testing process early. It requires a lot of planning. You also don’t know what you’re going to find. And that means you don’t know how long it will take to implement any changes.
All this can create tension. On the one hand, you need to test as soon as possible. On the other, there’s no point in testing something that bears no resemblance to the finished article. So, you might find that you need to have your landing pages and key journeys ready much earlier than expected.
Remember too that new code can potentially introduce new limitations, so it’s important to run your load tests regularly.
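At its core, a load test fires many concurrent requests and records latencies. A toy sketch of that shape (all names are illustrative; dedicated tools like k6, JMeter, or Locust replace the local `handler` with real HTTP calls against a staging environment):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handler() -> float:
    """Simulated request: returns its own latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # stands in for server-side work
    return time.perf_counter() - start

def run_load_test(concurrent_users: int) -> list[float]:
    # Fire all "requests" at once and collect per-request latencies.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(lambda _: handler(), range(concurrent_users)))

latencies = run_load_test(concurrent_users=20)
worst = max(latencies)
```

Re-running this as part of regular builds is what catches the new limitation a fresh code change quietly introduced.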
You know the level of performance you expect to provide your visitors. And if you’ve run your load tests, you know what kind of traffic your site or app can handle.
But if you receive a spike in website or app visitors, it’s unlikely you’ll be able to deliver the performance your customers expect. As the saying goes, “at scale everything breaks”.
To keep visitor levels where your website and app perform best, you need a solution to manage traffic inflow.
Websites and apps are built with assumptions of how much traffic they normally handle. Making a site scalable on demand is technically challenging and can prove costly. Every website has limits, as anyone who reads the news knows—even the world’s largest businesses fail under heavy load.
Some web traffic management strategies include rate limiting, coordinating marketing campaigns, and using a virtual waiting room.
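Rate limiting, the first strategy listed above, is often implemented as a token bucket: each request spends a token, and tokens refill at a fixed rate, so short bursts pass while sustained excess traffic is rejected. A minimal sketch (the class and its parameters are illustrative, not a specific library's API):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; sustain at most `rate_per_sec` requests/s."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject (e.g. respond HTTP 429)

bucket = TokenBucket(rate_per_sec=0.5, capacity=5)
```

Requests that return `False` get a polite rejection instead of being allowed to pile onto already-struggling servers.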
Your server capacity is usually quite limited, either by physical hardware or cost. In contrast, the power of end-user browsers scales with the number of users.
By rethinking the way we build our websites, our servers will be freed up to focus on data access and security, while HTML templates and some data access can be handled by cache servers and CDNs.
In many situations, your performance bottlenecks are complex business rules that force you to build complex code. In other cases, it is a third-party service, such as a payment gateway, which limits performance.
It's well worth questioning business rules and trying to simplify them.
Often these business rules are modelled on the processes and transactions that exist in the physical world. For instance, an ecommerce site will typically process a synchronous credit card authorization during a sale, in the same way your local corner store does. This creates a strong dependency between the website and the payment processor, so the website’s performance is limited by the payment processor’s scalability and uptime.
But is that really necessary?
In a scalable system, services are autonomous and utilize asynchronous workflows in an eventually consistent architecture.
Do we have to pick the item from the inventory when it is added to the basket, or is it fine to do it asynchronously when the order is completed? Can we complete orders without authorizing credit cards?
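The second question above can be answered with an asynchronous workflow: accept the order immediately and let a background worker authorize the card later. A minimal sketch (function names are illustrative; a real system would use a durable message queue such as RabbitMQ or SQS, with retries):

```python
from queue import Queue

payment_jobs: Queue = Queue()  # stands in for a durable message queue

def complete_order(order_id: str, card_token: str) -> str:
    """Checkout path: enqueue the authorization and respond immediately."""
    payment_jobs.put({"order": order_id, "card": card_token})
    return "order accepted"  # no waiting on the payment gateway

def payment_worker() -> dict:
    """Background worker, possibly on another machine, drains the queue."""
    job = payment_jobs.get()
    # ... call the payment gateway here, retrying on failure ...
    return {**job, "status": "authorized"}
```

Checkout throughput is now decoupled from the payment processor's scalability and uptime; the trade-off is handling the rare order whose authorization later fails, e.g. by notifying the customer.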
Oftentimes decision makers will change the rules once they learn about the consequences.
Like any project, the “Why?” should always come before the “How?”.
In other words, before spending a lot of time and money on redesigning your system to handle more visitors, remember to consider your business.
If your business is to sell a limited stock to a large audience, like a sneaker release or popular concert tickets, there is no reason to spend resources on a system that can handle all users at once.
In such a situation, there’s nothing wrong with having high capacity. But if you open the floodgates and let in far more people than there are tickets, you’re setting users up for disappointment.
Designing and building a scalable website isn’t easy. The infinitely scalable website will always be out of reach.
But, if you use a combination of the right technical tools and business processes, you’ll have yourself an exemplary, high-performance website.
If you’ve gone through all 13 of these essential steps, you’ll be well on your way.