6 Critical Flaws That Can Crash Your Website or App Under Load

You’ve poured time, energy, and money into plans for a big campaign, only to have the overwhelming demand crash your site or app.

You become a victim of your own success.

It's not an ideal situation, but know that you're not alone. A sudden surge in web traffic can overload the systems of even the world's largest businesses.

Still, your worst nightmare came true, and you need to understand why. What went wrong?

It's normally not just a single flaw. But here are some prime culprits and actionable steps you can take to mitigate them.

1. Your Monitoring is Lacking

Even if monitoring doesn’t make your application scalable, it’s still a foundational part of your defense against website failure.

You’re already behind if you don’t have insight into your application’s metrics, you aren’t alerted to failures, or you don’t have easy access to your logs. Without them, your ability to react swiftly to a failure is considerably weakened.

After the event, data is also essential for conducting a root cause analysis based on facts, not guesswork. An accurate understanding of the root failure will allow you to effectively optimize your system for the future.

If your application is a black box, there’s good news: There are a slew of log management and application performance management (APM) SaaS solutions that can help you and are easy to get up and running (Datadog, New Relic, Loggly, and Splunk, just to name a few).
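To make the three ingredients above concrete (metrics, alerts, and logs), here is a minimal Python sketch using only the standard library. The names RequestMonitor and monitored, the window size, and the 5% threshold are illustrative assumptions, not part of any of the products mentioned; a hosted APM tool would normally collect this data for you.

```python
import logging
import time
from collections import deque

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("app.monitoring")


class RequestMonitor:
    """Tracks the outcome of the most recent requests and flags a high error rate."""

    def __init__(self, window=100, max_error_rate=0.05):
        # Hypothetical defaults: look at the last 100 requests, alert above 5% errors.
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.max_error_rate = max_error_rate

    def record(self, name, duration_s, ok):
        self.outcomes.append(ok)
        # One log line per request gives you facts for a later root cause analysis.
        log.info("request=%s duration_ms=%.1f ok=%s", name, duration_s * 1000, ok)

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self):
        return self.error_rate() > self.max_error_rate


def monitored(monitor, name, func, *args, **kwargs):
    """Run func, record its duration and outcome, and re-raise any exception."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        monitor.record(name, time.perf_counter() - start, ok=False)
        raise
    monitor.record(name, time.perf_counter() - start, ok=True)
    return result
```

In a real system, should_alert() would feed a paging or chat notification rather than being polled by hand.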

2. Your Cache Strategy isn't from this Century

Say a core feature of your website is showing data, such as inventory status, updated in real time. Odds are you accomplish this by querying your database.

It might seem like a sound idea from a business point of view. But querying your database on every page view to keep data current in real time is a terrible idea if you’re trying to build a scalable system that you can afford.

Caching is one of the most effective and least expensive ways to improve scalability and performance. It can be implemented at the infrastructure level, in your application, or at the data level. Each has its advantages, but the infrastructure level is likely where you’ll see the greatest rewards for the least amount of effort.
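As an illustration of application-level caching, the sketch below keeps a value for a few seconds instead of hitting the database on every request. TTLCache, get_or_load, and the loader callback are hypothetical names, and the in-process dictionary is an assumption; a shared cache would be needed once you run more than one server.

```python
import time


class TTLCache:
    """Caches loaded values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database query
        value = loader(key)  # expired or missing: query once and remember
        self._entries[key] = (now + self.ttl, value)
        return value
```

With a TTL of five seconds, an inventory count shown to thousands of visitors costs roughly one query every five seconds per product, at the price of being up to five seconds stale.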

If you don’t have a content delivery network (CDN) in front of your site, it’s a clear next step. A CDN distributes content closer to website visitors by leveraging its network of servers around the globe. This reduces latency, improves page load times, and limits the requests made to your hosted web servers, resulting in lower bandwidth and hosting costs.
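How much a CDN can offload depends largely on the Cache-Control headers your application sends. The sketch below shows one possible policy; the function name, the path prefixes, and the lifetimes are assumptions for illustration, and support for stale-while-revalidate varies by CDN, so check your provider's documentation.

```python
def cache_control_for(path):
    """Return a Cache-Control header value for a request path (illustrative policy)."""
    static_suffixes = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")
    if path.endswith(static_suffixes):
        # Versioned static assets: cache for a year in browsers and at the edge.
        return "public, max-age=31536000, immutable"
    if path.startswith("/account") or path.startswith("/checkout"):
        # Per-user pages must never be stored by a shared cache.
        return "private, no-store"
    # Everything else: let the CDN keep it for 30 seconds and refresh in the background.
    return "public, s-maxage=30, stale-while-revalidate=60"
```

Even a 30-second edge lifetime on a popular product page means your servers see one request per edge location every 30 seconds instead of one per visitor.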

If you already have a CDN in place, there’s a good chance you’re not getting the most out of it.

Originally, CDNs served static content like pictures. But today, they do much more, from caching dynamic content to routing to DDoS protection. Some, through products like Akamai EdgeWorkers or Cloudflare Workers, even execute code at the edge. The edge computing offered by CDNs can serve as a new home for your stateless tasks: ones that don’t need to check in with the server to execute correctly but that still involve computation and consume CPU power.

The more you can minimize requests to your web servers, the more it frees up those resources to perform the critical tasks you really need them to—and the more performance you’re able to squeeze out of your web application.

3. Your System Uses One Database Technology

For decades we’ve developed applications primarily based on relational databases. Steep prices for data storage and software licenses have led us to misuse the relational database to save images or temporary data like session states.

A relational database is a powerful tool. But it’s only one of many. To achieve a scalable system, you need to use many tools, not just one.

At the same time, we’ve been told that data must be normalized down to an atomic level, disregarding the fact that the data needs to be combined again before it can be useful. The result is complex SQL queries that slowly kill database performance.

There are many solutions, including document, time-series, in-memory, and graph databases; message streams; or even a pattern like Command Query Responsibility Segregation (CQRS), which separates writes from reads so that reads can be served from optimized read models.
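To show what an optimized read model means in practice, here is a minimal in-memory CQRS sketch. All class and field names are hypothetical, and both sides live in one process for brevity; in a real system the write side and the read model would typically sit in separate stores connected by a message stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int


class CustomerSpendReadModel:
    """Query side: a pre-combined view, so reads need no joins or aggregation."""

    def __init__(self):
        self._by_customer = {}

    def apply(self, event):
        summary = self._by_customer.setdefault(
            event.customer_id, {"orders": 0, "total_cents": 0}
        )
        summary["orders"] += 1
        summary["total_cents"] += event.total_cents

    def get(self, customer_id):
        return dict(self._by_customer.get(customer_id, {"orders": 0, "total_cents": 0}))


class OrderCommandHandler:
    """Command side: validates the write, stores the event, notifies read models."""

    def __init__(self, read_models):
        self.events = []  # stand-in for the write store
        self.read_models = read_models

    def place_order(self, order_id, customer_id, total_cents):
        if total_cents <= 0:
            raise ValueError("total_cents must be positive")
        event = OrderPlaced(order_id, customer_id, total_cents)
        self.events.append(event)
        for model in self.read_models:
            model.apply(event)
        return event
```

The customer summary is computed once per write instead of once per read, which is the trade that makes read-heavy pages cheap.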

Each use case has its own specifics and requirements, and different tools are needed to attain scalability.

4. Your Business Processes Will Have to Change

We’ve traditionally modeled our digital applications around the processes and transactions that exist in the physical world. For instance, an eCommerce site will often process a synchronous authorization on a credit card in the same way as your local corner store.

The problem is, there’s now a strong dependency between the website and the payment processor. The website’s performance is now limited by the payment processor’s scalability and uptime. In a scalable system, services are autonomous and utilize asynchronous workflows in an eventually consistent architecture.

If you’ve bought something on Amazon, you may have received an email explaining that your payment didn’t go through and asking you to re-enter your credit card information. This happens because the payment is only processed after the order goes through and the shopper has received their order confirmation.

The system can then process the payments at a speed the payment gateway can handle. As a result, the website can accept orders at a rate that, at times, is far higher than the payment gateway’s capacity.
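The asynchronous workflow described above can be sketched as follows. OrderService, the status strings, and the charge callback are hypothetical, and the in-memory deque stands in for what would normally be a durable message queue.

```python
from collections import deque


class OrderService:
    """Accepts orders immediately and charges cards later, at the gateway's pace."""

    def __init__(self, charge):
        # charge(order_id, amount_cents) -> bool is a stand-in for a payment gateway call.
        self.charge = charge
        self.status = {}
        self.pending = deque()

    def place_order(self, order_id, amount_cents):
        # No gateway call here: the shopper gets a confirmation right away.
        self.status[order_id] = "confirmed_pending_payment"
        self.pending.append((order_id, amount_cents))
        return self.status[order_id]

    def process_payments(self, max_charges):
        """Run by a background worker; max_charges caps load on the gateway."""
        processed = 0
        while self.pending and processed < max_charges:
            order_id, amount_cents = self.pending.popleft()
            ok = self.charge(order_id, amount_cents)
            # A failed charge is where the "please re-enter your card" email is sent.
            self.status[order_id] = "paid" if ok else "payment_failed"
            processed += 1
        return processed
```

Order intake and payment processing now scale independently: a slow or unavailable gateway lengthens the queue but does not stop the site from taking orders.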

The vast majority of transactions we implement in our applications aren’t necessary. But in most cases, converting those we need into asynchronous, scalable processes requires changes to business workflows.

5. Your Application can't Increase Capacity

There will be limits in your application, regardless of infrastructure. A scalable application should, however, be able to leverage its elasticity to increase capacity. Optimally, a system will scale horizontally by adding more servers to a cluster, so that when the number of servers doubles, the capacity roughly doubles. The advantage is that the system, automatically or manually, can easily adjust capacity based on current demand.
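Under the idealized assumption that capacity grows linearly with server count, the number of servers to run follows from simple arithmetic. The function name, the 30% headroom, and the minimum and maximum bounds below are illustrative assumptions.

```python
import math


def servers_needed(expected_rps, rps_per_server, headroom=0.3, min_servers=2, max_servers=50):
    """Servers required for a request rate, assuming linear horizontal scaling."""
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    # Headroom leaves spare capacity for spikes and for a server failing.
    target = expected_rps * (1 + headroom) / rps_per_server
    return max(min_servers, min(max_servers, math.ceil(target)))
```

Real systems rarely scale perfectly linearly, since shared resources such as the database eventually become the limit, so treat the result as a starting point to validate with load tests.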

Many applications, though, scale vertically by replacing the server with a larger or smaller one. This often requires lots of resources and some period of downtime. Scalability becomes more and more costly in a vertical model due to administrative, hardware, and licensing costs. Yet developing a horizontally scalable system is neither easy nor cheap.

An alternative or complementary approach is to downgrade the user experience as the system runs out of resources. An advanced, CPU-heavy database search function can, for instance, be replaced with a simple search function to free up the database’s CPU for other purposes.
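A sketch of that kind of graceful degradation is shown below. Both search functions are simplified in-memory stand-ins for real database queries, and the 80% CPU threshold and the db_cpu_percent input are assumptions; in practice the reading would come from your monitoring system.

```python
def simple_search(query, products):
    """Cheap fallback: case-insensitive prefix match on the product name."""
    q = query.lower()
    return [p for p in products if p.lower().startswith(q)]


def advanced_search(query, products):
    """Costlier stand-in: every query word must appear somewhere in the name."""
    words = query.lower().split()
    return [p for p in products if all(w in p.lower() for w in words)]


def search(query, products, db_cpu_percent, degrade_above=80.0):
    """Pick the search mode from current database load; returns (mode, results)."""
    if db_cpu_percent >= degrade_above:
        return "simple", simple_search(query, products)
    return "advanced", advanced_search(query, products)
```

Returning the mode alongside the results lets the page tell visitors that search is temporarily simplified.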

As another option, you can choose to give access to a certain share of users straight away and redirect excess users to a waiting room that doesn’t strain your system.
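The admission logic behind a waiting room can be sketched in a few lines. WaitingRoom and its methods are hypothetical names, and this in-memory first-in, first-out version is only a model; a production waiting room runs outside your application so that queued visitors place no load on it.

```python
from collections import deque


class WaitingRoom:
    """Admits up to max_active users and queues the rest in arrival order."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()
        self.queue = deque()

    def arrive(self, user_id):
        if user_id in self.active:
            return "admitted"
        # Only skip the queue when nobody is already waiting.
        if len(self.active) < self.max_active and not self.queue:
            self.active.add(user_id)
            return "admitted"
        if user_id not in self.queue:
            self.queue.append(user_id)
        return "waiting"

    def position(self, user_id):
        """1-based place in line for a waiting user."""
        return self.queue.index(user_id) + 1

    def leave(self, user_id):
        """Free a slot and admit the longest-waiting users."""
        self.active.discard(user_id)
        while self.queue and len(self.active) < self.max_active:
            self.active.add(self.queue.popleft())
```

Setting max_active just below the capacity your load tests revealed keeps the protected system inside its safe range.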

6. You're Not Prepared

Even with superb technical architecture, brilliant developers, and exceptional infrastructure, your application will still have limitations. You doubtless have some type of distributed system, and networking, latency, multithreading, and the like introduce a series of new sources of error that will restrict the scalability of your application.

It's essential that you’re aware of these limitations and prepare for them before the error happens in production. Systematically load testing your application is the best way to accomplish this. Initially, you’ll likely find that each load test exposes yet another limitation. That’s why it’s critical that you set aside enough time for load testing and that you run multiple iterations to identify and fix performance limitations.
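To illustrate the iterative approach, the sketch below runs one step of a load test at a given concurrency and reports the error count and 95th percentile latency. The function name and the target callable are assumptions; target would wrap a real HTTP request against a test environment, and dedicated load testing tools offer far more than this.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_step(target, concurrency, requests_per_worker):
    """Call target() from `concurrency` threads and summarize the results."""

    def worker(_worker_index):
        samples = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            try:
                target()
                ok = True
            except Exception:
                ok = False
            samples.append((time.perf_counter() - start, ok))
        return samples

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_worker = list(pool.map(worker, range(concurrency)))

    samples = [s for batch in per_worker for s in batch]
    latencies = sorted(duration for duration, _ in samples)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    errors = sum(1 for _, ok in samples if not ok)
    return {"concurrency": concurrency, "requests": len(samples), "errors": errors, "p95_s": p95}
```

Running the step repeatedly with increasing concurrency, for example 10, 20, 40, and 80, and watching where errors or p95 latency climb is one way to locate the next bottleneck.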

Remember too that new code can introduce new limitations, so it’s important to rerun your load tests periodically.

So, now you’ve run your load tests. You’re armed with the knowledge of where your bottlenecks are and when they’ll start to overheat. But what do you do if the online traffic begins exceeding the throughput your bottlenecks can manage? You’d need a way to manage traffic inflow to keep traffic below your capacity thresholds.

Common web traffic management solutions include rate limiting, timing marketing campaigns, and using a virtual waiting room.
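Rate limiting is often implemented with a token bucket, which allows short bursts while capping the sustained request rate. The sketch below is one minimal version of that technique; the class name, the parameters, and the single-process design are assumptions, and it is not thread-safe as written.

```python
import time


class TokenBucket:
    """Allows `burst` requests at once and `rate_per_s` requests per second sustained."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock  # injectable so refill behavior can be tested
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429
```

Choosing rate_per_s from the throughput your load tests showed the bottleneck can sustain ties the limiter directly to measured capacity.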

By combining bottleneck insights from load tests with your traffic management strategy, you’ll bolster your preparedness and keep your website online no matter the demand your systems encounter.