1. Monitor your infrastructure.

First of all, you should know what's happening with your website. If you're experienced with Prometheus/Grafana, use them. If you're not, that's not a problem: any SaaS monitoring service, such as Datadog, can be set up really quickly. If even that is too hard, use Pingdom or Site24x7, at least to check that your website is still available. Remember, you decide what to measure. The most important thing is that if you don't know what's happening inside your system and exactly where it's happening, you can't fix it.

2. Prepare to scale at 60-80% of maximum load.

Several things can go wrong when you get hit by traffic:

   1. You're bound by CPU resources.
   2. You're bound by RAM limits.
   3. You're bound by your HDD/storage performance.
   4. You're bound by the bandwidth on your cloud instance/cluster/server.

Whenever you see that a resource has reached 80% of its limit, start scaling. When you reach 100%, you'll be down, and it will take time to recover (not to mention that it will be very stressful). Once you hit 80% load, scale until you get it down to 40%, then repeat as necessary. You'll have to act fast, because you'll be losing users, and you're more likely to make mistakes when you're in a hurry, so start early.

3. Keep an eye on HDD performance and bandwidth limits, not only CPU and RAM.

It's harder to spot the problem when your performance is limited by IOPS (input/output operations per second) or network bandwidth.

4. Watch your database performance, especially when you're using a cloud database.

RDS, Cloud SQL, MongoDB Atlas, and similar services are managed by the cloud provider, but they have their own limits. Watch them and scale when necessary.

5. When your DB hits its CPU limit, check for indexes; that might really help.

Adding indexes can dramatically reduce CPU load. Say you're using 90% of your DB CPU. You might want to double the server's CPU to handle twice the load, but if most of your queries are unindexed, adding indexes might reduce your CPU load by 10x, so it's worth investigating first.

6. Keep an eye on your cloud bills.

It's easy to forget about your bills when you're in a rush. Set up budget alerts in your billing system. Bandwidth is especially pricey: if you're unable to move your content to a CDN or to a dedicated hosting service like 100TB or LeaseWeb, the prices will stay high.

7. Avoid state in your app.

Though it's possible to scale CPU and RAM resources in the cloud, there is a ceiling you can't get past. At that point, you'll want to scale horizontally by adding new instances of the same app, but your app has to be ready for it. When you have multiple instances of the same app, your users' requests are distributed across multiple servers, so you can't store data on a local disk: the next request may land on a different server.

8. Consider moving to the cloud if you're on dedicated hosting.

You can't easily scale when you're using dedicated hosting; it takes time to add more servers. It could take anywhere from a couple of hours to a couple of days to get new servers available, and usually you pay by the month, not by the hour. You don't want to wait hours or days if you're already down. It's much easier to scale in the cloud.

9. Tune your infrastructure.

There are some basic things that are disabled by default that you might want to configure in your OS, network layer, app management, and programming language manager; they might reduce your resource usage dramatically. Google for "your-tech-stack tuning" and follow the basic recommendations.

10. Be ready to start a minimal/cached version.

Despite all your efforts, if you get a 100x spike in traffic, you'll be down. It takes time to scale up, so be ready to serve a static cached version. Just make sure that you're able to do it when you need to. You might use CloudFront/Cloudflare cache for this, or your CDN cache, nginx cache, or anything else.
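
As a rough illustration of the scaling rule from point 2 (start scaling when a resource reaches 80% of its limit and keep going until you're back near 40%), here is a minimal Python sketch. The function names, resource names, and sample numbers are hypothetical, and it assumes that load spreads evenly across identical instances and grows linearly, which real systems only approximate.

```python
SCALE_UP_AT_PCT = 80  # start scaling at 80% utilization (rule from point 2)
TARGET_PCT = 40       # keep scaling until utilization is back near 40%


def resources_to_scale(utilization_pct: dict[str, int],
                       threshold_pct: int = SCALE_UP_AT_PCT) -> list[str]:
    """Return the names of resources at or above the scale-up threshold."""
    return [name for name, used in utilization_pct.items()
            if used >= threshold_pct]


def instances_needed(current_instances: int, utilization_pct: int,
                     target_pct: int = TARGET_PCT) -> int:
    """Estimate how many identical instances bring utilization to the target.

    Assumption (not from the article): load is spread evenly and scales
    linearly with the number of instances.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be between 1 and 100")
    total_load = current_instances * utilization_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = (total_load + target_pct - 1) // target_pct
    # This sketch only scales up; it never suggests removing instances.
    return max(current_instances, needed)


if __name__ == "__main__":
    # Hypothetical readings from a monitoring system, in percent of the limit.
    readings = {"cpu": 85, "ram": 50, "disk_iops": 80, "network": 10}
    print("Over threshold:", resources_to_scale(readings))
    print("Instances needed:", instances_needed(4, 80))
```

Under these assumptions, four instances running at 80% would need to become eight to get back to 40%. In practice the threshold check would be fed by whatever monitoring you set up in point 1, and it should cover disk and network as well as CPU and RAM.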

How to Prepare Your Site for Heavy Traffic
