Apache bench

This is our step 1 tool, but keep in mind that ab is single threaded. Think of ab as sanity testing for performance. For step 1, you can use a single machine with a good amount of RAM and CPU. Run loads for 15–30 mins to check that nothing obvious is going wrong.

wrk

wrk is our step 2 tool for load testing. It is similar to ab, but it is multi-threaded, so it puts a proper concurrent load on the system. In step 2, we would use a big machine (think 64 GB RAM, 8–16 cores). This gives a good understanding of the system before moving on to the next phase. We are still testing from a single machine in this phase, though. Run loads for about 30–60 mins.

Through steps 1 and 2, we are probably testing single endpoints of the application. This is easy to set up, and the results come quickly. But things are not complete until we do distributed load testing plus complete application flows (scenarios where multiple APIs are called before the flow is completed). Distributed load testing takes longer to set up and is more expensive than steps 1 and 2.

Distributed load testing

Our go-to tool here is JMeter. JMeter might sound specific to Java, but it is not. It has been a good investment of time to pick up JMeter. Plus, since JMeter is a very old tool, it's very easy to find a lot of helpful guides and setups.

The goal of step 3 is to get a good estimate of requests per second per machine (do take note of the machine specs). Once you have this number, you can project your requirements based on expected traffic and peak traffic. It also helps you decide how much vertical or horizontal scaling you will need.

For vertical scaling

Choice of language

Once you have identified the main flows you need to focus on for performance, you can rewrite those pieces in a compiled language (this is almost always low-hanging fruit for a performance boost). But this only applies when your app is doing a lot of work itself and is not just a middle layer that talks to the database.
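Going back to the requests-per-second number from step 3, the projection from "RPS per machine" to a machine count can be sketched as below. This is a minimal illustration, not a sizing formula from the article: the function name `machines_needed`, the `headroom` parameter, and the example numbers are all assumptions made up for the sketch.

```python
import math


def machines_needed(peak_rps: float, rps_per_machine: float, headroom: float = 0.7) -> int:
    """Estimate how many machines are needed to serve peak_rps.

    rps_per_machine is the number measured during distributed load testing.
    headroom (an assumption of this sketch) is the fraction of that measured
    capacity we are willing to use, e.g. 0.7 means 70% per machine.
    """
    if rps_per_machine <= 0:
        raise ValueError("rps_per_machine must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_rps = rps_per_machine * headroom
    return math.ceil(peak_rps / usable_rps)


# Hypothetical numbers: 400 RPS measured per machine, 5000 RPS expected at peak.
print(machines_needed(peak_rps=5000, rps_per_machine=400))
```

The estimate is only as good as the measured number, so it should be recomputed whenever the machine specs or the tested flows change.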
Architecture

You are already at a good stage, where the setup is spread across a few services instead of one huge service. Try not to end up with too many services either. You want to take each network call into account (as a rule of thumb, each call adds a minimum of around 10 ms). Sometimes coupling two services will give you the gains.

Going into memory usage

It might be harder to get these numbers for Node.js, but at the code level there can be some low-hanging fruit as well. Identify hotspots for GC. In many cases the algorithm and data structure themselves will matter as well.

For horizontal scaling

Load balancing

Check out the arrangement of your HTTP calls. How many routers does a request go through after entering your network? How do you plan to distribute the load evenly? A round-robin approach is okay to start with, but as you move on, identify which calls are heavy and which are not, and configure the router accordingly.

Caching

If the system is a non-transactional, non-payment system, where 99% accuracy is as good as 100%, then caching helps a lot. Be aware of all the edge cases for caching, though. Performance comes only second to the actual business. **A bad caching configuration can go really bad, including lower performance and maybe wrong results.** Also be aware of caches which are already present (a lot of frameworks and ORMs lately include hidden caching). You can apply caching at all levels: database level, function level, controller level, API level.

Database

- For a majority-read/minor-writes system, you can leverage database read replicas (slaves) to increase throughput (this sort of falls under caching, although it's a different type). Also make sure you account for what happens when a replica goes down.
- If the case is the opposite, major-writes/minor-reads, you will want to check what database you are using (having multiple masters is helpful).
- Also look at how you are storing data in that particular DB.
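For the read-heavy case above, one way to picture the routing is a small read/write splitter that sends writes to the primary, spreads reads across replicas round-robin, and falls back to the primary when no replica is healthy. This is a sketch under assumptions: the class `ReadWriteRouter`, its methods, and the node names are hypothetical, and health information is set by hand here rather than discovered by real health checks.

```python
import itertools


class ReadWriteRouter:
    """Pick a database node for a query (hypothetical helper for illustration)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, name):
        # Called when a replica stops responding.
        self.healthy.discard(name)

    def mark_up(self, name):
        # Called when a known replica recovers.
        if name in self.replicas:
            self.healthy.add(name)

    def route(self, is_write):
        # Writes always go to the primary; so do reads when no replica is healthy.
        if is_write or not self.healthy:
            return self.primary
        # Advance the round-robin cycle until we land on a healthy replica.
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route(is_write=False))  # a replica
router.mark_down("replica-1")
router.mark_down("replica-2")
print(router.route(is_write=False))  # falls back to the primary
```

Note the trade-off in the fallback: when replicas go down, all read traffic lands on the primary, so the primary needs enough spare capacity to absorb it, which is worth checking in a load test.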
Also consider how much data needs to be written to disk for each DB transaction.

Above is a brief summary of the approach. For each point above, you should also consider the downsides:

- What if one machine goes down?
- What if 5 go down?
- What if network latency increases for some reason?
- What if IO speed goes down for some reason?

Some general points for the above:

- Make note of all the calls going out of your system (to a third party).
- If a third party cannot guarantee their RPS/QPS, you might want to shift to async flows (queues).
- Keep note of which of your services are IO heavy vs CPU heavy vs network heavy (many times you can couple two of these in a single service).
- Try to minimise mocks when load testing.
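The async-flow point above can be sketched with an in-process queue and a throttled worker: requests are enqueued immediately, and a background worker calls the third party at a rate it can sustain. This is only an illustration under assumptions: `start_worker`, `call_third_party`, and `max_per_second` are hypothetical names, and a production setup would more likely use a durable message queue than Python's in-memory `queue.Queue`.

```python
import queue
import threading
import time


def start_worker(jobs, call_third_party, max_per_second):
    """Drain `jobs` in the background, calling the third party at a capped rate.

    Putting None on the queue tells the worker to stop.
    """
    interval = 1.0 / max_per_second

    def run():
        while True:
            job = jobs.get()
            if job is None:
                jobs.task_done()
                break
            try:
                call_third_party(job)
            except Exception as exc:
                # A real system would retry or move the job to a dead-letter queue.
                print(f"job {job!r} failed: {exc}")
            finally:
                jobs.task_done()
            time.sleep(interval)  # crude throttle between outgoing calls

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    jobs = queue.Queue()
    sent = []
    worker = start_worker(jobs, sent.append, max_per_second=50)
    for order_id in range(3):
        jobs.put(order_id)  # the request path returns right after enqueueing
    jobs.put(None)
    worker.join(timeout=5)
    print(sent)
```

With this shape, a slow or rate-limited third party makes the queue grow instead of blocking your own request handlers, so queue depth becomes a number worth watching during load tests.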

To start thinking about load testing
