Using queueing theory to model capacity planning allows for a deeper understanding of system performance and client experience than a strictly rate-based approach. This article details why queueing theory is essential for modeling capacity and provides two hands-on examples. The first example performs capacity planning for a software service. The second models an organizational process as a queueing system, showing how capacity can be forecast for people-based systems as well as software-based ones! Many of the articles I've read covering queueing theory are math heavy. This one requires nothing more than basic algebra and a queueing theory calculator. There are many free calculators available; while the math is still there under the hood, it isn't necessary to understand it in order to forecast capacity using queueing theory.
Consider a service that processes a request in 1 second and receives a load of 2 requests / second. How many instances of the service are necessary to keep up with the load and minimize processing time (assuming the service is perfectly horizontally scalable)? The intuitive answer is 2, but this is only correct if the requests arrive perfectly timed, one after another. To illustrate this, consider the base case of a single instance of the service and a request rate of 1 per second:
In the real world, requests arrive slightly overlapping, but monitoring systems usually aggregate them into a per-second rate, giving the false impression that everything arrives in a neat line:
In the offset arrival above, transaction2 has to wait for the service to finish processing transaction1 before it can even begin to be processed! The total time transaction2 spends in the system is almost 2 seconds (service time + wait time)! Queueing theory takes this offset arrival of transactions into account.
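The offset-arrival effect is easy to demonstrate with a few lines of bookkeeping. Here is a minimal sketch of a single FIFO server; the `simulate` helper and its arrival times are illustrative, not from a real system:

```python
def simulate(arrival_times, service_time=1.0):
    """Single FIFO server: returns (wait, total time in system) per request."""
    free_at = 0.0
    results = []
    for t in arrival_times:
        start = max(t, free_at)        # wait if the server is still busy
        free_at = start + service_time
        results.append((start - t, free_at - t))
    return results

print(simulate([0.0, 1.0]))  # perfectly spaced arrivals: nobody waits
print(simulate([0.0, 0.1]))  # offset arrival: the second request waits 0.9 s,
                             # spending 1.9 s total in the system
```

With perfectly spaced arrivals both requests spend exactly 1 second in the system; shift the second arrival to 0.1 seconds and it spends 1.9 seconds, just as in the transaction2 picture above.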
Inputting our data into a queueing theory calculator (2 requests / second, 1 second service time, 3 service instances) shows something really interesting:
The queueing theory calculations are able to model a random distribution of arrivals. With 3 instances there are an average of ~0.9 requests waiting in the queue at any time, which means requests regularly have to wait for another request to finish processing. This creates an average latency of ~1.4 seconds per request: 40% more than the 1-second service time!
Three instances provide enough capacity to handle the incoming load, as long as connections can be queued somewhere (which TCP servers do via the listen backlog / SOMAXCONN). But even at 66% utilization the client experience is impacted, because requests spend extra time waiting for a service instance to become available. This leads to a tradeoff along the cost/latency dimension:
The x axis in this chart is utilization percentage, and the y axis is latency. At 100% utilization the service would be maximally cost effective, but fully saturated, causing client requests to queue. I personally believe this is the intuition behind the 80/20 rule (Pareto Principle) here: performance starts to degrade heavily at around 80% utilization, so keeping that last ~20% of headroom is responsible for the majority of the latency wins. This tradeoff between utilization and latency is a well-studied field and is modeled using Little's Law and the Universal Scalability Law. While this is super interesting and useful, it's out of the scope of this article.
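Although a full treatment is out of scope, Little's Law itself takes one line to apply. As a quick sanity check (the numbers below are the 3-instance figures from earlier: λ = 2 requests/second, W ≈ 1.44 s):

```python
lam = 2.0                 # arrival rate: requests / second (example above)
W = 1.444                 # average time in system with 3 instances, in seconds
L = lam * W               # Little's Law: average number of requests in the system
in_service = lam / 1.0    # on average lambda/mu = 2 requests are being served
waiting = L - in_service  # leaves ~0.9 requests waiting in the queue
print(f"in system ~= {L:.2f}, waiting ~= {waiting:.2f}")
```

The ~0.9 requests waiting recovers the queue-length figure the calculator reported earlier.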
How many instances are necessary to provide a good client experience, with a total transaction time (service + waiting) close to 1 second? 6 instances are necessary to achieve an average time in the system of 1004 ms:
Compare this to 4 instances, which offer a 1087 ms average time spent in the system (service time + wait time):
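The calculator results above can be reproduced with a short M/M/c calculation. Here is a minimal sketch using the Erlang C formula, assuming Poisson arrivals and exponential service times (the standard assumptions behind these calculators):

```python
import math

def mmc_time_in_system(lam, mu, c):
    """Average time a request spends in an M/M/c system (queue wait + service)."""
    a = lam / mu   # offered load in Erlangs
    rho = a / c    # per-instance utilization; must be < 1 for a stable system
    assert rho < 1, "unstable: arrivals exceed total service capacity"
    # Erlang C: probability that an arriving request has to wait
    top = (a ** c) / math.factorial(c) / (1 - rho)
    p_wait = top / (sum(a ** k / math.factorial(k) for k in range(c)) + top)
    wq = p_wait / (c * mu - lam)  # average time waiting in the queue
    return wq + 1 / mu            # plus the service time itself

# 2 requests/second, 1-second service time
for c in (3, 4, 6):
    w_ms = mmc_time_in_system(lam=2, mu=1, c=c) * 1000
    print(f"{c} instances: {w_ms:.1f} ms in system")
# 3 instances ~ 1444 ms, 4 instances ~ 1087 ms, 6 instances ~ 1004.5 ms
```

This matches the calculator outputs: ~1.44 s with 3 instances, 1087 ms with 4, and ~1004 ms with 6.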
Rate-based modeling does not take this into account. A rate-based n + 1 approach would have suggested 3 instances, which would have resulted in significant latency for clients.
The really amazing thing is that queueing theory is an abstract model; it doesn't require software systems to work. To illustrate this, we'll model an organizational process as a queueing system and perform capacity planning on it.
Suppose that a company has a manual change-approval step: each PR targeting a specific environment requires manual approval from a certified approver. Each approval takes an approver ~24 hours from the time it's started until the time it's marked as finished (a service time of 24 hours, ignoring weekends), perhaps because the approver has their own queue of items to focus on. Next, there are 5 teams, and each team produces about 2 features per working day that need approval (2 × 5 = 10 requests / day):
How many approvers are necessary to keep total times under 3 days? By plugging these values directly into a queueing theory calculator, we can see that 11 approvers provide a (145335 seconds / 60 seconds/minute / 60 minutes/hour / 24 hours/day) = 1.7 day total time in the system. By modeling arrival times, queueing theory provides insight into the total time a client spends in the system, which isn't available in a strictly rate-based capacity planning approach.
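The same M/M/c math from the software example applies here, just with units of days instead of seconds. A minimal Python sketch (again assuming Poisson arrivals and exponential service times, as the calculators do) reproduces the calculator's figure:

```python
import math

def avg_days_in_system(lam, mu, c):
    """Average total time (queue wait + service) in an M/M/c system, via Erlang C."""
    a = lam / mu
    rho = a / c
    assert rho < 1, "need more approvers than the offered load"
    top = (a ** c) / math.factorial(c) / (1 - rho)
    p_wait = top / (sum(a ** k / math.factorial(k) for k in range(c)) + top)
    return p_wait / (c * mu - lam) + 1 / mu

# 10 approval requests per working day; each approval ties up an
# approver for 1 day (mu = 1 completion per approver per day).
days = avg_days_in_system(lam=10, mu=1, c=11)
print(f"{days:.2f} days (~{days * 86400:.0f} seconds)")  # ~1.68 days (~145,335 s)
```

Dropping to 10 approvers would make the system unstable (utilization at 100%), which is why the assert guards against it; the 11th approver is what keeps the queue from growing without bound.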
Queues show up in tons of people-based systems and processes: physical lines at movie theaters, fast-food drive-throughs, any line at any restaurant, JIRA tickets, CI pipelines, CD pipelines, pull request reviews, meeting room scheduling, physical manufacturing pipelines, call center staffing: in short, anything that involves choosing an appropriate number of people to service incoming requests from one or more lines of work.
I hope this article explains why queueing theory is essential for capacity planning. By taking arrival distributions into account, it provides a view into the system that models the full client experience. While the math behind queueing theory is complicated (at least to me), there are many free calculators available that make it accessible to everyone. Finally, queueing theory isn't only for software systems: many business processes and systems can be modeled as queueing systems and analyzed with the same tools. I appreciate you reading this!