Simple Code Execution Monitoring, built for developers
Hi, I’m Valerio, founder and CTO at Inspector.
You may have already heard of rate limiting in the context of API consumption. In this article, I’ll show you a more complex use of this component: coordinating data ingestion pipelines.
Building Inspector, I’m learning a lot about data-intensive applications, and pipelines are one of the most critical components of their internal architecture.
The architecture of a data-intensive application can be simplified with the schema below:
Large, unpredictable volumes of incoming data require a well-designed data processing pipeline that can ingest the data without disrupting the entire system at every spike.
The Ingestion Node and the Ingestion Pipeline can easily scale horizontally, either with an “auto scaling” configuration in your cloud platform (Google Cloud, AWS, Azure, and DigitalOcean all provide this feature) or with a modern serverless infrastructure. For the datastore, it is not so easy.
Databases are often the real bottleneck of data-intensive systems because they need to support a very large number of write requests per second.
Write requests are hard to scale.
I talked about database scalability in a recent article: https://inspector.dev/how-i-handled-the-scalability-of-the-sql-database-at-inspector/
Yes, there are many technologies that claim to scale “infinitely”. Think about Elastic, ScyllaDB, SingleStore, Rockset, MongoDB, and many more. Perhaps technically they can do it without problems, but it is far from obvious that the costs are compatible with your business constraints.
Here comes the Rate Limiter.
In Inspector, the Rate Limiter protects the datastore from inadvertent or malicious overuse by limiting the rate at which an application can store monitoring data.
Without rate limiting, each application may make requests as often as it likes, leading to “spikes” of requests that starve other consumers. Once rate limiting is enabled, an application can only perform a fixed number of write requests per second against the datastore. A rate limiting algorithm helps automate the process.
But a monitoring system can’t lose data: that would mean generating fake metrics. At the same time, it should be able to store all the data at a reasonable cost without breaking down the entire system.
For this reason, requests that exceed the limit are not lost. They are re-scheduled onto the message queue, where they wait for a time window with free capacity.
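The re-scheduling idea can be sketched in a few lines of Python. This is only an illustration, not Inspector's actual implementation: the in-memory `deque` stands in for a real message queue, the `store` list stands in for the datastore, and `process_window` is a hypothetical name.

```python
from collections import deque


def process_window(queue, store, limit):
    """Handle one time window: write up to `limit` messages, requeue the rest."""
    written = 0
    for _ in range(len(queue)):  # visit each message currently queued exactly once
        message = queue.popleft()
        if written < limit:
            store.append(message)  # stand-in for a datastore write
            written += 1
        else:
            queue.append(message)  # over the limit: re-schedule, never drop
    return written


queue = deque(["m1", "m2", "m3", "m4", "m5"])  # hypothetical messages
store = []
windows = 0
while queue:
    process_window(queue, store, limit=2)
    windows += 1

print(windows, store)
```

With a limit of 2 writes per window, the five messages need three windows, and every message eventually reaches the store in its original order.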
The fixed window algorithm divides the timeline into fixed-size windows and assigns a counter to each window.
Each request, based on its arrival time, is mapped to a window. If the counter in the window has reached the limit, requests falling in this window should be rejected.
The windows are typically defined by the floor of the current timestamp, so if we set the window size to 1 minute, the windows are (12:00:00 – 12:01:00), (12:01:00 – 12:02:00), etc.
Suppose the limit is 2 requests per minute:
The requests at 00:00:24 and 00:00:36 increase the window’s counter to 2. The next request, at 00:00:49, is re-scheduled because the counter has reached the limit. The request at 00:01:12 can be served because it belongs to a new window.
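Here is a minimal sketch of the fixed window counter that replays the four requests from the example. The class and method names are hypothetical, timestamps are plain seconds since 00:00:00, and old counters are never purged, which a real implementation would need to handle.

```python
class FixedWindowLimiter:
    """Fixed window counter: one counter per window of `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = {}  # window index -> number of accepted requests

    def allow(self, timestamp):
        # Floor the timestamp to find the window this request belongs to.
        window = int(timestamp // self.window_seconds)
        count = self.counters.get(window, 0)
        if count >= self.limit:
            return False  # window is full: the caller should re-schedule
        self.counters[window] = count + 1
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
# 00:00:24, 00:00:36, 00:00:49, 00:01:12 expressed in seconds
arrivals = [24, 36, 49, 72]
decisions = [limiter.allow(t) for t in arrivals]
print(decisions)  # [True, True, False, True]
```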
There are two main downsides to this algorithm:
Many consumers waiting for a window reset
If a window becomes too busy, its entire capacity can be consumed in a single second, overloading the system (e.g. during peak hours like a Black Friday sale).
A burst of traffic near the boundary of a window can result in twice the allowed rate of requests being processed
Suppose the limit is 10 requests per minute and the counter is empty. If a spike of 10 requests arrives at 00:00:59, they are all accepted. If another spike of 10 requests arrives at 00:01:00, they are accepted too, since this is a new window and its counter starts at 0. The server is now handling 20 requests within a couple of seconds, which is not really 10 requests/minute.
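The boundary problem is easy to reproduce. The sketch below assumes a limit of 10 requests per minute and uses a hypothetical helper, `fixed_window_accepts`, that returns the arrival times a fixed window counter would accept.

```python
def fixed_window_accepts(arrivals, limit, window_seconds):
    """Return the arrival times accepted by a fixed window counter."""
    counters = {}
    accepted = []
    for t in arrivals:
        window = int(t // window_seconds)  # floor the timestamp to its window
        if counters.get(window, 0) < limit:
            counters[window] = counters.get(window, 0) + 1
            accepted.append(t)
    return accepted


# 10 requests at 00:00:59 followed by 10 requests at 00:01:00 (in seconds)
arrivals = [59] * 10 + [60] * 10
accepted = fixed_window_accepts(arrivals, limit=10, window_seconds=60)
print(len(accepted))  # 20
```

All 20 requests pass even though they arrive within two seconds of each other, because each spike lands in a different window.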
The sliding window counter is similar to the fixed window, but it smooths out bursts of traffic near the boundary by adding a weighted count from the previous window to the count in the current window.
Let me show you a real example.
Suppose a new request arrives at “1:15”. The decision to accept or deny it is based on an approximation of the current rate.
The current rate will be calculated considering the weighted sum below:
limit = 100 requests/hour
rate = 84 * ((60 - 15) / 60) + 36 = 84 * 0.75 + 36 = 99
rate < 100, hence the request will be accepted.
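The same calculation can be sketched in Python. This is an illustrative version, not Inspector's production code: the class name and attributes are hypothetical, timestamps are seconds and assumed to be non-decreasing, and the example reads 84 as the previous window's count and 36 as the current window's count, with the state seeded by hand.

```python
class SlidingWindowCounter:
    """Sliding window counter: weighted previous count plus current count."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, window):
        """Shift the counters when a request falls into a new window."""
        if window == self.current_window:
            return
        # Only the immediately preceding window contributes to the estimate.
        if window == self.current_window + 1:
            self.previous_count = self.current_count
        else:
            self.previous_count = 0
        self.current_count = 0
        self.current_window = window

    def estimated_rate(self, timestamp):
        self._roll(int(timestamp // self.window_seconds))
        # Fraction of the current window that has already elapsed.
        elapsed = (timestamp % self.window_seconds) / self.window_seconds
        return self.previous_count * (1 - elapsed) + self.current_count

    def allow(self, timestamp):
        if self.estimated_rate(timestamp) >= self.limit:
            return False  # over the limit: the caller should re-schedule
        self.current_count += 1
        return True


limiter = SlidingWindowCounter(limit=100, window_seconds=3600)
# Seed the state from the example: 84 requests in the previous hour, 36 so far.
limiter.current_window = 1
limiter.previous_count = 84
limiter.current_count = 36

now = 1 * 3600 + 15 * 60  # "1:15" expressed in seconds
rate = limiter.estimated_rate(now)
accepted = limiter.allow(now)
print(rate, accepted)  # 99.0 True
```

After this request is accepted the estimate reaches 100, so a further request at the same instant would be re-scheduled.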
As discussed in this article, we didn’t use rate limiting to control the incoming traffic of a public API; we used it internally to protect the datastore against bursts of data.
We started with the fixed window and have now moved to the sliding window algorithm, which improved the speed at which developers see data in their dashboard.
Are you looking for a “code-driven” monitoring tool instead of having to install things at the server level?
Get a monitoring environment specifically designed for software developers, with no server or infrastructure configuration required.
Thanks to Inspector, you will never need to install things at the server level or make complex configurations in your cloud infrastructure.
Inspector works with a lightweight software library that you can install in your application like any other dependency. Check out the supported technologies on our GitHub (https://github.com/inspector-apm).
Visit our website for more details: https://inspector.dev