Since AWS doesn’t support auto-scaling for Kinesis streams, most of the time we either over-provision the shards and pay more, or under-provision them and take a hit on performance.
Using a combination of CloudWatch alarms, SNS topics, and Lambda, we can implement auto-scaling for Kinesis streams, manage the shard count, and hit the right balance between cost and performance.
Both scale-up and scale-down are implemented by monitoring the “PutRecords.Bytes” metric of the stream:
Scale-up (doubling the shard count) will automatically happen once the stream is utilized beyond 80% of its capacity at least once in the given 2-minute rolling window. Scale-down (halving the shard count) will automatically happen once the stream’s utilization falls below 40% of its capacity at least 3 times in the given 5-minute rolling window.
Note:
We can also use the “IncomingBytes” or “IncomingRecords” metrics to implement the same behavior.
We scale up quickly so we don’t take a performance hit, and scale down slowly so we avoid too many back-to-back scale-up and scale-down operations.
Determining the percentage utilization of a Kinesis stream:
Let’s say:
Payload size = 100 KB
Total records (per second) = 500
AWS recommended shard count = 49
To determine whether it’s 80% utilized, we do the following:
Maximum KB that can be written in 2 minutes = ((100 * 500) * 60) * 2 = 6,000,000 KB
80% of that would be 4,800,000 KB
Similarly, 40% of that would be 2,400,000 KB
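To make the arithmetic explicit, here is a small Node.js helper that computes these thresholds for any traffic profile (a minimal sketch; the function and parameter names are mine, not from the original Lambda.js):

```javascript
// Compute scale-out/scale-in thresholds for a given traffic profile.
// The worked example above uses payloadKb = 100, recordsPerSec = 500,
// windowMinutes = 2.
function thresholds(payloadKb, recordsPerSec, windowMinutes) {
  const maxKb = payloadKb * recordsPerSec * 60 * windowMinutes;
  return {
    maxKb,                   // total KB writable in the window
    scaleOutKb: maxKb * 0.8, // 80% utilization threshold
    scaleInKb: maxKb * 0.4,  // 40% utilization threshold
  };
}

console.log(thresholds(100, 500, 2));
// => { maxKb: 6000000, scaleOutKb: 4800000, scaleInKb: 2400000 }
```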
The diagram below shows the flow when we perform a scale-out operation.
Configuration for the “Scale out alarm” would be:
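A sketch of what this alarm could look like via the AWS SDK for Node.js (the alarm name, stream name, and SNS topic ARN are placeholders; the threshold is the 80% figure from the example above converted from KB to bytes, and mapping the 2-minute window to a single 120-second period is my reading):

```javascript
// Sketch of the “Scale out alarm”: fire when PutRecords.Bytes summed over
// a 2-minute period reaches 80% of the stream’s write capacity.
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

async function createScaleOutAlarm() {
  await cloudwatch
    .putMetricAlarm({
      AlarmName: 'kinesis-scale-out',                           // placeholder
      Namespace: 'AWS/Kinesis',
      MetricName: 'PutRecords.Bytes',
      Dimensions: [{ Name: 'StreamName', Value: 'my-stream' }], // placeholder
      Statistic: 'Sum',
      Period: 120,                 // one 2-minute datapoint
      EvaluationPeriods: 1,        // a single breach triggers scale-out
      Threshold: 4800000 * 1024,   // 4,800,000 KB expressed in bytes
      ComparisonOperator: 'GreaterThanOrEqualToThreshold',
      AlarmActions: ['arn:aws:sns:us-east-1:123456789012:scale-out'], // placeholder
    })
    .promise();
}
```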
So when we reach 80% of the stream’s capacity, the following happens:
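In short, the alarm fires, publishes to its SNS topic, and the subscribed Lambda doubles the shard count. A minimal Node.js handler sketch (STREAM_NAME is an assumed environment variable; error handling and re-shard limits are omitted):

```javascript
// Minimal scale-out handler: double the stream’s open shard count.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

exports.handler = async () => {
  const streamName = process.env.STREAM_NAME; // assumed env variable

  // Look up the current number of open shards.
  const { StreamDescriptionSummary } = await kinesis
    .describeStreamSummary({ StreamName: streamName })
    .promise();

  // Double the shard count using uniform scaling.
  await kinesis
    .updateShardCount({
      StreamName: streamName,
      TargetShardCount: StreamDescriptionSummary.OpenShardCount * 2,
      ScalingType: 'UNIFORM_SCALING',
    })
    .promise();
};
```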
The diagram below shows the flow when we perform a scale-in operation.
Note:
The 40% value is calculated for 1,000 records per second over a 15-minute interval: (100 * 1000 * 15 * 60) * 40 / 100 = 36,000,000 KB
Configuration for the “Scale in alarm” would be:
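A matching sketch for the scale-in alarm; here the comparison flips and the alarm requires three consecutive low-utilization datapoints. Reading the note above as three 5-minute periods spanning the 15-minute interval gives a per-period threshold of 36,000,000 / 3 = 12,000,000 KB; that mapping is my assumption:

```javascript
// Sketch of the “Scale in alarm”: fire when PutRecords.Bytes stays at or
// below 40% utilization for three consecutive 5-minute periods.
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

async function createScaleInAlarm() {
  await cloudwatch
    .putMetricAlarm({
      AlarmName: 'kinesis-scale-in',                            // placeholder
      Namespace: 'AWS/Kinesis',
      MetricName: 'PutRecords.Bytes',
      Dimensions: [{ Name: 'StreamName', Value: 'my-stream' }], // placeholder
      Statistic: 'Sum',
      Period: 300,                 // 5-minute datapoints
      EvaluationPeriods: 3,        // three in a row => 15 minutes
      Threshold: 12000000 * 1024,  // 12,000,000 KB per period, in bytes
      ComparisonOperator: 'LessThanOrEqualToThreshold',
      AlarmActions: ['arn:aws:sns:us-east-1:123456789012:scale-in'], // placeholder
    })
    .promise();
}
```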
So when we utilize only 40% of the stream’s capacity, the following happens:
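The scale-in handler mirrors the scale-out sketch above, halving the shard count instead (again a sketch; STREAM_NAME is an assumed environment variable, and the floor of one shard is my addition):

```javascript
// Minimal scale-in handler: halve the stream’s open shard count.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

exports.handler = async () => {
  const streamName = process.env.STREAM_NAME; // assumed env variable

  const { StreamDescriptionSummary } = await kinesis
    .describeStreamSummary({ StreamName: streamName })
    .promise();

  // Halve the shard count, keeping at least one shard open.
  const target = Math.max(
    1,
    Math.floor(StreamDescriptionSummary.OpenShardCount / 2)
  );

  await kinesis
    .updateShardCount({
      StreamName: streamName,
      TargetShardCount: target,
      ScalingType: 'UNIFORM_SCALING',
    })
    .promise();
};
```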
The attached code “Lambda.js” is written in Node.js and is just for demo purposes, so instead of dynamically calculating the threshold value it is hardcoded; the same code can be enhanced to determine all of the things mentioned in this post.
In order to get 1,000 TPS for a payload of record size 100 KB we need to pay $1,223.62; with this approach we can control scale-up and scale-down, which directly reduces the cost of these AWS resources.
I am happy to take any feedback or improvements to this approach. Thanks for reading!
Previously published at https://medium.com/@hariohmprasath/autoscaling-with-kinesis-stream-cogs-reduction-dfd87848ce9a