3 Ways to Avoid that Huge Cloudwatch Bill

Written by lroberts | Published 2020/10/10
Tech Story Tags: cloudwatch | billing | aws | costs | elk | reduction | amazon-web-services | amazonwebservices

TLDR: It is surprisingly easy to rack up a bill while using CloudWatch if you’re not cautious. The most important rule for keeping your CloudWatch bill down is to keep only what you need: set retention periods on your logs, query cautiously, and keep a few smaller things in mind, such as the free-tier limits and billing alarms.

Anybody with a bit of experience working with AWS has had that moment at the start of the month when a shockingly high bill lands in their inbox.
You head over to the billing dashboard to explore what’s gone on, see a few things you’d expected to be a little higher than usual, but… what’s that? Double digits from CloudWatch? Maybe even triple if you’re working in larger scales? That can’t be right.
Unfortunately it is surprisingly easy to rack up a bill while using CloudWatch if you’re not cautious, but luckily for us it’s just as easy to work on preventing the same thing!

1. Watch that Retention Period

The most important rule for keeping your CloudWatch bill down is keeping only what you need. If you’ve ever run an application that dumps all of its logs into CloudWatch, you probably went back a month later and realized you’re holding onto gigabytes of logs. It’s annoying to find them and manually clear them out, but luckily there’s a simple solution available: retention periods.
Whether you’re setting up your CloudWatch log groups through the AWS console or through IaC tooling, there should be a method to adjust retention times. To do so on the AWS console, go to the list of log groups through the left-side navigation on the CloudWatch dashboard, then click on the value in the “Expire Events After” column for your log group. By default, this will be set to “Never Expire”, so you can see why we want to adjust this!
You’ll then be given a number of options for how long to keep your logs, from a single day up to ten years.
When you are configuring your retention periods, however, ensure you put some good thought into them. While you probably don’t want to be holding all logs for a year, you also don’t want to risk accidentally deleting week-old production logs just for a customer to say “Hey, I saw this bug a week ago and only decided to report it now!”
A good tip is to remember that you can always lower it later on and delete logs, but you can never bring logs back from deletion, so set it high to begin with and lower it once you’ve got a greater understanding of your business requirements.
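If you have more than a handful of log groups, clicking through the console gets tedious. As a rough sketch of how the same change could be scripted, the function below walks every log group and sets a retention period on the ones still set to “Never Expire”. The function name apply_retention and the only_unset flag are my own inventions for illustration; the client passed in is assumed to behave like boto3.client("logs"), whose describe_log_groups paginator and put_retention_policy call are used here.

```python
from typing import Any, List


def apply_retention(logs_client: Any, retention_days: int,
                    only_unset: bool = True) -> List[str]:
    """Set a retention period on log groups and return the names changed.

    logs_client is assumed to behave like boto3.client("logs").
    With only_unset=True, groups that already have a retention period
    are left alone, so only "Never Expire" groups are touched.
    """
    changed: List[str] = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            # Groups set to "Never Expire" carry no retentionInDays key.
            if only_unset and "retentionInDays" in group:
                continue
            # Skip groups that already have the requested value.
            if group.get("retentionInDays") == retention_days:
                continue
            logs_client.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=retention_days,
            )
            changed.append(group["logGroupName"])
    return changed
```

With boto3 installed and credentials configured, something like apply_retention(boto3.client("logs"), 90) should do the job. Note that CloudWatch only accepts certain retention values (30, 90 and 365 days are among them), so check the allowed list before picking a number, and in keeping with the tip above, start high.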

2. Query Cautiously

Anyone who’s had fun with running every kind of query they can think of on an SQL database, selecting every column from one table, joining on five more, all without a care in the world, might come over to CloudWatch and do the same.
However, we need to realise that CloudWatch is both very powerful and a managed tool - meaning costs can be higher and they can scale much quicker. Take an example CloudWatch Insights query of mine: it took me two seconds to write, then scanned over 13 different log streams with a total of almost 46,000 records. Luckily we’re working on a fairly small scale here, just some API Gateway logs for a few low-traffic Lambda functions, but for somebody just two seconds away from scanning over gigabytes of logs, the cost can add up quickly. Some good tips for keeping queries like this from getting out of hand are:
  • Put reasonable time ranges on your queries (unlike me above!).
  • Know the scale of contents inside the log group(s) you’re accessing.
  • Avoid querying too many log groups at once, unless you’re sure of what you’re doing.
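To make the first two tips concrete, here is a small sketch of how a bounded query could be prepared from code. It is not the query from my example above; the helper names, the log group name in the usage note, and the price constant are all assumptions for illustration. The dictionary keys match the parameters of the start_query call on boto3's "logs" client, and the per-GB price is what I understand the us-east-1 rate for scanned data to be, so check the pricing page for your region.

```python
import time
from typing import Any, Dict, Optional

# Assumption: price per GB of log data scanned by Insights (us-east-1).
ASSUMED_PRICE_PER_GB_SCANNED = 0.005


def bounded_query_args(log_group: str, query: str, minutes: int,
                       now: Optional[int] = None,
                       limit: int = 100) -> Dict[str, Any]:
    """Build start_query arguments covering only the last `minutes`."""
    end = int(time.time()) if now is None else now
    return {
        "logGroupName": log_group,        # a single group, not many
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": query,
        "limit": limit,                   # cap on returned rows
    }


def estimate_scan_cost(bytes_scanned: float,
                       price_per_gb: float = ASSUMED_PRICE_PER_GB_SCANNED
                       ) -> float:
    """Rough cost in USD for a query that scanned `bytes_scanned` bytes."""
    return bytes_scanned / 1024 ** 3 * price_per_gb
```

You could pass the result straight to the client, for instance logs.start_query(**bounded_query_args("/hypothetical/api-logs", "fields @timestamp, @message | limit 20", 15)). Once the query finishes, the statistics returned by get_query_results include a bytesScanned figure, which you can feed into estimate_scan_cost to see what that two-second query actually cost.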
However, don’t let this discourage you from exploring and experimenting. CloudWatch, and especially Insights, is a fantastically powerful service and it can help greatly in debugging, monitoring, and researching.
Realistically, your costs aren’t going to skyrocket just because you spent an evening querying your personal application logs. Worries like this are only for those operating in a larger-scale environment.

3. Small Tips to Remember

On top of these core tips there are also some smaller things you should always keep in mind when working with CloudWatch.
Of course we’re charged for data ingestion, storage, and querying, but there are a few more costs too. The AWS free tier only covers 10 CloudWatch alarms and 3 dashboards before you start being charged. If these are tools you and your team use a lot, keep those limits in mind - there’s no need to make a new dashboard every time when you could simply modify an existing one!
One more tip, vital for lowering costs across AWS services, is setting up billing alarms. These can notify you whenever your bill goes higher than a set amount, either overall or at a more specific per-service level, which is very convenient for designating set budgets for each service you use! They can be created within the CloudWatch “Alarms” section by creating an alarm on a billing metric.
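If you’d rather script a billing alarm than click through the console, the sketch below builds the arguments for the put_metric_alarm call on boto3’s "cloudwatch" client. The helper name billing_alarm_args, the alarm naming scheme, and the six-hour period are my own choices for illustration. It assumes your bill is in USD, that billing alerts are enabled in your account’s billing preferences, and that you already have an SNS topic to notify.

```python
from typing import Any, Dict, Optional


def billing_alarm_args(threshold_usd: float, sns_topic_arn: str,
                       service_name: Optional[str] = None) -> Dict[str, Any]:
    """Build put_metric_alarm arguments for an estimated-charges alarm.

    Pass service_name to watch one service instead of the whole bill.
    """
    dimensions = [{"Name": "Currency", "Value": "USD"}]
    name = f"billing-over-{threshold_usd:g}-usd"
    if service_name:
        dimensions.append({"Name": "ServiceName", "Value": service_name})
        name = f"{name}-{service_name}"
    return {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": dimensions,
        "Statistic": "Maximum",
        "Period": 21600,  # six hours; billing data only updates a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that gets notified
    }
```

Billing metrics live in us-east-1 regardless of where your resources run, so the client should be created with region_name="us-east-1" before calling put_metric_alarm(**billing_alarm_args(50, your_topic_arn)). For a per-service budget, pass a service_name; I believe the value for CloudWatch itself is "AmazonCloudWatch", but confirm it against the dimensions listed for the billing metrics in your own account.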
If you do have logs that are accessed, manipulated, and visualized regularly, it might be worth considering loading your logs into an ELK stack rather than directly querying them inside CloudWatch. This will provide you with a cheaper per-query system as well as more options for your queries.
Bear in mind, though, that because of the high cost of a managed ELK stack, or the expertise required to self-host one, this tends to be truly viable only at a higher scale. Below that, the cost will end up higher than that of your CloudWatch bill.
However, if you’re noticing a significant buildup of logs from your ELB(s), there are a number of ways to make the most of them with CloudWatch!
Overall, we have to remember that managing CloudWatch costs is a battle between functionality and efficiency, fought through both configuration, such as setting up retention periods on logs, and behaviour - the way we use our tools.

Written by lroberts | Fresh, homemade, organic web engineer with a love for niche browser techs.
Published by HackerNoon on 2020/10/10