_Cloud is awesome, but it can hurt really badly when it bites back on your wallet._

Cloud is awesome: almost-100% availability, near-zero maintenance, pay-as-you-go, and above all, infinitely scalable. But the last two can easily bite you back, turning that awesomeness into a billing nightmare. And occasionally you see stories like this one on medium.com:

> **Lambda programming errors that could cost you thousands of dollars a day!**
> _This is a real story. One that is still unfolding… Within a week we accumulated a bill close to $10K due to a…_

Here I unveil a few tips that we learned from our not-so-smooth journey of building the world's first serverless IDE, which could help others avoid some "interesting" pitfalls.

## Careful with that config!

One thing we learned was to never underestimate the power of a configuration. If you read the above-linked article, you would have noticed that the cause was a simple misconfiguration: a CloudTrail logging config that was writing logs to one of the buckets it was already monitoring.

You could certainly come up with more elaborate and creative examples of "service loops" yielding billing black holes, but the idea is simple: AWS is only as intelligent as the person who configures it.

_Welcome to Infinite Loop_

(Well, in the above case it was one of my colleagues who configured it, and I was the one who validated it; so you can stop here if you feel like it ;) )

So, when you're about to submit a new config update, try to rethink the consequences. You won't regret it.

## It's S3, not your attic.

AWS has estimated that 7% of cloud billing is wasted on "unused" storage: space taken up by content of no practical use, such as obsolete bundles, temporary uploads, old hostings, and the like.

_Life in a bucket_

However, cleaning things up is easier said than done. It is way too easy to forget about an abandoned file, and much harder to keep it tracked and delete it when the time comes.
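Before cleaning up, it helps to know what is actually stale. Here is a minimal sketch in Python, assuming boto3-style `list_objects_v2` listings; the helper name `stale_objects`, the bucket name and the 30-day threshold are illustrative choices of mine, not anything AWS-defined:

```python
from datetime import datetime, timedelta, timezone

def stale_objects(objects, max_age_days=30, now=None):
    """Return the keys of objects older than max_age_days.

    `objects` is a list shaped like the "Contents" entries returned by
    S3's list_objects_v2: {"Key": str, "LastModified": tz-aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [o["Key"] for o in objects if o["LastModified"] < cutoff]

# With real AWS credentials you would feed it an actual listing:
#   import boto3
#   s3 = boto3.client("s3")
#   contents = s3.list_objects_v2(Bucket="my-temp-bucket").get("Contents", [])
#   print(stale_objects(contents))
```

Anything this flags is a candidate for manual deletion, or for automated cleanup via lifecycle rules.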
Probably for the same reason, S3 provides lifecycle configurations: time-based, automated cleanup scheduling. You can simply say "delete this if it is older than 7 days", and it will be gone in 7 days.

This is an ideal way to keep temporary storage (build artifacts, one-time shares etc.) in check, hands-free. Like that daily garbage truck.

Lifecycle configs can also come in handy when you want to delete a huge volume of files from your bucket; rather than deleting individual files (which would itself incur API costs: while deletes are free, listing is not!), you can simply set up a lifecycle config rule to expire everything in 1 day. Sit back and relax while S3 does the job for you!

```json
{
  "Rules": [
    {
      "Status": "Enabled",
      "Prefix": "",
      "Expiration": {"Days": 1}
    }
  ]
}
```

Alternatively, you can move the no-longer-needed-but-not-quite-ready-to-let-go stuff into Glacier, for a fraction of the storage cost; say, for stuff under the subpath `archived`:

```json
{
  "Rules": [
    {
      "Filter": {"Prefix": "archived"},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 1, "StorageClass": "GLACIER"}
      ]
    }
  ]
}
```

But before you do that…

### Ouch, it's versioned!

(Inspired by true events.)

I put up a lifecycle config to delete about 3 GB of bucket access logs (millions of files, obviously), and thought everything was good; until, a month later, I got the same S3 bill as the previous month :(

Turns out the bucket had versioning enabled, so deletion does not really delete the object.

So, with versioning enabled, you need to explicitly tell the S3 lifecycle logic to:

- discard non-current (deleted) object versions, and
- expire old delete markers

in order to completely get rid of the "deleted" content and the associated delete markers.

So much for "simple" storage service ;)

### CloudWatch is your pal

Whenever you want to find out the total sizes occupied by your buckets, just iterate through your [AWS/S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html) CloudWatch Metrics namespace. There's no way (surprise, surprise) to check bucket size natively from S3; even the S3 dashboard relies on CloudWatch, so why not you?

Quick snippet to view everything? (Uses [aws-cli](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) and `bc`, on bash.)

```bash
yesterday=$(date -d @$(($(date +%s) - 86400)) +%F)
for bucket in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
  size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 \
    --start-time ${yesterday}T00:00:00 --end-time $(date +%F)T00:00:00 \
    --period 86400 --metric-name BucketSizeBytes \
    --dimensions Name=StorageType,Value=StandardStorage Name=BucketName,Value=$bucket \
    --statistics Average --output text --query 'Datapoints[0].Average')
  if [ "$size" = "None" ]; then size=0; fi
  printf "%8.3f %s\n" $(echo "$size/1048576" | bc -l) "$bucket"
done
```

## EC2: sweep the garbage, plug the holes

EC2 (Elastic Compute Cloud) makes it trivial to manage your virtual machines: compute, storage and networking. However, its simplicity also means that it can leave a trail of unnoticed garbage and billing leaks.

### Pick your instance type

There's a plethora of settings when creating a new instance. Unless there are specific performance requirements, picking a T2-class instance type with Elastic Block Store (EBS)-backed storage and 2–4 GB of RAM will suffice for most needs. Despite being free tier-eligible, `t2.micro` can be a PITA if your server could receive compute- or memory-intensive loads at some point; in those cases `t2.micro` tends to simply freeze (probably something to do with running out of CPU credits?), causing more trouble than it's worth.

### Clean up AMIs and snapshots

We habitually take periodic snapshots of our EC2 instances as backups. Some of these are made into Machine Images (AMIs) for reuse or sharing with other AWS users.

We easily forget about the other snapshots. While snapshots don't get billed for their full volume sizes, they can add up to significant garbage over time. So it is important to periodically visit and clean up your EC2 snapshots tab.

Moreover, creating new AMIs usually means that older ones become obsolete; they can be "deregistered" from the AMIs tab as well.

But…

### Who's the culprit: AMI or snapshot?

The actual charges are on snapshots, not on AMIs themselves. And it gets tricky, because deregistering an AMI does not automatically delete the corresponding snapshot. You usually have to copy the AMI ID, go to the snapshots tab, look for the ID in the description field, and nuke the matching snapshot. Or, if you are brave (and lazy), select and delete all snapshots; AWS will prevent you from deleting the ones that are being used by an AMI.

### Likewise, for instances and volumes

Compute is billed while an EC2 instance is running; but its storage volume is billed all the time, right up to deletion.

Volumes usually get nuked when you terminate an instance; however, if you've played around with volume attachment settings, there's a chance that detached volumes are left behind in your account. Although not attached to an instance, these still occupy space, and so AWS charges for them.

Again, simply go to the volumes tab, select the volumes in "available" state, and hit delete to get rid of them for good.

### Tag your EC2 stuff: instances, volumes, snapshots, AMIs and whatnot

Tag 'em. It's very easy to forget what state an instance was in at the time a snapshot was made; or the purpose of that running/stopped instance which nobody seems to take ownership or responsibility of.

Naming and tagging can help avoid unpleasant surprises ("Why on earth did you delete last month's prod snapshot?!"), and also help you quickly decide what to toss ("We already have an 11-05 master snapshot, so just delete everything older than that").

## You stop using, and we start billing!

Sometimes, the AWS Lords work in mysterious ways.
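One concrete leak of this kind is an Elastic IP address that is no longer attached to anything. As a sketch (Python over a boto3-style `describe_addresses` response; the helper name is mine), you could flag the detached allocations like this:

```python
def unattached_eips(addresses):
    """Return the public IPs of Elastic IP allocations that are not
    associated with any instance or network interface.

    `addresses` is shaped like the "Addresses" list in the response of
    EC2's describe_addresses; associated entries carry an "AssociationId".
    """
    return [a["PublicIp"] for a in addresses if "AssociationId" not in a]

# With real AWS credentials:
#   import boto3
#   ec2 = boto3.client("ec2")
#   print(unattached_eips(ec2.describe_addresses()["Addresses"]))
```

This only catches fully detached addresses; an EIP attached to a stopped instance also bills by the hour, so checking the instance state behind each associated address would be a natural extension.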
For example, Elastic IP Addresses (EIPs) are free as long as they are attached to a running instance. But they start getting charged by the hour as soon as the instance is stopped, or if they get into a "detached" state (not attached to a running instance) in some way.

Some prior knowledge about the service you're about to sign up for can prevent nasty surprises of this fashion. A quick pricing-page lookup or google search can be a deal-breaker.

### Pay-per-use vs pay-per-allocation

Many AWS services follow one or both of the above patterns. The former is trivial (you simply pay for the time/resources you actually use, and enjoy a zero bill for the rest of the time) and hard to miss; but the latter can be a bit obscure and quite easily go unnoticed.

Consider EC2: you mainly pay for instance runtime, but you also pay for storage (volumes, snapshots, AMIs) and network allocations (like inactive Elastic IPs), even if your instance has been stopped for months.

There are many more examples, especially in the serverless domain (which we ourselves are incidentally more familiar with):

- Kinesis charges by shard-hour, even if all your shards are idle
- DynamoDB charges for storage (thankfully there's a non-expiring free tier!) and for reads/writes in terms of "capacity units"
- RDS, very similar to EC2, charges for instance runtime, whether busy or idle (Aurora Serverless seems to be trying to change this to some extent)
- KMS charges a flat fee for each customer-managed key (CMK), whether you use it or not

Each block adds a bit more to your cost.

Meanwhile, some services secretly set up their own monitoring, backup and other "utility" entities. These, although (probably!) meant to do good, can secretly seep into your bill:

- DynamoDB sets up CloudWatch Alarms; these are left behind even after the corresponding tables have been deleted (at least when managed via CloudFormation).
- RDS automatically creates instance volume snapshots, at termination as well as during daily maintenance (esp. when deployed via the "default" CloudFormation configs); these can easily add up over your storage quotas.

These are the main culprits that often appear in our bills; certainly there are better examples, but you get the point.

### AWS CloudWatch (yeah, again)

Many services already report usage metrics to CloudWatch (or can be configured to). Hence, with some domain knowledge of which metric maps to which billing component (e.g. S3 storage cost is represented by the summation of the `BucketSizeBytes` metric across all entries of the `AWS/S3` namespace), you can build a complete billing monitoring solution around CloudWatch Metrics (or delegate the job to a third-party service like DataDog).

CloudWatch itself is mostly free, and its metrics have automatic summarization mechanisms, so you don't have to worry about overwhelming it with age-old garbage, or getting overwhelmed with off-the-limit capacity bills.

### The Billing API

Although AWS does have a dedicated Billing Dashboard, logging in and checking it every single day is not something you would add to your agenda (at least not for API/CLI minds like you and me).

Luckily, AWS offers a billing API whereby you can obtain a fairly granular view of your current outstanding bill, over any preferred time period, broken down by services or actual API operations.

The catch is that this API is not free: each invocation costs you $0.01. Of course this is negligible; considering the risk of having to pay several dozens (or even hundreds or thousands, in some cases) of dollars, it is worth having a $0.30/month billing monitor to track down any anomalies before it's too late.

Food for thought: with support for headless Chrome offered for Google Cloud Functions, one might be able to set up a serverless workflow that logs into the AWS dashboard and checks the bill for you. Something to try out during free time (if some ingenious folk hasn't hacked it together already).

### Billing alerts

Strangely (or perhaps not ;) ) AWS doesn't offer a way to put up a hard limit for billing, despite the numerous user requests and disturbing incident reports all over the web. Instead, they offer alerts for various billing "levels"; you can subscribe for notifications like "bill at x% of the limit" and "limit exceeded", via email or SNS (handy for automation via Lambda!).

My advice: this is a must-have for every AWS account. If we had had one in place, we could already have saved thousands of dollars to date.

_Don't wait till they become worthless pieces of plastic._

### Organizational accounts

If you want to delegate AWS access to third parties (testing teams, contract-based devs, demo users etc.), it might be a good idea to create a sub-account by converting your root account into an AWS organization with consolidated billing enabled.

(While it is possible to do almost the same using an IAM user, it will not provide resource isolation; everything would be stuffed into the same account, and painstakingly complex IAM policies may be required to isolate entities across users.)

Our CEO and colleague Asankha has written about this quite comprehensively, so I'm gonna stop at that.

## And finally: Monitor. Monitor. Monitor.

No need to emphasize this; my endless ramblings should already have conveyed its importance.

So, good luck with that!

_Originally published at randomizd.blogspot.com on November 30, 2018._