
How to Earn $1 Million With AWS in One Year

by Gianpi Colonna, April 28th, 2024

Too Long; Didn't Read

Slash your AWS cloud costs by 90%! Learn 4 steps to optimize spending: challenge assumptions, tune resources, use Graviton instances, and monitor usage.


If you bumped into this page thinking you’re gonna get rich with some get-rich-quick scheme, I’m sorry to disappoint you. This article is instead about how to cut your cloud bill by $1 million. By doing that, you will have essentially generated an extra million dollars in revenue, which you can spend buying my online course on how to get rich with AWS (link to course here).



Cloud cost is often overlooked and unaccounted for at the beginning of companies’ projects. A 2021 HashiCorp survey found that almost 40% of companies overspent on cloud costs in 2021 [1]. In 2023, almost all companies (94%) admitted that they were wasting money on the cloud [1], and at least 30% of cloud spend was wasted [2]. Cloud spending was almost $500 billion in 2022; at 30% waste, that’s roughly $150 billion wasted a year!


Not only is this a matter of missed revenue but also of poor sustainability practices: $150 billion worth of wasted energy!


These findings involve large enterprises as well as smaller ones, from high cloud maturity to low. The examples here refer to AWS, but the same principles apply to any other cloud provider. So, if any part of your job is in the cloud, this article is for you.


I’m speaking from a data engineer’s perspective, but the same learnings apply to other software engineering practices.

Let’s dive in.


What does it take to spend $1 million in cloud costs in a year?

This kind of cloud bill is usually restricted to very large enterprises that operate globally with millions of customers.


To give you an idea, a $1 million cloud bill can result from a Spark ETL job processing ~1.5 TB per hour, 24/7, 365 days a year. Another example might be an application that receives billions of requests a day from multiple locations around the world.
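As a rough sanity check, here’s a back-of-the-envelope sketch of how such a job adds up. The hourly cluster rate is an assumption for illustration, not a figure from any AWS price list:

```python
# Back-of-the-envelope: what a 24/7 Spark ETL job can cost per year.
# The $125/hour cluster rate is a hypothetical blended figure; actual
# rates depend on instance types, region, and pricing model.

HOURS_PER_YEAR = 24 * 365          # 8,760 hours of continuous processing
CLUSTER_RATE_USD_PER_HOUR = 125    # assumed blended EMR/EC2 cluster rate

annual_cost = HOURS_PER_YEAR * CLUSTER_RATE_USD_PER_HOUR
print(f"~${annual_cost:,.0f} per year")   # ~$1,095,000 per year
```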


In a large enterprise, there are hundreds of applications of this size, resulting in billion-dollar contracts with cloud providers. For example, at the end of 2019, Airbnb had a commitment to spend $1.2 billion on cloud resources over five years [3].


At Expedia, we slashed the cost of a data processing ETL from $1.1 million a year to a mere $100,000 a year by implementing optimisation practices. That’s a 91% cost reduction!


Not all companies have applications of such a huge size, but imagine cutting your cloud cost by 90%, whether for a single application or for your entire company.



How do we start saving?

STEP 1: Challenge your design assumptions

Go and get a list of your most expensive applications and challenge your design assumptions.

  • Are you building an application with 99.999% availability and sub-millisecond latency when users would realistically be fine with 99% availability and latency in the hundreds of milliseconds?
  • Are you creating datasets with billions of rows when users only ever consume aggregations of a few of the measures?
  • Are you landing data in real time when the data is only analysed once a day?
  • Are you refreshing the cache every 10 seconds when its contents only really change across days?


All these questions go back to the most important ones: how is the application going to be used? What is the business value of its existence? How does the application help us achieve a given goal?


Of course, all these answers are very often unclear at the beginning of a project; but that’s why design should always be an iterative process — allowing changes to happen as seamlessly as possible. Engineers should embrace evolution and change, aligning application development with impact.


STEP 2: Fine-tune your infrastructure resources to your needs

The second step consists of providing the application with the right resources and tuning it to the right infrastructure.


As an engineer, be aware of how cloud costs are calculated. For example, AWS provides spot instances, where you pay a fluctuating market price for spare capacity (and can set a price ceiling); this is particularly useful for fault-tolerant and flexible applications. Use them if you can: AWS claims up to 90% reduction in costs [4].
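To make this concrete, here’s a hedged sketch of launching a Spot instance with boto3. The AMI ID, instance type, and price ceiling are placeholders, not recommendations:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a fault-tolerant worker on a Spot instance instead of On-Demand.
# AMI ID, instance type, and max price are placeholders -- use your own.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI
    InstanceType="m5.xlarge",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.10",        # USD/hour ceiling; omit to cap at the On-Demand price
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```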


Some other considerations you might want to address are:

  • Are you serving customers globally or only in one geographical area? Do you really need your infrastructure to live across the globe, or can you set it up closer to your customer base?
  • Are you over-provisioning your cluster instances? Ensure there is enough capacity to handle peak loads without unnecessary costs, and utilise auto-scaling to dynamically adjust resources to actual demand so you don’t pay for idle resources.
  • If you’re working with data and Spark, make sure you understand Spark concepts and tuning (a minimal starting point is sketched after this list)! If you don’t, take a look at the following resources [5] [6] [7] [8] [9].
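For the Spark point above, here is a minimal sketch of right-sizing via dynamic allocation. It assumes a cluster manager that supports it (YARN or Kubernetes), and the executor bounds and sizes are placeholders to tune against your own workload:

```python
from pyspark.sql import SparkSession

# A minimal sketch: let Spark scale executors with actual demand instead
# of pinning a fixed (and usually over-provisioned) cluster size. The
# bounds below are placeholders -- tune them against your own workload.
spark = (
    SparkSession.builder
    .appName("right-sized-etl")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
```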

STEP 3: Use AWS Graviton instances

There are little to no drawbacks to utilising AWS Graviton instances. AWS has invested heavily in creating highly cost-effective processors, and you can get up to 40% reduction in cloud spending just by switching from an Intel-based processor to an ARM-based one [10].


The only caveat is that your application needs to be compatible with the ARM architecture that Graviton runs on. If you’re dealing with a managed service such as RDS or OpenSearch, there’s no complication at all in switching: AWS handles the underlying OS and application compatibility. If you’re building your own application, you might need to recompile packages depending on which language you’re using: Java and other JVM languages typically require no change, whereas Python dependencies with native extensions require some attention.
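As an illustration, for a managed service the switch can be as small as changing the instance class. A hedged sketch with boto3, assuming an existing RDS instance called my-database (hypothetical) and that db.m6g.large fits your workload:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Move an existing RDS instance to a Graviton (ARM) instance class.
# "my-database" and "db.m6g.large" are placeholders -- use your own.
rds.modify_db_instance(
    DBInstanceIdentifier="my-database",
    DBInstanceClass="db.m6g.large",   # m6g = Graviton2 counterpart of m5
    ApplyImmediately=False,           # apply in the next maintenance window
)
```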


STEP 4: Monitor your cost spending and educate on cost awareness

Lastly, don’t forget to keep monitoring your costs for unexpected peaks and surprises. The cost on day 0 of your application will be different from the cost on day 170. Make sure you keep track of the changes and understand why they are happening: is it accumulating S3 storage costs, or just a one-off spike?


Set up the necessary alerts and operational runbooks!
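For example, a monthly AWS Budgets alert takes only a few lines. A sketch where the account ID, budget limit, and e-mail address are placeholders:

```python
import boto3

budgets = boto3.client("budgets")

# Alert when forecasted monthly spend crosses 80% of the budget.
# Account ID, limit, and e-mail address below are placeholders.
budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "monthly-cloud-budget",
        "BudgetLimit": {"Amount": "100000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "FORECASTED",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
            ],
        }
    ],
)
```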


Importantly, implement cost allocation tags to track spending by department, project, or environment. Avoid the risk of creating a swamp where cost is untraceable or requires a long journey across different log systems. It should be quick and simple to trace back any given application’s cost.
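A hedged sketch of tagging at creation time with boto3; the resource ID and tag values are illustrative only:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Attach tags so spend can be sliced by team/project/environment in
# Cost Explorer. The resource ID and tag values are placeholders.
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],
    Tags=[
        {"Key": "team", "Value": "data-platform"},
        {"Key": "project", "Value": "customer-etl"},
        {"Key": "environment", "Value": "production"},
    ],
)
```

Note that tags only show up in cost reports after you activate them as cost allocation tags in the Billing console.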


Final thoughts

Wherever you work, balancing the delivery of new features with the optimisation of existing ones is hard. Who hasn’t been pressured to deliver quirky new features at the speed of light?


However, it is essential for both engineers and managers to make deliberate and proactive decisions about their current projects, managing risks and opportunities effectively.