Did you know that 90% of applications have 5x more resources than they actually need?* Overprovisioning and cloud cost savings may not be top-of-mind issues for your deployments yet. But they're likely to become a growing concern as your cloud bill grows.
And when the CFO and their finance team start asking how you allocate cloud expenses, you'd better have a good answer ready – a solid story about infrastructure utilization and cloud cost efficiency.
Compute resources are typically the biggest line item on your cloud bill. Selecting the right VM instance types is a key part of cloud resource optimization and in some cases can save you 50% of your overall compute bill.
If you've already optimized your VM choices, check our blog for more cost optimization tips. If not, read on – your cloud bill could probably use some trimming.
Table of contents:
1. It’s so easy to overprovision, here’s why
2. How to choose the right VM for the job in 6 steps
3. Saving on cloud costs – case study
4. Wrap up
* Christina Delimitrou and Christos Kozyrakis, “Quasar: resource-efficient and QoS-aware cluster management”, https://doi.org/10.1145/2644865.2541941.
Let’s say that you need a machine with 4 cores for your cluster. You can choose from some 40 different options, even in a single cloud scenario where you work with only one CSP.
How can you compare all of these instances? There are simply too many machine shapes for anyone to analyze by hand.
For example, take a look at the following pricing table from AWS:
… and this is just a portion of the table, for one region, and one OS type.
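If you'd rather not eyeball pricing tables, you can pull the same data programmatically. Below is a minimal sketch using the AWS Price List API via boto3 – the region, OS, and the 4-vCPU filter are just example values, not recommendations.

```python
# Sketch: list on-demand Linux prices for 4-vCPU EC2 instances in one region.
# Assumes AWS credentials are configured; the Pricing API endpoint lives in us-east-1.
import json
import boto3

pricing = boto3.client("pricing", region_name="us-east-1")

pages = pricing.get_paginator("get_products").paginate(
    ServiceCode="AmazonEC2",
    Filters=[
        {"Type": "TERM_MATCH", "Field": "vcpu", "Value": "4"},
        {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": "Linux"},
        {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
        {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
        {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
        {"Type": "TERM_MATCH", "Field": "location", "Value": "US East (N. Virginia)"},
    ],
)

for page in pages:
    for item in page["PriceList"]:
        product = json.loads(item)
        attrs = product["product"]["attributes"]
        # Dig the on-demand hourly price out of the nested terms structure.
        for term in product["terms"].get("OnDemand", {}).values():
            for dim in term["priceDimensions"].values():
                price = dim["pricePerUnit"]["USD"]
                print(f'{attrs["instanceType"]:>15}  {attrs["memory"]:>8}  ${price}/hr')
```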
And we’re talking only about one cloud at this point. Things get exponentially more complicated when you go multi-cloud.
So, let’s say that you pick an instance that costs $0.19 per hour. It seems reasonable on paper, so you modify your Infrastructure as Code (IaC) scripts and deploy.
But when you deploy a compute-heavy application, you soon see that it ends up underutilizing other resources you’re paying for. You’re using only half of the memory allocated per node, underutilizing the network, and not even utilizing the SSDs that may be attached to the instance. This doesn’t help to maximize cloud cost savings at all.
If you took a deeper look at the CSP's offering, you might find another VM shape that could support your application at a much lower cost. Sure, it gives you less memory and network, but that's perfectly fine for your application. And it costs much less – only $0.10 an hour.
This scenario isn’t made up – these are the results of a cost / performance test that we carried out on a demo e-commerce application. You’ll find the case study further down this article, so bear with us.
1. Define your minimum requirements
Your workload matters a lot when you're choosing a VM instance type with cloud cost savings in mind. Make a deliberate effort to order only what you need across all compute dimensions, including CPU (and architecture – x86 vs. ARM), memory, SSD storage, and network connectivity.
On the flip side, while an affordable instance might look tempting, you might suffer performance issues when you start running memory-intensive applications.
Identify the minimum size requirements of your application and make sure that the instance type you select meets them across all of these dimensions. One way to build that shortlist is shown in the sketch below.
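If you'd like to automate the shortlisting, here's a minimal sketch using EC2's describe_instance_types API via boto3. The minimum thresholds are placeholders – swap in your own requirements.

```python
# Sketch: shortlist EC2 instance types that meet minimum CPU, memory, and architecture needs.
# The thresholds below are illustrative placeholders, not recommendations.
import boto3

MIN_VCPUS = 4
MIN_MEMORY_MIB = 8 * 1024       # 8 GiB
REQUIRED_ARCH = "x86_64"        # or "arm64" if your images are built for ARM

ec2 = boto3.client("ec2", region_name="us-east-1")
candidates = []

for page in ec2.get_paginator("describe_instance_types").paginate():
    for it in page["InstanceTypes"]:
        if it["VCpuInfo"]["DefaultVCpus"] < MIN_VCPUS:
            continue
        if it["MemoryInfo"]["SizeInMiB"] < MIN_MEMORY_MIB:
            continue
        if REQUIRED_ARCH not in it["ProcessorInfo"]["SupportedArchitectures"]:
            continue
        candidates.append((it["InstanceType"],
                           it["VCpuInfo"]["DefaultVCpus"],
                           it["MemoryInfo"]["SizeInMiB"] // 1024,
                           it["NetworkInfo"]["NetworkPerformance"]))

for name, vcpus, mem_gib, net in sorted(candidates):
    print(f"{name:>15}  {vcpus} vCPU  {mem_gib} GiB  {net}")
```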
Once you have a targeted set of instance types that fit your application needs, buying a “good enough” instance type might not be the best possible choice. For example, different VMs across different clouds have varying price vs. performance ratios. This means that it’s possible to get better performance for the cloud compute dollar, and save more on cloud costs overall.
Next, you select between CPU and GPU dense instances. Consider this scenario:
If you’re building a machine learning application, you’re probably looking for GPU dense instance types. They train models much faster than CPUs. Interestingly, the GPU wasn’t initially designed for machine learning – it was designed to display graphics.
In 2007, Nvidia came up with CUDA to help developers with machine learning training for deep learning models. Today, CUDA is widely adopted across most popular machine learning frameworks. So, training your ML models will be much faster with GPUs. To find out exactly why that is the case, check out this interesting post that dives into the details.
What about running predictions through your trained models? Can we achieve better price/performance with specialized instance types? CSPs are now introducing new instance types designed for inference – for example, AWS EC2 Inf1. According to AWS, EC2 Inf1 instances deliver up to 30% higher throughput and up to 45% lower cost per inference than Amazon EC2 G4 instances.
Side note: We haven’t tested this claim at CAST AI yet, as this instance type is fairly new.
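The comparison ultimately comes down to cost per inference: the hourly price divided by how many inferences the instance actually completes in an hour. Here's a back-of-the-envelope sketch – all prices and throughput numbers in it are made-up placeholders, not benchmark results.

```python
# Sketch: compare cost per million inferences for two hypothetical instance types.
# Prices ($/hr) and throughput (inferences/sec) are illustrative placeholders only.
def cost_per_million_inferences(price_per_hour: float, inferences_per_second: float) -> float:
    inferences_per_hour = inferences_per_second * 3600
    return price_per_hour / inferences_per_hour * 1_000_000

candidates = {
    "gpu-instance (placeholder)":       {"price": 0.526, "throughput": 900},
    "inference-instance (placeholder)": {"price": 0.368, "throughput": 1100},
}

for name, c in candidates.items():
    print(f"{name}: ${cost_per_million_inferences(c['price'], c['throughput']):.2f} per 1M inferences")
```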
Unfortunately, the selection matrix gets more complicated.
2. Choose an instance type with cloud cost savings in mind
CSPs provide a wide range of instance types optimized to match different use cases. They offer various combinations of CPU, memory, storage, and networking capacity. Each type includes one or more instance sizes, so you can scale your resources to fit your workload’s requirements.
CSPs roll out different generations of hardware, and the chips in those machines come with different performance characteristics.
You might be getting an older-generation processor that’s slightly slower – or a new-generation processor that’s slightly faster. You might choose an instance type that has strong performance characteristics that you don’t actually need, without even knowing it.
Reasoning about this on your own is hard. The only way to verify this is through benchmarking – dropping the same workload on each machine type and checking its performance characteristics.
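The benchmark itself doesn't have to be fancy. Even a fixed CPU-bound workload timed on each candidate instance gives you a comparable number – here's a minimal sketch (not the benchmark suite we use):

```python
# Sketch: time a fixed CPU-bound workload so results are comparable across instance types.
import time
import hashlib

def cpu_workload(iterations: int = 2_000_000) -> float:
    """Hash a counter repeatedly and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    digest = b"seed"
    for i in range(iterations):
        digest = hashlib.sha256(digest + i.to_bytes(8, "little")).digest()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = cpu_workload()
    print(f"Completed fixed workload in {elapsed:.2f}s "
          f"({2_000_000 / elapsed:,.0f} hashes/sec)")
```

Run the same script on every machine shape you're considering and you get an apples-to-apples throughput figure per dollar.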
This is one of the first things that we did at CAST AI when we started over a year ago. Here are two examples.
Example 1: Unpredictable CPUs within one provider
This chart shows the CPU operations in AWS (Amazon t2.2xlarge: 8 virtual cores) at different times, after several idle CPU periods.
Source: CAST AI
Example 2: Cloud endurance
To understand VM performance better, we created a metric “Endurance Coefficient” and here’s how we calculate it:
We measure how much work a VM type can do in 12 hours and how variable its CPU performance is. For a sustained base load, you want as much stability as possible. For a bursty workload (a website that only sees traffic once in a while, or an occasional batch job), lower stability is fine.
You should make an informed decision, because it's not always clear how much stability you're getting for your money with shared-core, hyperthreaded, overcommitted, cross-generation, or burstable credit-based VM types. In our calculation, instances with stable performance score close to 1 (100%), and ones with erratic performance score closer to 0.
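As a simplified illustration of the idea (one way to compute such a score, not the exact production formula), you can take the ratio of the work actually done over the window to what a perfectly stable machine running at its observed peak would have done:

```python
# Sketch: turn repeated benchmark samples (e.g. over 12 hours) into a 0-1 stability score.
# Illustration of the concept only - one plausible formula, clearly not the only one.
from statistics import mean

def endurance_coefficient(ops_per_second_samples: list[float]) -> float:
    """Ratio of work actually done to the work a perfectly stable VM
    (always performing at its observed peak) would have done."""
    peak = max(ops_per_second_samples)
    if peak == 0:
        return 0.0
    return mean(ops_per_second_samples) / peak

stable_vm = [1000, 990, 1005, 998, 1002, 995]   # barely varies -> close to 1
bursty_vm = [1000, 950, 400, 380, 420, 390]     # throttled after a burst -> much lower

print(f"stable: {endurance_coefficient(stable_vm):.3f}")   # ~0.99
print(f"bursty: {endurance_coefficient(bursty_vm):.3f}")   # ~0.59
```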
In this example, the DigitalOcean s1_1 machine achieved an endurance coefficient of 0.97107 (97%), while the AWS t3_medium_st scored only 0.43152 (43%) – note that it's a burstable instance.
Source: CAST AI
Let’s get back to choosing the instance type
AWS, Azure, and Google all offer the same four families of instance types: general purpose, compute optimized, memory optimized, and storage optimized.
Each provider also offers instance types for GPU workloads under different names:
AWS – GPU instances in the P and G families (e.g., P3 for ML training, G4 for inference and graphics).
Azure – N-series VMs (the NC, ND, and NV sizes).
GCP – Accelerator-optimized VMs based on the NVIDIA Ampere A100 Tensor Core GPU; they get you up to 16 GPUs in a single VM. Great for demanding workloads like HPC and CUDA-enabled machine learning (ML) training and inference.
Check these links for more info:
A note about ARM-powered VMs
Recently, there has been a lot of attention around Apple’s M1, a new ARM-based System-on-Chip (SoC) the company announced in November 2020. According to Apple, the M1 increases CPU performance by up to 3.5x and GPU performance by up to 6x.
A big part of that efficiency comes from the ARM architecture. Cloud providers already offer ARM-powered VMs – for example, the AWS EC2 A1 family runs on AWS Graviton processors, and newer families like M6g run on Graviton2.
Here’s what you need to know about ARM: it’s less power-hungry, so cheaper to run and cool – CSPs charge less for this type of processor. But if you’re toying with the idea of using it, you might need to re-architect your delivery pipeline to compile your application for ARM. If you’re running an interpreted stack like Python, Ruby or NodeJS, your apps will likely just run.
3. Consider the pros and cons of different pricing models
Your next step is picking the right pricing model for your needs. Cloud providers offer three broad models: on-demand (pay-as-you-go), reserved instances and committed-use discounts or savings plans, and spot (or preemptible) instances.
But didn’t you go to the public cloud to avoid CAPEX in the first place? By choosing reserved instances or an AWS Savings Plan, you run the risk of locking yourself in with the cloud vendor and committing to resources that might not make sense for you in a year or two. In cloud terms, 3 years is an eternity.
This is an evil practice on the part of CSPs. They lock customers in for a discount and cut them off from alternatives for years to come.
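A quick way to sanity-check a commitment before signing is to compare what you'd pay on-demand for the months you actually need the workload against the committed price for the full term. Here's a minimal sketch – the price and the discount are made-up placeholders, not vendor quotes:

```python
# Sketch: when does a 1-year commitment beat pure on-demand?
# All numbers are illustrative placeholders.
ON_DEMAND_HOURLY = 0.192          # $/hr for an example 4-vCPU instance
COMMITTED_DISCOUNT = 0.40         # 40% off for a 1-year, all-upfront commitment
HOURS_PER_MONTH = 730

committed_hourly = ON_DEMAND_HOURLY * (1 - COMMITTED_DISCOUNT)
committed_year_total = committed_hourly * HOURS_PER_MONTH * 12   # paid regardless of usage

for months_needed in range(1, 13):
    on_demand_total = ON_DEMAND_HOURLY * HOURS_PER_MONTH * months_needed
    better = "commitment" if committed_year_total < on_demand_total else "on-demand"
    print(f"need it {months_needed:>2} months: on-demand ${on_demand_total:>8.2f} "
          f"vs committed ${committed_year_total:.2f} -> {better} wins")
```

With these placeholder numbers, the 1-year commitment only pays off if the workload actually runs for roughly eight or more months of the year.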
Note: Don’t forget about the extra charges. AWS, Azure, and GCP all charge for things like egress traffic, load balancing, block storage, IP addresses, and premium support among other line items. Take them into account when comparing instance pricing and building your cloud budget.
And each item deserves your attention.
Take egress traffic as an example. In a mono-cloud scenario, you pay for egress between availability zones, which in most cases costs $0.01/GB. In a multi-cloud setup, you pay a slightly higher rate – around $0.02/GB when using direct fiber (for the US/EU). For example, pushing 10 TB a month across availability zones adds roughly $100 to your bill; the same traffic between clouds costs about $200.
4. Take advantage of CPU bursting
Take a closer look at each CSP, and you’re bound to see “burstable performance instances.”
These are instances designed to offer teams a baseline level of CPU performance, with the option to burst to a higher level when your workload needs it. They’re a good match for low-latency interactive applications, microservices, small and medium databases, and product prototypes, among others.
Note: The amount of CPU credits you accumulate depends on the instance type – larger instances collect more credits per hour. However, there’s also a cap on the number of credits you can bank, and larger instance sizes come with a higher cap.
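To get a feel for how the credit mechanics play out, here's a minimal simulation sketch. The earn rate, baseline, and cap below are illustrative placeholders – check your provider's documentation for the real values per instance size.

```python
# Sketch: simulate a burstable instance's CPU credit balance hour by hour.
# Earn rate, baseline, and cap are placeholders; real values depend on the instance size.
# On AWS, one credit equals one vCPU running at 100% for one minute.
EARN_PER_HOUR = 24        # credits accrued per hour
BASELINE_UTIL = 0.20      # utilization covered by the baseline, i.e. "free"
CREDIT_CAP = 576          # maximum credits the instance can bank
VCPUS = 2

def simulate(hourly_utilization: list[float], starting_balance: float = 0.0) -> None:
    balance = starting_balance
    for hour, util in enumerate(hourly_utilization):
        burst = max(0.0, util - BASELINE_UTIL)
        spent = burst * VCPUS * 60            # credits burned bursting above baseline
        balance = balance + EARN_PER_HOUR - spent
        throttled = balance < 0               # out of credits -> throttled to baseline
        balance = min(CREDIT_CAP, max(0.0, balance))
        print(f"hour {hour:>2}: util {util:.0%}  balance {balance:6.1f}"
              f"{'  <- throttled to baseline' if throttled else ''}")

# Mostly idle day with a 3-hour spike at 90% utilization.
simulate([0.05] * 8 + [0.9] * 3 + [0.05] * 13, starting_balance=144)
```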
Where can you get burstable performance instances?
AWS
Instance families: T2, T3, T3a, and T4g. Restarting an instance in the T2 family = losing all the accrued credits. Restarting an instance in the T3 or T4g family = credits persist for seven days and are then lost. Learn more here.
Azure
B-series VMs offer CPU bursting. When you redeploy a VM and it moves to another node, you lose the accrued credits. If you stop/start a VM while keeping it on the same node, it retains the accumulated credits. Learn more here.
GCP
Shared-core VMs offer bursting capabilities: e2-micro, e2-small, and e2-medium. CPU bursts are charged at the on-demand price for f1-micro, g1-small, and e2 shared-core machine types. Learn more here.
Our research into AWS showed that if you load your instance for 4 hours or more per day on average, you’re better off with a non-burstable instance. But if you run an online store that only sees a surge of visitors once in a while, a burstable instance is a good fit.
Note: CPU capacity has its limits
We discovered that the amount of work a burstable instance completes grows linearly for roughly the first four hours. After that, the available compute becomes much more limited – it drops by almost 90% for the rest of the day.
Source: CAST AI
5. Check your storage transfer limitations
Here’s another thing to consider when maximizing your cloud cost savings: data storage.
AWS EC2 instances use Elastic Block Store (EBS) to store disk volumes, Azure VMs use data disks, and GCP has the Google Persistent Disk for block storage. You also get local ephemeral storage in AWS, Azure, and GCP.
Every application has unique storage needs. When choosing a VM, make sure that it comes with the storage throughput your application requires.
Also, don’t opt for pricey drive options like premium SSDs – unless you expect to use them to the fullest.
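If you're not sure what throughput your application actually needs, measuring it is cheap. Here's a minimal sketch that times sequential writes and reads on the volume you care about – the file path and test size are placeholders, and for serious numbers you'd reach for a dedicated tool like fio.

```python
# Sketch: rough sequential write/read throughput of a mounted volume.
# Ballpark only - reads right after writing are mostly served from the page cache,
# so treat the read figure as optimistic.
import os
import time

TEST_FILE = "/mnt/data/throughput_test.bin"   # placeholder path on the volume to test
SIZE_MB = 512
CHUNK = b"\0" * (1024 * 1024)                 # 1 MiB of zeroes

start = time.perf_counter()
with open(TEST_FILE, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())                      # make sure data actually hits the disk
write_mbps = SIZE_MB / (time.perf_counter() - start)

start = time.perf_counter()
with open(TEST_FILE, "rb") as f:
    while f.read(1024 * 1024):
        pass
read_mbps = SIZE_MB / (time.perf_counter() - start)

os.remove(TEST_FILE)
print(f"sequential write: {write_mbps:.0f} MB/s, read: {read_mbps:.0f} MB/s")
```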
6. Count in the network bandwidth
If you’re migrating a massive amount of data or handling a high volume of traffic, pay attention to the size of the network connection between your instance and the consumers assigned to it. Some instances support 10 or even 20 Gbps of transfer speed.
But here’s the caveat: only those specific instances support this level of network bandwidth – and the advertised figure is often an “up to” burst rate rather than a sustained guarantee.
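You can check what bandwidth class each instance type advertises before committing. On AWS, for example, describe_instance_types exposes it – here's a minimal sketch with an example shortlist of instance types:

```python
# Sketch: print the advertised network performance of a few candidate EC2 instance types.
# "Up to X Gigabit" means burst bandwidth, not a sustained guarantee.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instance_types(
    InstanceTypes=["m5.xlarge", "m5.8xlarge", "c5n.xlarge", "a1.xlarge"]  # example shortlist
)

for it in sorted(resp["InstanceTypes"], key=lambda i: i["InstanceType"]):
    print(f'{it["InstanceType"]:>12}: {it["NetworkInfo"]["NetworkPerformance"]}')
```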
We recently tested the CAST AI approach to cloud cost savings on an open-source e-commerce demo app adapted from Google.
Here’s what we did:
This is how we calculated the monthly cost of running our app on the AWS test cluster, and how we compared it against our solution to see how much it increases cloud cost savings.
Note that our demo scenario assumes a fixed number of servers in the original EKS cluster and an instance type of m5.xlarge. The original cluster doesn’t use auto scaling or spot instances.
Originally, we selected the m5.xlarge VM, which comes with 4 CPUs, 16 GB of memory, and a high-throughput network interface (up to ~10 Gbps). The on-demand cost of this instance type is $0.192/hour.
Then we launched CAST AI and let the magic happen.
Through automated analysis, our optimizer selected an alternative shape: a1.xlarge. This instance type has 4 CPUs and only 8 GB of RAM. The pods deployed per VM easily fit into 8 GB – the additional 8 GB on our former instance (m5.xlarge) was pure waste. We were able to move to the ARM-based a1 family by re-compiling our apps and building new container images.
Our new instance had a reduced network throughput. This wasn’t an issue for our app because we were not network bound. That is, the traffic generated with the available compute resources didn’t max out the bandwidth capacity of the selected instance.
And now comes the best part:
The a1.xlarge instance costs only $0.102/hour. This means that we instantly achieved cloud cost savings of 46% per compute-hour.
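For completeness, here's the math behind that figure, extended to a monthly view per node (using the common convention of ~730 hours per month):

```python
# Worked example: per-hour and per-month savings per node from the instance swap above.
M5_XLARGE = 0.192   # $/hr, on-demand
A1_XLARGE = 0.102   # $/hr, on-demand
HOURS_PER_MONTH = 730

savings_pct = (M5_XLARGE - A1_XLARGE) / M5_XLARGE * 100
monthly_delta = (M5_XLARGE - A1_XLARGE) * HOURS_PER_MONTH

print(f"savings per compute-hour: {savings_pct:.1f}%")          # ~46.9%
print(f"savings per node per month: ${monthly_delta:.2f}")      # ~$65.70
```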
Compute is what you go to cloud providers for. It’s the biggest part of your cloud bill. So if you manage to optimize costs there, you’re on the right track to dramatically reducing your cloud bill.
Previously published at https://cast.ai/blog/how-to-choose-the-best-vm-for-the-job/