One of the most common concerns when moving to the cloud is cost. Given that the cloud allows you to turn IT costs from CAPEX (long-term investments, e.g. in hardware equipment and software licenses) into OPEX (day-to-day operating expenses), it's crucial to choose the right services and size them properly. In this article, we'll look at the common pitfalls and discuss how you can avoid them to truly benefit from the cloud's elasticity.

#1 Following the lift and shift approach

The lift and shift approach means that you are moving an exact copy of your workload to the cloud with as few changes as possible. Even though this pattern may be useful if you want to move to the cloud quickly, it may lead to suboptimal usage of your resources. AWS acknowledged that this is a difficult problem by creating services to make this migration easier (CloudEndure Migration and AWS Server Migration Service). Still, for the best possible resource utilization, it's best to consider rearchitecting your solution for the cloud.

With lift and shift, you are potentially leaving a lot of money on the table when looking at it long-term. You would also likely miss out on many benefits your cloud provider can offer. For instance, when choosing the fully managed AWS Aurora over a traditional Postgres instance, you gain (among others) 3x more throughput, storage autoscaling, and low-latency read replicas. This may be the reason why Aurora is currently one of the most popular and fastest-growing services on AWS.

#2 Not tagging your resources

It's difficult to improve something if you don't have enough data to make an informed decision about it. If you have no way of tracking how your cloud resources perform and how much cost they incur, it's difficult to optimize their utilization. It's considered a best practice to tag your resources based on projects or organizational units to correctly allocate costs to the corresponding services (see the tagging sketch after #4 below).

#3 Failing to monitor resource usage over time

Managing cloud architecture is not a one-off process. It's a continuous practice of monitoring and evaluating what you use, how you use it, and why. Perhaps your original assumptions about the growth of a specific application turned out to be not entirely right, and making a change could significantly lower costs. For instance, consider an overprovisioned Kubernetes cluster with many more nodes than needed. Perhaps moving to a serverless version (EKS on Fargate) makes more sense in such a scenario.

Leaving "zombie" resources running unmonitored is not as uncommon as you may think. In a larger organization, it can happen that some projects get abandoned and the corresponding resources remain active due to incomplete handover processes (see the zombie-detection sketch after #4 below).

#4 Always doing everything yourself

As software engineers, we may sometimes be tempted to build our own custom solutions and services for everything. A potentially better approach is to first do proper research on what's already available. Examples:

- Perhaps you don't need this self-hosted database on EC2 and can instead use a fully managed RDS instance, which makes it much easier to scale and operate the database?
- Or maybe you don't need this self-managed RabbitMQ instance and can instead use the battle-tested serverless message queue SQS (see the SQS sketch below)?

In general, if there is a serverless or fully-managed solution, it makes sense to at least consider it before investing too much time and effort into your own solution that you would have to maintain entirely yourself.
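To make pitfall #2 concrete, here is a minimal sketch of tagging an EC2 instance with boto3 so that its costs can later be broken down per project and team. The tag keys and values ("project", "team", and the IDs) are illustrative conventions of mine, not AWS requirements:

```python
import boto3

# Assumes credentials and a default region are configured in the environment.
ec2 = boto3.client("ec2")

def tag_instance(instance_id: str, project: str, team: str) -> None:
    """Attach cost-allocation tags to an EC2 instance."""
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[
            {"Key": "project", "Value": project},  # illustrative tag convention
            {"Key": "team", "Value": team},
        ],
    )

# Placeholder instance ID for illustration only.
tag_instance("i-0123456789abcdef0", project="checkout-api", team="payments")
```

Note that tag keys additionally have to be activated as cost allocation tags in the Billing console before they show up in AWS Cost Explorer.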
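For pitfall #3, even a simple script can surface candidates for "zombie" resources. The sketch below flags running EC2 instances whose daily average CPU stayed below a threshold for two weeks; the 5% cutoff and the 14-day window are arbitrary example values, and low CPU alone doesn't prove a resource is unused, so treat the output as a review list, not a deletion list:

```python
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)

for reservation in ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]:
    for instance in reservation["Instances"]:
        datapoints = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance["InstanceId"]}],
            StartTime=now - timedelta(days=14),
            EndTime=now,
            Period=86400,  # one datapoint per day
            Statistics=["Average"],
        )["Datapoints"]
        # Flag instances that never exceeded 5% average CPU on any day.
        if datapoints and max(dp["Average"] for dp in datapoints) < 5.0:
            print(f"Possible zombie: {instance['InstanceId']}")
```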
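And to illustrate the SQS suggestion from pitfall #4: the entire "queue server" setup reduces to a single API call, with no broker instance to patch, upgrade, or scale. The queue name and message contents are made up for illustration:

```python
import boto3

sqs = boto3.resource("sqs")

# Creating the queue replaces provisioning and operating a RabbitMQ node.
queue = sqs.create_queue(QueueName="orders")

# Producer side: enqueue a message.
queue.send_message(MessageBody='{"order_id": 42, "status": "created"}')

# Consumer side: long-poll for up to 10 messages, then acknowledge each one.
for message in queue.receive_messages(WaitTimeSeconds=10, MaxNumberOfMessages=10):
    print(message.body)
    message.delete()  # deleting the message marks it as successfully processed
```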
#5 Using only tools you are familiar with

Often when reading Reddit or blog posts, I see many engineers who are reluctant to use serverless or container orchestration platforms simply because all they know is EC2 and manually administered servers. They assume that it's all just a new technology that will "come and go" and that there is, therefore, no need to change their ways, implying that there is no merit in moving to container orchestration platforms, serverless, and other cloud services. This seems to be a close-minded approach. It's better to challenge our assumptions and judge new technologies by clear facts, costs, and performance benchmarks rather than by skepticism towards what's new.

#6 Not making use of serverless and container orchestration platforms

If you created an EC2 instance for every service and tool you manage, you would likely end up in a maintenance nightmare. But if you instead deploy each of your services as a container on a Kubernetes (EKS) or Fargate (ECS) cluster, you can pack far more workloads onto a single server instance thanks to dynamic port mapping and the more compact resource utilization of containers (e.g. shared layers).

Container orchestration platforms help you balance the load between instances and keep your workloads healthy. They take the capacity guesswork, to some extent, out of the picture. You can specify how many container instances should be running at all times, and the control plane will ensure that it happens, just as you defined it (see the ECS sketch after #9 below). If you can easily load balance your workload across many containers or serverless resources, then you no longer have to guess which EC2 or RDS instance size will be appropriate for your use case.

#7 Not taking TCO into account

If you only consider the hardware or service costs, you may end up thinking that many resources can be more cost-effective on-prem. But if you add up the costs of additional maintenance, upgrades, and the employees managing those servers, that's an entirely different story.

#8 Thinking short term

If you scale your resources purely based on your current situation, you may fail to take into account how your needs may change in the future. What if your business and data grow much faster? What if it turns out to be the opposite? Is your application still easy to change and adapt to unknown future scenarios? And finally, will you be able to find and retain enough employees who can operate around those needs in the long run?

#9 Overprovisioning everything "just in case"

On the other extreme, if you want to be cautious, you may be tempted to overprovision everything to make sure you are ready for usage spikes. It's a good strategy provided that you can justify the spikes based on past usage patterns. But it can be a bad strategy if you are doing it out of gut feeling.

The cloud allows elasticity in the sense that you can add nodes to your clusters, load balance the workload across more containers, or increase the number of vCPUs or the memory size when you see the need for it. If configured and monitored properly, there is no need to overprovision anything (see the auto scaling sketch below). I'm not saying that right-sizing is easy (far from it), but with good processes and automation in place, it's doable and can significantly reduce costs, especially when operating numerous resources at scale.

[Image: Overprovisioned prod resources]
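Coming back to #6's point about the control plane: with ECS, declaring the desired number of copies of a service is a single API call, and the scheduler keeps that count satisfied by replacing failed tasks. A minimal boto3 sketch, where the cluster, service, task definition, and subnet names are all placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Declare that three copies of the service should always be running on Fargate.
# The ECS control plane continuously reconciles reality with this desired count.
ecs.create_service(
    cluster="demo-cluster",
    serviceName="api",
    taskDefinition="api-task:1",
    desiredCount=3,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet
            "assignPublicIp": "ENABLED",
        }
    },
)
```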
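As for the elasticity mentioned in #9: instead of overprovisioning, you can let target-tracking auto scaling adjust the task count of that same ECS service so that average CPU utilization hovers around a target. In the sketch below, the bounds of 2 to 20 tasks and the 70% target are arbitrary example values that you would derive from your actual usage patterns:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder identifier of the ECS service from the previous sketch.
resource_id = "service/demo-cluster/api"

# Register the service's desired count as a scalable dimension with hard bounds:
# the minimum covers baseline traffic and redundancy, the maximum caps the cost.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Add or remove tasks automatically to keep average CPU near the target value.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # example target: 70% average CPU utilization
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```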
#10 Choosing the wrong datastore

Sometimes the bottlenecks are not the compute resources, but rather a poorly chosen data store. It's good to consider:

- whether you need a rich query language (SQL), or perhaps your application can do just fine with a simple key-value store (e.g. DynamoDB),
- whether you need a database in the first place; perhaps a simple S3 data dump is enough.

It's naturally use-case dependent, but databases often constitute the main bottleneck of any scalable architecture.

How to mitigate the right-sizing problem?

One possible solution to optimize your cloud resource utilization is to leverage automation. For instance, with Dashbird, you can keep track of your under- and overprovisioned resources and get notified about them. Using the well-architected lens dashboard, we can find out that our ECS cluster with the EC2 instance type (i.e. a non-serverless data plane) had a CPU utilization of over 90% within the last hour.

[Image: Well-architected lens dashboard]

Then, we can drill down into specific time intervals and inspect further why this spike occurred.

[Image: Underprovisioned ECS cluster reaching the CPU capacity limits]

At the same time, another containerized service may be overprovisioned, potentially leaving money on the table. Having this information allows you to optimize your resource configuration based on the actual usage patterns.

[Image: Overprovisioned ECS service]

Conclusion

In this article, we investigated common pitfalls when sizing your cloud resources and discussed how to avoid them to truly benefit from the cloud's elasticity. By making use of container orchestration platforms, serverless and fully-managed solutions, and by continuously monitoring your usage patterns over time, you can optimize your architecture for both performance and cost.

Previously published at https://dashbird.io/blog/sizing-cloud-resources-mistakes/