This is a short series that I have wanted to share for a long time about the basics of “Cost Optimization” on AWS. Let’s start this journey with DocumentDB!

Okay, to be really honest, this title is clickbait. I could definitely have written something like “how I made cost optimization on our AWS infrastructure by respecting some common guidelines provided in the documentation”, but it’s way less catchy, nah?

Maybe some of you guys already know these tricks and good practices. If you’re looking straight for the checklist that I’m suggesting, scroll down to the checklist.

Understand the hell-tricky I/O cost from the DocumentDB service

If you look at their pricing page, it’s divided into four cost dimensions. I summarize them here:

The price of the instance: this cost is quickly understandable. You pay for an instance, and the pricing depends on its resources (CPU, RAM, etc.).

Storage and backup storage costs: same as above, it’s fairly understandable, and we can easily track and estimate these costs. AWS bills per GB stored. Legit.

The hell-tricky part is the database I/O: AWS will bill between $0.20 and $0.30 per 1 million I/Os (depending on the region of the instance)!!

So, what’s behind I/Os?

AWS explains that with the DocumentDB service, you don’t have to provision I/O resources in advance, which is kind of interesting, because you don’t have storage limitations and you can easily handle a peak of I/O operations. It seems fair, as you’re billed for the usage. AWS describes in its documentation what counts as an I/O operation: it’s mainly all operations like find, insert, update, and delete, plus some features like change streams and TTL (time to live) indexes. Well, everything that hits the storage volume will be billed to you.

Wait, what, $0.20 per million I/Os? Let’s make AWS lose money, right now!
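To put the price range above in perspective, here is a minimal sketch of the arithmetic. The function name and the workload figure (a sustained 2,000 I/Os per second) are hypothetical and only there for illustration; only the $0.20 to $0.30 per million range comes from the pricing discussed above.

```javascript
// Rough monthly I/O cost estimate.
// pricePerMillionUsd: the per-million-I/O price (between 0.20 and 0.30 in this post).
// iosPerSecond and daysInMonth are hypothetical workload assumptions.
function estimateMonthlyIoCost(iosPerSecond, pricePerMillionUsd, daysInMonth = 30) {
  const secondsInMonth = daysInMonth * 24 * 60 * 60;
  const totalIos = iosPerSecond * secondsInMonth;
  return (totalIos / 1e6) * pricePerMillionUsd;
}

const cheapRegion = estimateMonthlyIoCost(2000, 0.20);
const priceyRegion = estimateMonthlyIoCost(2000, 0.30);
console.log(cheapRegion.toFixed(2), priceyRegion.toFixed(2));
```

With these made-up numbers, the bill lands somewhere between roughly $1,037 and $1,555 a month for I/O alone, which is why it is worth chasing every avoidable read.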
There’s a sentence in the AWS DocumentDB documentation that will catch your eye (and your wallet 💸): once the data has been read from the storage volume and continues to reside in memory, subsequent reads of the same data do not incur additional I/Os.

This sentence is key to understanding what’s behind I/Os.

Which operations use fewer I/Os?

Queries that use an index will likely use fewer I/Os, as you’re not scanning the whole storage of your collection. They’ll certainly consume I/Os, but way fewer than scanning an entire collection. Furthermore, the RAM of your instance needs to cover your index size, which lets you avoid additional I/Os. Please keep in mind that you need to respect some principles with index usage.

Checklist ✅

Here’s my advice/checklist for when you want to optimize your I/O usage, reduce your costs, and improve performance. You’ll see that I'm not a genius, as I just aggregate information from the AWS DocumentDB documentation with some common best practices that are not strictly specific to DocumentDB. It’s always good to refresh our minds with principles.

🧠 First, remember this: fewer I/Os = cheaper = better performance. It’s not all about costs or all about performance; the two things are linked.

❌ Remove unused indexes: you don’t know how expensive an unused index is for a busy collection. 🤌 I made my company save $2,000/month just like that, by deleting unused indexes. And it’s very easy to track unused indexes with an index stats query. The query outputs the `ops` field, which corresponds to the number of times your index has been hit. Depending on the load of your application, please consider removing the unused indexes.
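The original query did not survive in this copy of the post, so what follows is only a sketch of the idea, not the author's exact query. It assumes the stats come from the `$indexStats` aggregation stage and that each entry exposes a `name` and a top-level numeric `ops` counter; the sample documents, the collection name, and the `findUnusedIndexes` helper are hypothetical.

```javascript
// In the mongo shell, the stats would come from something like:
//   db.mycollection.aggregate([{ $indexStats: {} }]).toArray()
// The entries below are hypothetical sample output with plain-number counters.
const sampleIndexStats = [
  { name: "_id_", key: { _id: 1 }, ops: 125000 },
  { name: "email_1", key: { email: 1 }, ops: 48210 },
  { name: "legacyFlag_1", key: { legacyFlag: 1 }, ops: 0 },
];

// Return the names of indexes whose hit counter is at or below maxOps.
// The default _id index is skipped because it cannot be dropped.
function findUnusedIndexes(indexStats, maxOps = 0) {
  return indexStats
    .filter((stat) => stat.name !== "_id_")
    .filter((stat) => stat.ops <= maxOps)
    .map((stat) => stat.name);
}

console.log(findUnusedIndexes(sampleIndexStats));
```

Before dropping anything, it is worth checking how long the counters have been accumulating, and running the check on every instance that serves reads, since an index that looks idle on one instance may be busy on another.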
🧐 Activate Performance Insights and profiling operations: if you use RDS, you might be aware of Performance Insights. It gives you some very helpful metrics and information about the queries that are hurting your DocumentDB performance, and you can quickly see which queries consume I/O operations (and how many), so it’s a very good way to track down a bottleneck. Another way to monitor slow queries or COLLSCAN queries is by activating profiling operations. As the name suggests, it profiles some operations for you (see the DocumentDB documentation for more info). You can set a threshold, and any operation that takes longer than n ms will be logged to CloudWatch. It’s very useful for tracking the number of queries that are performing a COLLSCAN, for example. Please activate both of these options, as they’re very valuable!

💾 Always look at your data first: you’ll need to identify the best high-cardinality field that you want to index. If you’re not used to the concept of index cardinality, the AWS DocumentDB documentation explains it well :)

🫠 Avoid small tricky collections: if you plan to have a collection with three fields, one of them with a unique key, and you’re planning to perform a lot of updates/inserts, please reconsider the modeling of your collection, because those operations will hit like hell, and so will your I/O usage.

⏱️ Avoid TTL, aka time-to-live indexes: (most of the time) you can handle expiry without setting a time-to-live index, so please check that the TTL parameter is not enabled on the instance or cluster.

💡 Explain! A very simple way to check the index selectivity of the query planner when you’re making a new query (or not) is to perform an explain operation with the executionStats parameter. You’ll be surprised that some queries that you think hit an index just don’t hit any index…

☯️ Don’t create an index for a boolean field. Just don’t. Remember cardinality.
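To make the cardinality point concrete, here is a small sketch that compares fields by their ratio of distinct values to documents. The `cardinalityRatio` helper and the sample documents are hypothetical; on a real collection you would run this kind of check on a sample of your own data.

```javascript
// Ratio of distinct values to total documents for one field.
// Close to 1 means high cardinality (a good index candidate);
// the lower the ratio, the less selective an index on that field would be.
function cardinalityRatio(docs, field) {
  if (docs.length === 0) return 0;
  const distinct = new Set(docs.map((doc) => JSON.stringify(doc[field])));
  return distinct.size / docs.length;
}

// Hypothetical sample documents.
const sampleUsers = [
  { email: "a@example.com", country: "FR", isActive: true },
  { email: "b@example.com", country: "FR", isActive: false },
  { email: "c@example.com", country: "US", isActive: true },
  { email: "d@example.com", country: "DE", isActive: true },
];

for (const field of ["email", "country", "isActive"]) {
  console.log(field, cardinalityRatio(sampleUsers, field));
}
```

A boolean can never have more than two distinct values, so its ratio shrinks toward zero as the collection grows, while a field like an email address stays close to 1.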
⚖️ Monitor the average size of an object for each collection that you have, with this command: `db.<mycollection>.stats(1024)`. An extreme average size can quickly create a lock on your queries and increase I/O ops, because the RAM of your instance is not enough. Please monitor your objects closely and don’t store unnecessary fields. If you need to store many fields, consider optimizing queries by not selecting all the fields.

⚠️ Be aware that DocumentDB is not MongoDB. It’s mainly compatible with MongoDB, but it’s not MongoDB, as there are some shitty specific behaviors. For example, if you want to perform a query with the $regex operator, you’ll need to `hint()` your index, as it is mandatory. The exclusion operators will never use any index, so please consider these behaviors when making or optimizing your indexes!

👉 Never hint. Except for the very specific use cases mentioned above, you should avoid the usage of `hint()`. Keep in mind that if the query planner doesn’t elect your index, it’s for a good reason. Most of the time it’s because scanning the index would take as long as, or longer than, scanning all the documents in the collection.

Hope you’ll appreciate these tricks that I learned while working on AWS Cost Optimization for my company. Stay tuned for another post!

Don’t hesitate to 👏 if you liked this post ;)

PS: if something seems wrong or unclear, don’t hesitate to DM me.

I made AWS Lose Money - Here's How!
