Amazon Simple Storage Service (referred to as AWS S3) is AWS’s cloud-based object storage platform. Amazon S3 is one of Amazon’s most widely used services, but it’s also one of the easiest to overspend on. If you’re not familiar with AWS S3’s pricing model or taking advantage of lifecycle policies, you’re likely paying more for storage than you need to.
In this AWS S3 Pricing and Cost Optimization Guide, you’ll learn the following:

- How Amazon S3’s pricing model and storage classes work
- How to run a storage class analysis on your buckets
- How to create lifecycle rules that move files to cheaper storage tiers
- Additional strategies for keeping your S3 costs down
Before you start optimizing your S3 buckets, you need to understand Amazon S3’s pricing model. At a high level, you are charged based on five factors:

- The amount of data you store, per GB per month
- The number and type of requests you make (PUT, GET, LIST, and so on)
- Data retrievals from the infrequent access and archive tiers
- Data transfer out of the region
- Management and analytics features you enable, such as inventory and storage class analysis
AWS S3 Six Storage Classes. Source: https://aws.amazon.com/s3/cost-optimization/
The amount you will pay for each of these factors depends on your S3 storage class. Using the correct S3 storage class is the easiest way to control AWS S3 cost, so it’s important to start here. Storage class is set on each individual object (when you upload or copy it), and you can use bucket-level lifecycle rules to transition many objects at once.
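If you manage objects programmatically, the storage class is just a parameter on the request. Here’s a minimal sketch using boto3, the Python AWS SDK; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object straight into the Standard-IA storage class.
# "my-example-bucket" and the key below are hypothetical names.
with open("2023-01-archive.tar.gz", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="backups/2023-01-archive.tar.gz",
        Body=f,
        StorageClass="STANDARD_IA",
    )

# Change the class of an existing object by copying it over itself.
s3.copy_object(
    Bucket="my-example-bucket",
    Key="backups/2023-01-archive.tar.gz",
    CopySource={"Bucket": "my-example-bucket", "Key": "backups/2023-01-archive.tar.gz"},
    StorageClass="GLACIER",
)
```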
S3 Standard
The Standard tier is the default class and the best option for frequently accessed files. Retrieving files takes a matter of milliseconds, but this tier has the highest ongoing storage price.
S3 Intelligent-Tiering
If you work with files over 128 KB, you can use AWS S3 Intelligent-Tiering to automatically classify your files into the best access tier based on their usage. It monitors your files and moves those that haven’t been accessed into lower-cost infrequent access (and, optionally, archive) tiers for you. Files smaller than the 128 KB minimum are never transitioned and are always billed at the frequent access rate when using Intelligent-Tiering.
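You opt objects into Intelligent-Tiering simply by uploading them with the INTELLIGENT_TIERING storage class; the optional archive tiers are enabled per bucket. Here’s a sketch of that opt-in with boto3, using hypothetical bucket and prefix names:

```python
import boto3

s3 = boto3.client("s3")

# Opt objects under a hypothetical "media/" prefix into Intelligent-Tiering's
# optional archive tiers for deeper savings on rarely touched files.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-example-bucket",
    Id="archive-old-media",
    IntelligentTieringConfiguration={
        "Id": "archive-old-media",
        "Filter": {"Prefix": "media/"},
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```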
S3 Infrequent Access
For files that you need less frequently, you can roughly cut your storage costs in half by putting them in the S3 Standard-Infrequent Access (Standard-IA) or One Zone-Infrequent Access tier. While these tiers still offer millisecond access times, you pay a per-GB retrieval fee that the Standard tier doesn’t charge, along with higher per-request prices, and objects are billed for a minimum of thirty days.
The S3 Standard-IA tier stores your data redundantly across multiple Availability Zones, making it good for backups or inactive account data that you might need to retrieve occasionally and don’t want to lose.
One Zone-Infrequent Access
The S3 One Zone-IA tier is about 20 percent cheaper than Standard-IA, but it stores your data in a single Availability Zone, so it can’t survive the loss of that zone. This makes it less useful for critical data like backups, but it’s appropriate for storing infrequently used data that you can recreate from the original source. For example, you might use One Zone-IA to store thumbnail copies of inactive user profile pictures.
AWS S3 Glacier
The AWS S3 Glacier tier offers resilient storage at a fraction of the price of the Standard tier but requires you to settle for retrieval times ranging from a few minutes (expedited) to twelve hours (bulk). You’ll have to pay more to retrieve your data, and objects are billed for a minimum of ninety days. These limitations make S3 Glacier appropriate for things like storing original versions of large media files that are distributed in an encoded format. You might occasionally need to retrieve the originals, but likely not more than once per month.
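Retrieving an archived object is a two-step process: you ask S3 to restore a temporary copy, then wait for it to become available. A minimal sketch with boto3, assuming hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary restore of a Glacier object. The tier can be
# "Expedited", "Standard", or "Bulk", trading retrieval speed for cost.
s3.restore_object(
    Bucket="my-example-bucket",
    Key="masters/raw-footage-001.mov",
    RestoreRequest={
        "Days": 2,  # keep the restored copy available for two days
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)

# The Restore header tells you whether the temporary copy is ready yet.
head = s3.head_object(Bucket="my-example-bucket", Key="masters/raw-footage-001.mov")
print(head.get("Restore"))
```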
S3 Glacier Deep Archive
For an even deeper discount on your storage rate, you can try S3 Glacier Deep Archive. Standard retrievals take up to twelve hours (bulk retrievals up to forty-eight), and objects are billed for a minimum of 180 days. As your data is even more expensive to retrieve, this tier is most appropriate for compliance data you are required to keep, but only need to access once or twice per year at most.
S3 Pricing by Region
Another way to influence pricing in Amazon S3 is to select the cheapest region that makes sense for your application. Some AWS regions are as much as 50 percent cheaper, but you have to be careful if you pick a region purely based on price.
First, latency between your S3 storage and other AWS resources could be a problem. If most of your infrastructure is hosted in South America, but your S3 files are in Virginia, it could add precious milliseconds to each API call, and cross-region data transfer charges can eat into whatever you save on storage.
Similarly, putting your S3 region far from your users might degrade performance if file access is a critical part of your application. That said, if you use S3 for longer-term storage and cache most of your files in a CDN like CloudFront, users might barely notice which S3 region you choose.
Finally, if you’re a very light user of Amazon S3 or you’re just trying it out, it’s worth noting that AWS offers up to 5 GB of storage free for twelve months as part of its Free Tier. This might be helpful for early-stage startups or side projects looking for a free way to get started with S3 file storage.
It’s helpful to know about the different storage classes in S3, but in practice, most organizations start by putting everything in the Standard tier. Depending on your usage patterns, you’ll be able to optimize your S3 storage classes after AWS has a few weeks of data to analyze.
In this section, I’ll show you how to use Amazon S3 storage class analysis to determine which Standard tier files can be moved to a less frequently accessed tier. This will help you decide whether it’s worth investing in an S3 lifecycle policy and how much you might save by using one. You can also use the results to tag objects in S3 and move them to the right tier accordingly.
Running AWS S3 Storage Class Analysis
You can run storage class analysis on an entire bucket or a subset of your S3 files using prefixes or tags. To configure the analysis, open your S3 bucket and click the Metrics tab.
Scroll down and click Create analytics configuration. From here you can decide whether to analyze the whole bucket or just a subset of files. You can also decide whether to export the results or not.
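If you’d rather set this up programmatically, for example across many buckets, the same configuration is available through the API. Here’s a sketch with boto3; the bucket names and prefix are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Analyze only objects under a hypothetical "uploads/" prefix and export
# the daily CSV results to a second bucket for review.
s3.put_bucket_analytics_configuration(
    Bucket="my-example-bucket",
    Id="uploads-access-analysis",
    AnalyticsConfiguration={
        "Id": "uploads-access-analysis",
        "Filter": {"Prefix": "uploads/"},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analytics-bucket",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)
```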
Understanding the Results
AWS will start the storage class analysis, and depending on the size of your bucket, you might be able to see results fairly quickly. Here’s an example of what the results might look like:
The chart and summary boxes in the image show the total size of the objects you are storing in Amazon S3 versus the amount that’s been retrieved. In this case, even many of the older files in the bucket are being retrieved regularly: while the bucket contains only 13.92 GB of files over a year old, 40.84 GB have been retrieved in the past seven days.
On the other hand, the analytics in the following image show a very different usage pattern:
In this bucket, none of the files over thirty days old have been accessed at all in the past seven days. This might indicate that you can move some of these older files to an Infrequent Access tier or even Glacier. You’ll have to be careful, though, as moving files into and out of these longer-term storage tiers incurs transition and retrieval costs.
Once the storage class analysis is complete, AWS will offer some recommendations as well. Here’s an example from another bucket:
This shows you that the rate of retrieval (the solid orange line) drops to almost zero after a file is thirty days old, and that there is a lot of storage being consumed by older, infrequently accessed files in this bucket.
Now that you know you can use a better Amazon S3 storage tier, what do you do with this information? That’s where lifecycle rules come in.
By creating a lifecycle rule, you are instructing Amazon S3 to move files to another storage tier or delete files automatically based on their age. These rules can be applied to all files in your bucket or you can use tags or file prefixes to filter files first.
To get started, go to the Management tab in your S3 bucket and click Create lifecycle rule.
For the second example above, where no files older than thirty days are being accessed, you could create a lifecycle rule that, for example:

- Moves files to the Standard-IA tier thirty days after they are created
- Moves them on to Glacier after ninety days
- Deletes them entirely after one year
In this case, your access rule setup would look like this in the Amazon S3 GUI:
These rules can also be defined in XML (or JSON, if you use the AWS CLI or SDKs) and applied through the S3 API if you would like to automate applying policies to many buckets at once. Be careful when setting up these policies, though. Moving files in and out of Glacier frequently will get very expensive, so a small mistake in your lifecycle policy could add thousands to your AWS bill.
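As a sketch of that automated route, here’s the example rule above expressed with boto3; the bucket name and prefix are hypothetical. Note that this call replaces any lifecycle configuration already on the bucket, so include all of your rules in a single call:

```python
import boto3

s3 = boto3.client("s3")

# Transition files under a hypothetical "uploads/" prefix to Standard-IA
# after 30 days, to Glacier after 90, and delete them after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": "uploads/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```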
Once you’ve set up the appropriate policies, your S3 files will be automatically moved to the desired storage tier as they age. This is one of the best ways to lower your Amazon S3 bill, but it’s not the only strategy you can take.
Depending on the size of your organization and the way you’ve stored data in Amazon S3, access patterns and lifecycle policies might only get you so far. If you’re taking a deep look at optimizing your S3 cost, here are a few more things you can explore.
Prioritize the Biggest Wins First
If your organization manages a lot of data in Amazon S3, going through the storage class analysis, checking with the appropriate teams, and implementing lifecycle policies for every set of data can be overwhelming.
Start with the areas where you can get the biggest wins first and invest time in them accordingly. If you’re only storing a few gigabytes in a bucket, it might not be worth setting up a lifecycle policy at all.
S3 Bucket Organization
Because lifecycle policies can use file prefixes or tags as a filter, it’s vital that you implement an organization system across your S3 buckets. Common tagging strategies might include tagging objects by:

- The team or project that owns the data
- The type or sensitivity of the data
- How long the data needs to be retained
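Tags can be set when an object is uploaded or added to existing objects later. A minimal sketch with boto3, using hypothetical names and tag values:

```python
import boto3

s3 = boto3.client("s3")

# Tag an existing object so lifecycle policies can filter on it later.
s3.put_object_tagging(
    Bucket="my-example-bucket",
    Key="exports/2023-q1-report.csv",
    Tagging={
        "TagSet": [
            {"Key": "team", "Value": "finance"},
            {"Key": "retention", "Value": "1-year"},
        ]
    },
)
```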
Having a consistent system for organizing objects in Amazon S3 will ensure that you can write effective lifecycle policies, and it will prevent you from deleting important files.
S3 Usage Reports
If you want a more granular look at how the files in your S3 buckets are being stored and accessed, you can download an AWS usage report, which breaks usage down by bucket, operation, and day.
Bulk Retrieval and Storage
If you decide to move data into or out of Glacier, you might want to consider zipping your files first. Compressing files that are likely to be needed at the same time into a single archive saves money on data transfer, cuts down the number of billable requests, and makes retrieval easier. The time it takes to compress your data is negligible compared to Glacier’s retrieval times.
Partial File Uploads
If you use Amazon S3 for multipart file uploads, be sure to remove unfinished parts from your buckets. Amazon provides a lifecycle policy for this in their documentation, and it’s an easy way to make sure you’re not paying for files you don’t need.
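That cleanup corresponds to the AbortIncompleteMultipartUpload lifecycle action. Here’s a sketch of the rule with boto3, assuming a hypothetical bucket (and keeping in mind, as above, that this call replaces the bucket’s existing lifecycle configuration):

```python
import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that never finished after seven days, matching
# the cleanup rule AWS suggests in its documentation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```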
Don’t Forget Your Logs
If you’re storing CloudWatch logs in Amazon S3, one of the easiest ways to decrease your S3 cost is to expire (that is, delete) old log files automatically. You can do this manually or with a lifecycle policy (as shown above), but be careful: many regulations require organizations to retain logs for months or years.
Amazon S3 is one of the most widely used AWS products, and because it has such a wide array of use cases, it can be especially hard to understand its pricing. After you get a handle on the different storage tiers available, you can start to dive into your S3 buckets.
Run the storage class analysis tool to understand your usage, and create a lifecycle policy that transitions your S3 files to the appropriate storage class as they age. Keep prioritization and organization in mind, though, because S3 cost optimization is an ongoing process.
If you’d like help understanding and keeping up with your AWS spending, CloudForecast can help.
Reach out to our CTO, [email protected], for help reducing your costs, or schedule a time with us.
Also published on: https://www.cloudforecast.io/blog/aws-s3-pricing-and-optimization-guide/