The response to our previous post,
To repatriate data from AWS S3, you will follow these general guidelines:
Review Data Requirements: Determine the specific buckets and objects that need to be repatriated from AWS S3. Make sure you understand business needs and compliance requirements on a bucket-by-bucket basis.
Identify Repatriation Destination: You’ve already decided to repatriate to MinIO; now choose whether to run MinIO in an on-premises data center, at another cloud provider, or at a colocation facility. Using the requirements from the first step, select hardware or instances that meet your forecasted storage, transfer and availability needs.
Data Transfer: Plan and execute the transfer of data from AWS S3 to MinIO. Simply use MinIO's built-in Batch Replication or mirror using the MinIO Client (see
Data Access and Permissions: Ensure that appropriate access controls and permissions are set up for the repatriated data on a per-bucket basis. This includes IAM and bucket policies for managing user access, authentication, and authorization to ensure the security of the data.
Object Locks: It is critical to preserve the object lock retention and legal hold policies after the migration. The target object store has to interpret the rules in the same way as Amazon S3. If you are unsure, ask for the
Data Lifecycle Management: Define and implement a data lifecycle management strategy for the repatriated data. This includes defining retention policies, backup and recovery procedures, and data archiving practices on a per-bucket basis.
Data Validation: Validate the transferred data to ensure its integrity and completeness. Perform the necessary checks and tests to confirm that the data has been transferred without corruption or loss. After the transfer, the object names, ETags, metadata, checksums and object counts should all match between the source and destination.
Update Applications and Workflows: The good news is that if you follow cloud-native principles to build your applications, then all you will have to do is reconfigure them for the new MinIO endpoint. However, if your applications and workflows were designed to work with the AWS ecosystem, make the necessary updates to accommodate the repatriated data. This may involve updating configurations, reconfiguring integrations or in some cases modifying code.
Monitor and Optimize: Continuously monitor and optimize the repatriated data environment to ensure optimal performance, cost-efficiency, and adherence to data management best practices.
There are many factors to consider when budgeting and planning for cloud repatriation. Fortunately, our engineers have done this with many customers and we’ve developed a detailed plan for you. We have customers that have repatriated everything from a handful of workloads to hundreds of petabytes.
The biggest planning task is to think through choices around networking, leased bandwidth, server hardware, archiving costs for the data not selected to be repatriated, and the human cost of managing and maintaining your own cloud infrastructure. Estimate these costs and plan for them. Cloud repatriation costs will include data egress fees for moving the data from the cloud back to the data center. These fees are intentionally high enough to compel cloud lock-in. Take note of these high egress fees - they substantiate the economic argument to leave the public cloud because, as the amount of data you manage grows, the egress fees increase. Therefore, if you’re going to repatriate, it pays to take action sooner rather than later.
We’re going to focus on data and metadata that must be moved – this is eighty percent of the work required to repatriate. Metadata includes bucket properties and policies (access management based on access/secret key, lifecycle management, encryption, anonymous public access, object locking and versioning).
Let’s focus on data (objects) for now. For each namespace you want to migrate, take inventory of the buckets and objects you want to move. It is likely that your DevOps team already knows which buckets hold important current data. You can also use
Namespace | Total Buckets | Total Object Count | Total Object Size (GB) | Daily Total Upload (TB) | Daily Total Download (TB) |
---|---|---|---|---|---|
ns-001 | 166 | 47,751,258 | 980,014.48 | 50.04 | 14.80 |
ns-002 | 44 | 24,320,810 | 615,033.35 | 23.84 | 675.81 |
ns-002 | 648 | 88,207,041 | 601,298.91 | 328.25 | 620.93 |
ns-001 | 240 | 68,394,231 | 128,042.16 | 62.48 | 12.45 |
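If you don't have these numbers to hand, a rough inventory can be collected with the AWS CLI. The sketch below is one possible approach, not a prescribed tool: the function name `bucket_inventory` is made up for illustration, and it assumes the AWS CLI is installed and configured with credentials that can list every bucket.

```shell
# Print "<bucket> <object-count> <total-bytes>" for every bucket visible to
# the current AWS credentials. bucket_inventory is a hypothetical helper name.
bucket_inventory() {
    aws s3api list-buckets --query 'Buckets[].Name' --output text |
    tr '\t' '\n' |
    while read -r bucket; do
        [ -n "$bucket" ] || continue
        # --summarize appends "Total Objects: N" and "Total Size: N" (bytes).
        aws s3 ls "s3://$bucket" --recursive --summarize |
        awk -v b="$bucket" '
            BEGIN            { objects = 0; bytes = 0 }
            /Total Objects:/ { objects = $3 }
            /Total Size:/    { bytes = $3 }
            END              { printf "%s %s %s\n", b, objects, bytes }'
    done
}

# Usage:
# bucket_inventory > inventory.txt
```

A recursive listing issues LIST requests against every bucket, so on very large buckets it is slow and is itself billed per request.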
The next step is to list, by namespace, each bucket and its properties for every bucket you’re going to migrate. Note the application(s) that store and read data in that bucket. Based on usage, classify each bucket as hot, warm or cold tier data.
In an abridged version, this will look something like
Bucket Name | Properties | App(s) | Hot/Warm/Cold Tier |
---|---|---|---|
A | Copy and paste JSON here | Spark, Iceberg, Dremio | Hot |
B | Copy and paste JSON here | Elastic | Warm |
C | Copy and paste JSON here | Elastic (snapshots) | Cold |
You have some decisions to make about data lifecycle management at this point. Pay close attention, because this is a great way to save money on AWS fees. Categorize objects in each bucket as hot, warm or cold based on how frequently they are accessed. Rather than repatriating cold tier buckets, migrate them directly to S3 Glacier; there’s no reason to incur egress fees to download data just to upload it again.
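One way to move a cold tier bucket into Glacier without downloading anything is a lifecycle rule. The following is a minimal sketch: the function name, the rule ID and the choice of the `DEEP_ARCHIVE` storage class are assumptions made for illustration, so pick the storage class that matches your retrieval needs.

```shell
# Apply a lifecycle rule that transitions every object in a bucket to
# S3 Glacier Deep Archive (Days: 0 means as soon as the rule is evaluated).
# tier_bucket_to_glacier and the rule ID are hypothetical names.
tier_bucket_to_glacier() {
    bucket="$1"
    rules='{"Rules":[{"ID":"cold-to-deep-archive","Status":"Enabled","Filter":{"Prefix":""},"Transitions":[{"Days":0,"StorageClass":"DEEP_ARCHIVE"}]}]}'
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "$bucket" \
        --lifecycle-configuration "$rules"
}

# Usage (hypothetical bucket name):
# tier_bucket_to_glacier elastic-snapshots
```

Note that `put-bucket-lifecycle-configuration` replaces any lifecycle configuration already on the bucket, so merge existing rules into the JSON first if there are any.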
Depending on the amount of data you’re repatriating, you have a few options to choose how to migrate. We recommend that you load and work with new data on the new MinIO cluster while copying hot and warm data to the new cluster over time. The amount of time and bandwidth needed to copy objects will, of course, depend on the number and size of the objects you’re copying.
Here’s where it will be very helpful to calculate the total data that you’re going to repatriate from AWS S3. Look at your inventory and total the size of all the buckets that are classified as hot and warm.
Total Hot and Warm Tier Data = 1,534,096.7 GB |
---|
Available bandwidth = 10 Gbps |
Minimum Transfer Time required (total object size / available bandwidth) = 14.2 days |
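The transfer time above can be reproduced with a few lines of shell. This sketch assumes decimal units (1 GB = 8 gigabits) and a link that stays fully saturated, so treat the result as a lower bound; `transfer_days` is just an illustrative name.

```shell
# Minimum transfer time in days for SIZE_GB gigabytes over a LINK_GBPS link.
transfer_days() {
    awk -v gb="$1" -v gbps="$2" 'BEGIN {
        seconds = gb * 8 / gbps          # gigabytes -> gigabits -> seconds
        printf "%.1f\n", seconds / 86400 # 86,400 seconds per day
    }'
}

transfer_days 1534096.7 10   # prints 14.2
```

Real transfers share the link with production traffic and pay per-request overhead, so plan for more than the minimum.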
Calculate data egress fees based on the above total. I’m using
Total Data Tiered to S3 Glacier = 767,048.337 GB |
---|
S3 to S3 Glacier transfer fees ($0.05/1000 objects) = $3,773.11 |
S3 Glacier Deep Archive monthly storage fee = $760 |
Don’t forget to budget for S3 Glacier Deep Archive usage moving forward.
Total Data to be Transferred = 1,534,096.7 GB |
---|
First 10 TB at $0.09/GB = $900 |
Next 40 TB at $0.085/GB = $3,400 |
Next 100 TB at $0.07/GB = $7,000 |
Additional over 150 TB at $0.05/GB = $69,205 |
Total Egress Fees = $80,505 |
For the sake of simplicity, the above calculation includes neither the fee for per object operations ($0.40/1m) nor the cost of LISTing ($5/1m). For very large repatriation projects, we can also compress objects before sending them across the network, saving you some of the cost of egress fees.
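To rerun the tiered egress calculation for your own totals, a small helper is enough. The sketch below uses the per-GB rates quoted above and decimal units (1 TB = 1,000 GB). With those inputs the four tiers come to $900, $3,400, $7,000 and roughly $69,205. The function name `egress_cost` is hypothetical, and request and LIST fees are left out, as in the text.

```shell
# Tiered data-transfer-out cost in whole USD for SIZE_GB gigabytes.
egress_cost() {
    awk -v gb="$1" 'BEGIN {
        gb += 0
        n = split("10000 40000 100000", size, " ")   # tier sizes in GB
        split("0.09 0.085 0.07", rate, " ")          # USD per GB per tier
        overflow_rate = 0.05                         # beyond the first 150 TB
        total = 0
        for (i = 1; i <= n && gb > 0; i++) {
            tier = size[i] + 0
            used = (gb < tier) ? gb : tier
            total += used * rate[i]
            gb -= used
        }
        total += gb * overflow_rate
        printf "%.0f\n", total
    }'
}

egress_cost 1534096.7   # prints 80505
```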
Another option is to use AWS Snowball to transfer objects. Snowball devices are each 80TB, so we know up front that we need 20 of them for our repatriation effort. The per-device fee includes 10 days of use, plus 2 days for shipping. Additional days are available for $30/device.
20 Snowball Devices Service Fee ($300 ea) = $6,000 |
---|
R/T shipping (3-5 days at $400/device) = $8,000 |
S3 data out ($0.02/GB) = $30,682 |
Total Snowball Fees = $44,681.93 |
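The Snowball estimate can be sketched the same way. This assumes the figures quoted above (80 TB per device, a $300 service fee and $400 round-trip shipping per device, and $0.02/GB for S3 data out) and ignores extra-day charges. `snowball_cost` is an illustrative name, not an AWS tool.

```shell
# Device count and total fee (whole USD) to move SIZE_GB via Snowball.
snowball_cost() {
    awk -v gb="$1" 'BEGIN {
        device_gb = 80000; service_fee = 300; shipping = 400; data_out = 0.02
        devices = int(gb / device_gb)
        if (devices * device_gb < gb) devices++      # round up to whole devices
        total = devices * (service_fee + shipping) + gb * data_out
        printf "%d devices, $%.0f\n", devices, total
    }'
}

snowball_cost 1534096.7   # prints: 20 devices, $44682
```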
AWS will charge you standard request, storage, and data transfer rates to read from and write to AWS services including
Now we know how long it will take to migrate this massive amount of data and the cost. Make a business decision as to which method meets your needs based on the combination of timing and fees.
At this point, we also know the requirements for the hardware needed to run MinIO on-prem or at a colocation facility. Take the requirement above for 1.5PB of storage, estimate data growth, and consult our
The first step is to recreate your S3 buckets in MinIO. You’re going to have to do this regardless of how you choose to migrate objects. While both S3 and MinIO store objects using server-side encryption, you don’t have to worry about migrating encryption keys. You can connect to your KMS of choice using
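Recreating buckets can be scripted with the MinIO Client. The sketch below assumes an `mc` alias for the MinIO cluster already exists (the alias `minio` in the usage lines matches the `mc diff` example later in this post). The function name and its `plain`/`versioned`/`locked` modes are invented for illustration.

```shell
# Recreate one bucket on MinIO. The optional third argument selects how:
#   plain (default), versioned, or locked (object locking enabled).
recreate_bucket() {
    target="$1/$2"
    case "${3:-plain}" in
        locked)    mc mb --with-lock "$target" ;;
        versioned) mc mb "$target" && mc version enable "$target" ;;
        plain)     mc mb "$target" ;;
        *)         echo "unknown mode: $3" >&2; return 2 ;;
    esac
}

# Usage (hypothetical bucket names):
# recreate_bucket minio bucket1
# recreate_bucket minio audit-logs locked
```

Take the mode for each bucket from the properties you recorded in the bucket inventory.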
You have multiple options to copy objects: Batch Replication and `mc mirror`. My previous blog post,
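As a minimal sketch of the `mc mirror` route, the helper below copies one bucket between two `mc` aliases. The alias names `s3` and `minio` follow the `mc diff` example later in this post, the function name is hypothetical, and `--preserve` asks `mc` to carry object attributes across.

```shell
# Mirror one bucket from the source alias to the destination alias.
mirror_bucket() {
    src="$1"; dst="$2"; bucket="$3"
    mc mirror --preserve "$src/$bucket" "$dst/$bucket"
}

# Usage:
# mirror_bucket s3 minio bucket1
```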
Typically, customers use tools we wrote combined with AWS Snowball or TD SYNNEX’s data migration hardware and services to move larger amounts of data (over 1 PB).
MinIO recently partnered with Western Digital and TD SYNNEX to field a Snowball alternative. Customers can schedule windows to take delivery of the Western Digital hardware and pay for what they need during the rental period. More importantly, the service is not tied to a specific cloud - meaning the business can use the service to move data into, out of, and across clouds - all using the ubiquitous S3 protocol. Additional details on the service can be found on the
Bucket metadata, including policies and bucket properties, can be read using get-bucket
Pay particular attention to Data Lifecycle Management, such as object retention, object locking and archive/tiering. Run `get-bucket-lifecycle-configuration` on each bucket to obtain a human-readable JSON list of lifecycle rules. You can easily recreate AWS S3 settings using MinIO Console or MinIO Client (mc). Use commands such as `get-object-legal-hold` and `get-object-lock-configuration` to pinpoint objects that require special security and governance treatment.
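Collecting this metadata bucket by bucket is easy to script. The sketch below saves the output of several `aws s3api` calls as JSON files that you can paste into your bucket inventory. The function name and the file naming scheme are assumptions. Several of these calls return an error when a bucket has no such configuration, which the loop treats as "nothing to record".

```shell
# Save policy, versioning, encryption, lifecycle and object-lock settings
# for one bucket as JSON files in OUTDIR (default: current directory).
export_bucket_metadata() {
    bucket="$1"
    outdir="${2:-.}"
    for call in get-bucket-policy get-bucket-versioning get-bucket-encryption \
                get-bucket-lifecycle-configuration get-object-lock-configuration
    do
        out="$outdir/$bucket.$call.json"
        if ! aws s3api "$call" --bucket "$bucket" --output json > "$out"; then
            echo "$bucket: nothing returned by $call" >&2
            rm -f "$out"
        fi
    done
}

# Usage (hypothetical bucket and directory):
# export_bucket_metadata bucket1 ./metadata
```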
While we’re on the subject of lifecycle, let’s talk about backup and disaster recovery for a moment. Do you want an additional MinIO cluster to replicate to, for backup and disaster recovery?
After objects are copied from AWS S3 to MinIO, it’s important to validate data integrity. The easiest way to do this is to use the MinIO Client to run `mc diff` against old buckets in S3 and new buckets on MinIO. This will compute the difference between the buckets and return a list of only those objects that are missing or different. This command takes the arguments of the source and target buckets. For your convenience, you may want to create
mc diff s3/bucket1 minio/bucket1
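To run the same check across many buckets, a loop like the following may help. It is a sketch under a few assumptions: the `s3` and `minio` aliases have been created with `mc alias set` (endpoints and keys in the comment are placeholders), `mc diff` prints nothing when two buckets match, and `validate_buckets` is a name invented here. As far as I know, `mc diff` compares object names, sizes and dates rather than contents, so spot-checking ETags or checksums is still worthwhile.

```shell
# One-time alias setup (placeholder endpoint and credentials):
#   mc alias set s3    https://s3.amazonaws.com  S3_ACCESS_KEY    S3_SECRET_KEY
#   mc alias set minio https://minio.example.net MINIO_ACCESS_KEY MINIO_SECRET_KEY

# Diff each named bucket between S3 and MinIO; return 1 if any differ.
validate_buckets() {
    status=0
    for bucket in "$@"; do
        differences=$(mc diff "s3/$bucket" "minio/$bucket")
        if [ -n "$differences" ]; then
            printf '%s: differences found\n%s\n' "$bucket" "$differences"
            status=1
        else
            printf '%s: OK\n' "$bucket"
        fi
    done
    return "$status"
}

# Usage:
# validate_buckets bucket1 bucket2
```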
The great news is that all you have to do is point existing apps at the new MinIO endpoint. Configurations can be rewritten app by app over a period of time. Migrating data in object storage is less disruptive than migrating a filesystem: just change the URL to read from and write to the new cluster. Note that if you previously relied on AWS services to support your applications, those won’t be present in your data center, so you’ll have to replace them with their open-source equivalents and rewrite some code. For example, Athena can be replaced with Spark SQL, Apache Hive or Presto, Kinesis with Apache Kafka, and AWS Glue with Apache Airflow.
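For tooling built on the AWS CLI, repointing can be as small as adding `--endpoint-url`. The wrapper below is a sketch: `s3_at_minio` and the `MINIO_ENDPOINT` variable are hypothetical names, and `https://minio.example.net` is a placeholder for your own endpoint. Credentials for the MinIO cluster still need to be configured in the usual AWS CLI way.

```shell
# Run any "aws s3" subcommand against the MinIO endpoint instead of AWS.
s3_at_minio() {
    aws --endpoint-url "${MINIO_ENDPOINT:-https://minio.example.net}" s3 "$@"
}

# Usage:
# s3_at_minio ls s3://bucket1
```

Most S3 SDKs expose an equivalent endpoint setting, so application changes usually stay at the configuration level.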
If your S3 migration is part of a larger effort to move an entire application on-prem, then chances are you used
Now that you’ve completed repatriating, it’s time to turn your attention to storage operation, monitoring and optimization. The good news is that no optimization is needed for MinIO – we’ve built optimization right into the software so you know you’re getting the best performance for your hardware. You’ll want to start monitoring your new MinIO cluster to assess resource utilization and performance on an ongoing basis. MinIO exposes
With
It’s far from a secret that the days of writing blank checks to cloud providers are gone. Many businesses are currently evaluating their cloud spend to find potential savings. Now you have everything you need to start your migration from AWS S3 to MinIO, including concrete technical steps and a financial framework.
If you get excited about the prospect of repatriation cost savings, then please reach out to us at