MinIO runs on anything: bare metal, Kubernetes, Docker, Linux and more. Organizations choose to run MinIO to host their data on any of these platforms, and increasingly rely on multiple platforms to satisfy multiple requirements. The choice of underlying hardware and OS is based on a number of factors, primarily the amount of data to be stored in MinIO plus requirements for integration with other cloud-native software, performance and security.

Many of our customers run MinIO on bare metal, while the majority run on Kubernetes. Running multiple instances of MinIO in a containerized architecture that is orchestrated by Kubernetes is extremely efficient. MinIO customers roll out new regions and update services without disruption, with separate Kubernetes clusters running in each region, and the operational goal of shared-nothing for greatest resiliency and scalability.

Customers switch to MinIO for a variety of reasons, including:

- S3 compatible API
- Multi-cloud, cloud-agnostic deployments
- S3-style and IAM-style ACL management
- Distributed and fault-tolerant storage using erasure coding
- Tiering and versioning of objects across multiple clusters
- Bucket and site-to-site replication
- Batch replication via the Batch Framework
- Server-side object and client data encryption
- Transport layer network encryption of data

Given these diverse reasons and the many environments where MinIO can be installed, it's realistic to assume that you already have data stored in a number of other places that you would want to get into MinIO. In this post, let's review some of the tools available to get data out of S3, a local filesystem, NFS, Azure, GCP, Hitachi Content Platform, Ceph, and others, and into MinIO clusters where it can be exposed to cloud-native AI/ML and analytics packages.

MinIO Client

To get started, we'll be using the MinIO Client (mc) during the course of this post for a few of these options. Please be sure to install it and set the alias to your running MinIO Server.

mc alias set destminio https://myminio.example.net minioadminuser minioadminpassword

We will be adding some more "source" aliases as we go through the different methods.

FileSystems

The majority of use cases for migrating data into MinIO start with a mounted filesystem or NFS volume. In this simple configuration, you can use mc mirror to sync the data from the source to the destination. Think of mc mirror as a Swiss Army knife for data synchronization. It takes the burden off of the user to determine the best way to interact with the source from which you are fetching the objects. It supports a number of sources and, based on the source you are pulling from, uses the right functions to read from it.

For example, let's start with a simple filesystem that is mounted from a physical hard disk, virtual disk, or even something like a GlusterFS mount. As long as it's a filesystem readable by the OS, the MinIO Client can read it too:

filesystem          kbytes   used     avail    capacity  mounted on
/dev/root           6474195  2649052  3825143  41%       /
/dev/stand          24097    5757     18340    24%       /stand
/proc               0        0        0         0%       /proc
/dev/fd             0        0        0         0%       /dev/fd
/dev/_tcp           0        0        0         0%       /dev/_tcp
/dev/dsk/c0b0t0d0s4 10241437 4888422  5353015  48%       /home
/dev/dsk/c0b0t1d0sc 17422492 12267268 5155224  71%       /home2

Let's assume your objects are in /home/mydata. You would then run the following command to mirror the objects (if the mydata bucket does not already exist, you would have to create it first):

mc mirror /home/mydata destminio/mydata

This command copies objects that are in the source but missing from the destination. Objects that are no longer in the source are removed from the destination only if you pass the --remove flag, and objects added to the source later are copied the next time you run the command (or continuously, if you pass --watch). If you want to overwrite existing objects that were modified in the source, pass the --overwrite flag.

NFS

Network File System (NFS) is generally used to store objects or data that are not accessed often because, while ubiquitous, the protocol is often very slow across the network. Nonetheless, a lot of ETL and some legacy systems use NFS as a repository for data to be used for operations, analytics, AI/ML, and additional use cases. It would make better sense for this data to live on MinIO because of the scalability, security and high performance of a MinIO cluster, coupled with MinIO's ability to provide services to cloud-native applications using the S3 API.

Install the required packages to mount the NFS volume:

apt install nfs-common

On the NFS server, be sure to add the /home directory to /etc/exports:

/home client_ip(rw,sync,no_root_squash,no_subtree_check)

Note: Be sure to restart your NFS server, for example on Ubuntu servers:

systemctl restart nfs-kernel-server

Create a directory to use as the mount point:

mkdir -p /nfs/home

Mount the NFS volume:

mount <nfs_host>:/home /nfs/home

Copy the data from NFS to MinIO:

mc mirror /nfs/home destminio/nfsdata

There you go, now you can move your large objects from NFS to MinIO.

S3

As we mentioned earlier, mc mirror is a Swiss Army knife of data synchronization. In addition to filesystems, it also copies objects from S3 or S3 API compatible stores and mirrors them to MinIO. One of the more popular use cases of this is mirroring an Amazon S3 bucket.

Follow these steps to create an AWS S3 bucket in your account. If you already have an existing account with data, we could use that too.

Once a bucket has been created or data has been added to an existing bucket, create a new IAM policy with an access key and secret key allowing access only to our bucket. Save the generated credentials for the next step.

We can work with any S3 compatible storage using the MinIO Client. Next, let's add an alias for Amazon S3 using the credentials we downloaded:

mc alias set s3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 --api S3v4

Use mc mirror to copy the data from S3 to MinIO:

mc mirror s3/mybucket destminio/mydata

Depending on the amount of data, network speeds and the physical distance from the region where the bucket data is stored, it might take a few minutes or more for you to mirror all the data. You will see a message when mc is done copying all the objects.

HDFS

For the next set of tools, we wrote dedicated scripts to satisfy some of the non-standard, edge-case data migration requirements that we need to fulfill. One of these is migrating from HDFS and Hadoop.
Many enterprises have so much data stored in Hadoop that it's impossible to ignore it and start fresh with a cloud-native platform. It is more feasible to transfer that data to something more modern (and cloud-native) like MinIO and run your ETL and other processes that way. It's rather simple to set up.

Create a file called core-site.xml with the following contents:

<configuration>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>https://minio:9000</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>minio-sample</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>minio-sample123</value>
  </property>
</configuration>

Set the following environment variables:

export HDFS_SOURCE_PATH=hdfs://namenode:8080/user/minio/testdir
export S3_DEST_PATH=s3a://mybucket/testdir

Download the following file, chmod +x and run it:

curl -LSs -o hdfs-to-minio.sh https://github.com/minio/hdfs-to-minio/raw/master/hdfs-to-minio.sh
chmod +x hdfs-to-minio.sh
./hdfs-to-minio.sh

If you've been storing data in Hadoop for several years, then this process might take several hours. If it's on a production cluster, then we recommend migrating data in off hours during maintenance windows to minimize the impact of any performance degradation on your Hadoop cluster while data is being mirrored.

More details about migrating from HDFS to MinIO are available in this GitHub Repo, and we've got a blog post as well, Migrating from HDFS to Object Storage.

HCP

We previously wrote an amazing blog post on Hitachi Content Platform and how to migrate your data to a MinIO cluster. I would recommend reading the blog post for full details, but the crux is as follows.

Once you have the necessary HCP cluster and input file configured, download the migration tool and run the following command to start the migration process:

hcp-to-minio migrate --namespace-url https://finance.europe.hcp.example.com \
  --auth-token "HCP bXl1c2Vy:3f3c6784e97531774380db177774ac8d" \
  --host-header "s3testbucket.sandbox.hcp.example.com" \
  --data-dir /mnt/data \
  --bucket s3testbucket \
  --input-file /tmp/data/to-migrate.txt

Ceph

Last but not least, we've kept the elephant in the room until the end. Although aging, Ceph is a popular store for data and it has an S3 compatible API. It is used by other Kubernetes projects, such as Rook, as the backend for object storage. Ceph, however, is an unwieldy behemoth to set up and run. So it's natural that folks would want to move their data to something simpler, easier to maintain and with greater performance.

There are two ways to copy data from Ceph:

- Bucket Replication: Creates the object, but if the object is deleted from the source it will not delete it on the destination. https://min.io/docs/minio/linux/administration/bucket-replication.html
- mc mirror: Synchronizes objects and versions; it can even delete objects that no longer exist at the source. https://min.io/docs/minio/linux/reference/minio-mc/mc-mirror.html

Similar to S3, since Ceph has an S3 compatible API, you can add an alias to the MinIO Client:

mc alias set ceph http://ceph_host:port cephuser cephpass

You can then use mc mirror to copy the data to your MinIO cluster:

mc mirror ceph/mydata destminio/mydata

We suggest that you run the mc mirror command with the --watch flag to continuously monitor for objects and sync them to MinIO.

Migrate Your Data to MinIO Today!

These are just a few examples to show you how easy it is to migrate your data to MinIO. It doesn't matter if you are using older legacy protocols such as NFS or the latest and greatest such as S3, MinIO is here to support you.

In this post we went into detail on how to migrate from filesystems and other data stores such as NFS, GlusterFS, HDFS, HCP, and last but not least Ceph. Regardless of the tech stack running against it, MinIO provides a performant, durable, secure, and scalable yet simple software-defined object storage solution.

If you have any questions feel free to reach out to us on Slack!
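
As a recap of the mc mirror commands used in this post, here is a minimal shell sketch of how the invocations could be assembled in one place. The helper names mirror_cmd and run_mirror are hypothetical and not part of the MinIO Client; the only flags they emit are --overwrite, --remove and --watch. The sketch assumes that source and destination paths contain no spaces and that the aliases have already been set with mc alias set.

```shell
#!/bin/sh
# Sketch only: mirror_cmd and run_mirror are hypothetical helpers,
# not part of the MinIO Client.

# Print the mc mirror command for a source and destination.
# Usage: mirror_cmd SOURCE DEST [overwrite] [remove] [watch]
mirror_cmd() {
    src="$1"
    dst="$2"
    shift 2
    cmd="mc mirror"
    for opt in "$@"; do
        case "$opt" in
            overwrite) cmd="$cmd --overwrite" ;;  # replace objects modified at the source
            remove)    cmd="$cmd --remove" ;;     # delete objects missing from the source
            watch)     cmd="$cmd --watch" ;;      # keep running and sync new changes
            *) echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
    echo "$cmd $src $dst"
}

# Build the command, show it, then execute it.
# Assumption: paths contain no spaces, so plain word splitting is safe.
run_mirror() {
    cmd=$(mirror_cmd "$@") || return 1
    echo "Running: $cmd"
    $cmd
}
```

For example, run_mirror ceph/mydata destminio/mydata overwrite watch would execute mc mirror --overwrite --watch ceph/mydata destminio/mydata, while mirror_cmd alone only prints the command so it can be reviewed first.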

The code in this story is for educational purposes. The readers are solely responsible for whatever they build with it.

The Data Migration Tools to Help You Get Into MinIO
