MinIO makes a powerful primary TileDB backend because both are built for performance and scale. MinIO is a single Go binary that can be launched in many different types of cloud and on-prem environments. It's very lightweight, but also feature-packed with things like and , and it provides with various applications. MinIO is the perfect companion for TileDB because of its industry-leading performance and scalability. MinIO is capable of tremendous performance – we’ve benchmarked it at 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of – and is used to build data lakes/lake houses with analytics and AI/ML workloads. replication encryption integrations off-the-shelf NVMe SSDs TileDB is used to store data in a variety of applications, such as Genomics, Geospatial, Biomedical Imaging, Finance, Machine Learning, and more. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array, which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract all the data storage and management pains, while efficiently accessing the data with your favorite programming language or data science tool via our numerous APIs and integrations. Set Up TileDB Let’s dive in and create some test data using TileDB. Install the TileDB module, which should also install the dependency. pip numpy % pip3 install tiledb Collecting tiledb Downloading tiledb-0.25.0-cp311-cp311-macosx_11_0_arm64.whl (10.4 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.4/10.4 MB 2.7 MB/s eta 0:00:00 Collecting packaging Downloading packaging-23.2-py3-none-any.whl (53 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.0/53.0 kB 643.1 kB/s eta 0:00:00 Collecting numpy>=1.23.2 Downloading numpy-1.26.3-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 2.5 MB/s eta 0:00:00 Installing collected packages: packaging, numpy, tiledb Successfully installed numpy-1.26.3 packaging-23.2 tiledb-0.25.0 Create a test array by running the below Python script, name it . tiledb-demo.py import tiledb import numpy as np import os, shutil # Local path array_local = os.path.expanduser("./tiledb_demo") # Create a simple 1D array tiledb.from_numpy(array_local, np.array([1.0, 2.0, 3.0])) # Read the array with tiledb.open(array_local) as A: print(A[:]) Run the script % python3 tiledb-demo.py [1. 2. 3.] This will create a directory called to store the actual data. tiledb_demo % ls -l tiledb_demo/ total 0 drwxr-xr-x 3 aj staff 96 Jan 31 05:27 __commits drwxr-xr-x 2 aj staff 64 Jan 31 05:27 __fragment_meta drwxr-xr-x 3 aj staff 96 Jan 31 05:27 __fragments drwxr-xr-x 2 aj staff 64 Jan 31 05:27 __labels drwxr-xr-x 2 aj staff 64 Jan 31 05:27 __meta drwxr-xr-x 4 aj staff 128 Jan 31 05:27 __schema You can continue using it as is but it's no bueno if everything is local because if the local disk or node fails then you lose your entire data. Let's do something fun, like reading this same data from a MinIO bucket instead. Migrating Data to MinIO Bucket We’ll start by pulling mc in our docker ecosystem and then using play.min.io to create the bucket. Pull mc docker image % docker pull minio/mc Test with MinIO Play by listing all the buckets % docker run minio/mc ls play [LONG TRUNCATED LIST OF BUCKETS] Create a bucket to move our local TileDB data to, name it . tiledb-demo % docker run minio/mc mb play/tiledb-demo Bucket created successfully `play/tiledb-demo`. Copy the contents of the data directory to the MinIO bucket tiledb_demo tiledb-demo % docker run -v $(pwd)/tiledb_demo:/tiledb_demo minio/mc cp --recursive /tiledb_demo play/tiledb-demo `/tiledb_demo/__commits/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21.wrt` -> `play/tiledb-demo/tiledb_demo/__commits/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21.wrt` `/tiledb_demo/__fragments/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21/a0.tdb` -> `play/tiledb-demo/tiledb_demo/__fragments/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21/a0.tdb` `/tiledb_demo/__fragments/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21/__fragment_metadata.tdb` -> `play/tiledb-demo/tiledb_demo/__fragments/__1706696859767_1706696859767_777455531063403b811b2a2bf79d40e7_21/__fragment_metadata.tdb` `/tiledb_demo/__schema/__1706696859758_1706696859758_74e7040e138a4cca93e34aca1c587108` -> `play/tiledb-demo/tiledb_demo/__schema/__1706696859758_1706696859758_74e7040e138a4cca93e34aca1c587108` Total: 3.24 KiB, Transferred: 3.24 KiB, Speed: 1.10 KiB/s List the contents of to make sure the data has been copied. tiledb-demo % docker run minio/mc ls play/tiledb-demo/tiledb_demo [2024-01-15 14:15:57 UTC] 0B __commits/ [2024-01-15 14:15:57 UTC] 0B __fragments/ [2024-01-15 14:15:57 UTC] 0B __schema/ Note: The MinIO Client ( ), or any S3 compatible client, only copies non-empty folders. The reason for this is that in the object storage world the data is organized based on bucket prefixes, so non-empty folders are not needed. In a future blog we’ll dive deeper into how data is organized with prefixes and folders. Hence, you see only these 3 folders and not the rest that we saw in the local folder. mc Now let’s try to read the same data directly from the MinIO bucket using the Python code below, name the file . tiledb-minio-demo.py import tiledb import numpy as np # MinIO keys minio_key = "minioadmin" minio_secret = "minioadmin" # The configuration object with MinIO keys config = tiledb.Config() config["vfs.s3.aws_access_key_id"] = minio_key config["vfs.s3.aws_secret_access_key"] = minio_secret config["vfs.s3.scheme"] = "https" config["vfs.s3.region"] = "" config["vfs.s3.endpoint_override"] = "play.min.io:9000" config["vfs.s3.use_virtual_addressing"] = "false" # Create TileDB config context ctx = tiledb.Ctx(config) # The MinIO bucket URI path of tiledb demo array_minio = "s3://tiledb-demo/tiledb_demo/" with tiledb.open(array_minio, ctx=tiledb.Ctx(config)) as A: print(A[:]) The output should look familiar % python3 tiledb-minio-demo.py [1. 2. 3.] We've read from MinIO, next let's see how we can write the data directly in a MinIO bucket, instead of copying it to MinIO from an existing source. Writing Directly to the MinIO Bucket So far we’ve shown you how to read data that already exists, either in local storage or an existing bucket. But if you wanted to start fresh by writing directly to MinIO from the get-go, how would that work? Let’s take a look. The code to write data directly to the MinIO bucket is the same as above except with two line changes. The path to the MinIO bucket where TileDB data is stored must be updated to (instead of ). tiledb_minio_demo tiledb_demo We’ll use the function, as we did earlier with local storage, to create the array to store in the MinIO bucket. tiledb.from_numpy [TRUNCATED] # The MinIO bucket URI path of tiledb demo array_minio = "s3://tiledb-demo/tiledb_minio_demo/" tiledb.from_numpy(array_minio, np.array([1.0, 2.0, 3.0]), ctx=tiledb.Ctx(config)) [TRUNCATED] After making these 2 changes, run the script and you should see the output below % python3 tiledb-minio-demo.py [1. 2. 3.] If you run the script again it will fail with the below error because it will try to write again. tiledb.cc.TileDBError: [TileDB::StorageManager] Error: Cannot create array; Array 's3://tiledb-demo/tiledb_minio_demo/' already exists Just comment out the following line and you can re-run it multiple times. # tiledb.from_numpy(array_minio, np.array([1.0, 2.0, 3.0]), ctx=tiledb.Ctx(config)) % python3 tiledb-minio-demo.py [1. 2. 3.] % python3 tiledb-minio-demo.py [1. 2. 3.] Check the MinIO Play bucket to make sure the data is in there as expected % docker run minio/mc ls play/tiledb-demo/tiledb_minio_demo/ [2024-01-15 16:45:04 UTC] 0B __commits/ [2024-01-15 16:45:04 UTC] 0B __fragments/ [2024-01-15 16:45:04 UTC] 0B __schema/ There you go, getting data into MinIO is that simple. Did you get the same results as earlier? You should have, but if you didn't there are a few things you can check out. Common Pitfalls We’ll look at some common errors you might encounter while trying to read/write to MinIO. If your access key and secret key are incorrect, you should expect to see an error message like below tiledb.cc.TileDBError: [TileDB::S3] Error: Error while listing with prefix 's3://tiledb-demo/tiledb_minio_demo/__schema/'... The request signature we calculated does not match the signature you provided. Check your key and signing method. Next, you need to ensure the hostname and port are correct, without a proper endpoint these are the errors you would encounter. Incorrect Hostname: tiledb.cc.TileDBError: [TileDB::S3] Error: … Couldn't resolve host name Incorrect Port: tiledb.cc.TileDBError: [TileDB::S3] Error: … Couldn't connect to server Last but not least, one of the most cryptic errors I’ve seen is the following tiledb.cc.TileDBError: [TileDB::S3] Error: … [HTTP Response Code: -1] [Remote IP: 98.44.32.5] : curlCode: 56, Failure when receiving data from the peer After a ton of debugging it turns out that if you are connecting using http but the MinIO server has TLS activated then you will see the above error. Just be sure the connection scheme is set to the right configuration, in this case, config["vfs.s3.scheme"] = "https". Racks on Racks on Racks There is a rap song (you can search for it) where they rap about having stacks on stacks on stacks of * cash. But there is another rap song where they claim they have so many stacks of cash that they can’t be called “stacks” anymore, they are now “racks”. Essentially when your stacks get so big and so high you need racks on racks on racks to store your stacks of cash. cough* This is an apt comparison because your stacks of data mean as much (or more) to you as the stacks of cash they're rapping about. If only there was something like MinIO to keep all your objects – physical or virtual – safe and readily accessible. With MinIO in the mix, you can easily scale TileDB to with relative ease. You also get all the features that make MinIO great like , , , , among others right out of the box. By having all your data in MinIO, you decrease required storage complexity and therefore realize considerable savings on data storage costs, while at the same time running MinIO on commodity hardware provides the best possible performance-to-cost ratio. MinIO supercharges your TileDB engine with industry-leading performance that makes querying a joy. multiple racks across multiple datacenters Security and Access Control Tiering Object Locking and Retention Key Encryption Service (KES) We’ve added the code snippets used in this blog to a . If you have any questions on how to connect MinIO to TileDB or migrate data into MinIO be sure to reach out to us on ! git repository Slack Also appears . here