The MinIO DataPod: A Reference Architecture for Exascale Computing

The modern enterprise defines itself by its data. That requires a data infrastructure for AI/ML, as well as a data infrastructure that is the foundation for a Modern Datalake capable of supporting business intelligence, data analytics, and data science. This is true whether an enterprise is behind, just getting started, or already using AI for advanced insights, and for the foreseeable future it is how enterprises will be perceived. The larger problem of how AI goes to market in the enterprise has multiple dimensions or stages: data ingestion, transformation, training, inferencing, production, and archiving, with data shared across every stage. As these workloads scale, the complexity of the underlying AI data infrastructure increases. This creates the need for high-performance infrastructure that also minimizes total cost of ownership (TCO).

MinIO has created a comprehensive blueprint for data infrastructure to support exascale AI and other large-scale data lake workloads. It is called the MinIO DataPod, and its unit of measurement is 100 PiB. Why? Because that scale is common in the enterprise today. Here are some quick examples:

- A North American automobile manufacturer with nearly an exabyte of car video
- A German automobile manufacturer with more than 50 PB of car telemetry
- A biotech firm with more than 50 PB of biological, chemical, and patient-centric data
- A cybersecurity company with more than 500 PB of log files
- A media streaming company with more than 200 PB of video
- A defense contractor with more than 80 PB of geospatial, log, and telemetry data from aircraft

Even firms that are not at 100 PB today will be within a few quarters. The average firm's data is growing at 42% a year, and data-centric firms are growing at twice that rate, if not more.
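To put these growth rates in perspective: at a constant compound annual rate g, data grows from a current size C to a target T in t = ln(T/C) / ln(1 + g) years. The Python sketch below applies that formula. The `years_to_reach` helper is a hypothetical name, the 50 PB starting point is an illustrative assumption rather than a figure from any of the companies listed, and real growth is rarely constant from year to year.

```python
import math


def years_to_reach(current_pb: float, target_pb: float, annual_growth: float) -> float:
    """Years for data to grow from current_pb to target_pb at a constant
    compound annual growth rate (0.42 means 42% per year)."""
    if current_pb <= 0 or target_pb <= 0 or annual_growth <= 0:
        raise ValueError("sizes and growth rate must be positive")
    if current_pb >= target_pb:
        return 0.0
    # Solve current * (1 + g)^t = target for t.
    return math.log(target_pb / current_pb) / math.log(1.0 + annual_growth)


# Hypothetical starting point: a firm holding 50 PB today.
for label, rate in [("average firm", 0.42), ("data-centric firm", 0.84)]:
    years = years_to_reach(50, 100, rate)
    print(f"{label}: {years:.2f} years (~{years * 4:.1f} quarters) to reach 100 PB")
```

Substituting a firm's own current footprint and measured growth rate gives an estimate for that firm.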
The MinIO DataPod reference architecture can be stacked in different ways to achieve almost any scale; indeed, we have customers who have built on this blueprint all the way past an exabyte, with multiple hardware vendors. The MinIO DataPod offers an end-to-end architecture that enables infrastructure administrators to deploy cost-efficient solutions for a variety of AI and ML workloads. Here is the rationale for our architecture.

AI Requires Disaggregated Storage and Compute

AI workloads, especially generative AI, inherently require GPUs for compute. GPUs are spectacular devices with incredible throughput, memory bandwidth, and parallel processing capabilities. Keeping up with GPUs that keep getting faster requires high-speed storage. This is especially true when training data cannot fit into memory and training loops have to make more calls to storage. Furthermore, enterprises require more than performance; they also need security, replication, and resiliency.

The enterprise storage requirement demands that the architecture fully disaggregate storage from compute. This allows storage to scale independently of compute, and because storage growth is generally one or more orders of magnitude greater than compute growth, this approach ensures the best economics through superior capacity utilization.

AI Workloads Demand a Different Class of Networking

Networking infrastructure has standardized on 100 gigabits per second (Gbps) links for AI workload deployments. Modern NVMe drives deliver 7 GBps (gigabytes per second) of throughput on average, which makes the network bandwidth between the storage servers and the GPU compute servers the bottleneck for AI pipeline performance. Solving this problem with complex networking solutions like InfiniBand (IB) has real limitations.
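A back-of-the-envelope calculation shows why the network becomes the bottleneck. The Python sketch below compares drive throughput with network throughput using the 7 GBps per-drive average from the text and the storage server specification later in this document (24 NVMe drives, one dual-port 200 GbE NIC). The helper names are hypothetical, and the arithmetic deliberately ignores protocol overhead, CPU limits, and read/write asymmetry, so treat it as an upper-bound comparison only.

```python
GBITS_PER_GBYTE = 8  # decimal units: 1 GB/s = 8 Gb/s


def drives_per_link(link_gbps: float, drive_gbytes_per_s: float) -> float:
    """Number of drives whose combined throughput fills one network link."""
    return (link_gbps / GBITS_PER_GBYTE) / drive_gbytes_per_s


def drive_to_network_ratio(drives: int, drive_gbytes_per_s: float,
                           nic_ports: int, port_gbps: float) -> float:
    """Aggregate drive throughput over aggregate NIC throughput for one server.
    A value above 1 means the network, not the drives, limits delivery."""
    drive_total = drives * drive_gbytes_per_s                 # GB/s
    network_total = nic_ports * port_gbps / GBITS_PER_GBYTE   # GB/s
    return drive_total / network_total


# 7 GB/s per drive is the average figure quoted in the text.
print(f"One 100 Gbps link is filled by {drives_per_link(100, 7.0):.2f} drives")
# Server matching the storage server specification: 24 NVMe drives, dual-port 200 GbE NIC.
print(f"Drive-to-network ratio: {drive_to_network_ratio(24, 7.0, 2, 200):.2f}")
```

With these inputs, fewer than two drives fill a single 100 Gbps link, and the 24 drives in one server can source more than three times what its two 200 GbE ports can carry.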
We recommend that enterprises leverage existing, industry-standard Ethernet-based solutions (e.g., HTTP over TCP) that work out of the box to deliver data to GPUs at high throughput, for the following reasons:

- A much larger and more open ecosystem
- Reduced network infrastructure cost
- High interconnect speeds (800 GbE and beyond) with RDMA over Ethernet support (e.g., RoCEv2)
- Reuse of existing expertise and tools for deploying, managing, and observing Ethernet
- Innovation in GPU-to-storage-server communication is happening on Ethernet-based solutions

The Requirements of AI Demand Object Storage

It is not a coincidence that AI data infrastructure in the public clouds is built on top of object stores. Nor is it a coincidence that every major foundation model was trained on an object store. This is a function of the fact that POSIX is too chatty to work at the data scale AI requires, despite what the chorus of legacy filers will claim. The same architecture that delivers AI in the public cloud should be applied to the private cloud and, obviously, the hybrid cloud.

Object stores excel at handling various data formats and large volumes of unstructured data, and they can effortlessly scale to accommodate growing data without compromising performance. Their flat namespace and metadata capabilities enable the efficient data management and processing that is crucial for AI tasks requiring fast access to large datasets. As high-speed GPUs evolve and network bandwidth standardizes at 200/400/800 Gbps and beyond, modern object stores will be the only solution that meets the performance SLAs and scale of AI workloads.

Software Defined Everything

We know that GPUs are the star of the show, and that they are hardware. But even Nvidia will tell you the secret sauce is CUDA. Move outside the chip, however, and the infrastructure world is increasingly software-defined.
Nowhere is this more true than in storage. Software-defined storage solutions are essential for scalability, flexibility, and cloud integration, and they surpass traditional appliance-based models for the following reasons:

- Cloud Compatibility: Software-defined storage aligns with cloud operations, unlike appliances, which cannot run across multiple clouds.
- Containerization: Appliances cannot be containerized, losing cloud-native advantages and preventing Kubernetes orchestration.
- Hardware Flexibility: Software-defined storage supports a wide range of hardware, from edge to core, accommodating diverse IT environments.
- Adaptive Performance: Software-defined storage offers unmatched flexibility, efficiently managing different capacity and performance needs across various chipsets.

At exabyte scale, simplicity and a cloud-based operating model are crucial. Object storage, as a software-defined solution, should work seamlessly on commodity off-the-shelf (COTS) hardware and on any compute platform, be it bare metal, virtual machines, or containers. Custom-built hardware appliances for object storage often compensate for poorly designed software with costly hardware and complex solutions, resulting in a high TCO.

MinIO DataPod Hardware Specification for AI

Enterprise customers using MinIO for AI initiatives build exabyte-scale data infrastructure as repeatable units of 100 PiB. This eases deployment, maintenance, and scaling for infrastructure administrators as AI data grows exponentially over time. Below is the bill of materials (BOM) for building a 100 PiB data infrastructure unit.
Cluster Specification

| Component | Quantity |
|---|---|
| Total number of racks | 30 |
| Total number of storage servers | 330 |
| Storage servers per rack | 11 |
| Total number of TOR switches | 60 |
| Total number of spine switches | 10 |
| Erasure code stripe size | 10 |
| Erasure code parity | 4 |

Single Rack Specification

| Component | Description | Quantity |
|---|---|---|
| Rack enclosure | 42U/45U slot rack | 1 |
| Storage server | 2U form factor | 11 |
| Top of rack (TOR) switches | Layer 2 switch | 2 |
| Management switch | Combined Layer 2 and Layer 3 | 1 |
| Network cables | AOC cables | 30-40 |
| Power | Dual power supply with RPDU | 17 kW - 20 kW |

Storage Server Specification

| Component | Specification |
|---|---|
| Server | 2U, single socket |
| CPU | 64 cores, 128 PCIe 4.0 lanes |
| Memory | 256 GB |
| Network | Dual-port 200 GbE NIC |
| Drive bays | 24 hot-swap 2.5" U.2 NVMe |
| Drives | 24 × 30 TB NVMe |
| Power | 1600 W redundant power supplies |
| Total raw capacity | 720 TB |

Storage Server Reference

- Dell: PowerEdge R7615 Rack Server
- HPE: HPE ProLiant DL345 Gen11
- Supermicro: A+ Server 2114S-WN24RT

Network Switch Specification

| Component | Specification |
|---|---|
| Top of rack (TOR) switch | 32 × 100 GbE QSFP28 ports |
| Spine switch | 64 × 100 GbE QSFP28 ports |
| Cable | 100G QSFP28 AOC |
| Power | 500 W per switch |

Price

MinIO has validated this architecture with multiple customers and would expect others to see the following average price per terabyte per month. This is an average street price; the actual price may vary depending on the configuration and the hardware vendor relationship.

| Scale | Storage hardware price (per TB/month) | MinIO software price (per TB/month) |
|---|---|---|
| 100 PiB | $1.50 | $3.54 |

Vendor-specific turnkey hardware appliances for AI result in high TCO and do not scale from a unit-economics standpoint for large AI data initiatives at exabyte scale.

Conclusion

Setting up data infrastructure at exabyte scale while meeting the TCO objectives for all AI/ML workloads can be complex and hard to get right.
MinIO’s DataPod infrastructure blueprint makes it simple and straightforward for infrastructure administrators to set up the required commodity off-the-shelf hardware with the highly scalable, performant, and cost-effective S3-compatible MinIO enterprise object store, resulting in shorter time to market and faster time to value from AI initiatives across organizations within the enterprise.
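As a closing cross-check, the BOM figures can be recomputed with a few lines of arithmetic. The Python sketch below uses a hypothetical `DataPod` class and `monthly_cost` function (illustrative names, not a MinIO tool) to derive raw and usable capacity, erasure-set count, a leaf-spine port budget, peak rack power, and monthly cost from the tables. It assumes that 30 TB means decimal terabytes; that usable capacity is raw capacity times (stripe size minus parity) / stripe size; that each server uses one port on each TOR and each TOR has exactly one uplink to each spine; that the 1600 W PSU rating bounds per-server draw (management switch excluded); and that the quoted per-TB prices apply to decimal TB. None of these assumptions come from the reference architecture itself.

```python
from dataclasses import dataclass

TB = 10**12   # decimal terabyte, as drive capacities are quoted
PIB = 2**50   # binary pebibyte, the DataPod's unit of measurement


@dataclass(frozen=True)
class DataPod:
    # Defaults mirror the cluster, server, and switch tables above.
    racks: int = 30
    servers_per_rack: int = 11
    drives_per_server: int = 24
    drive_tb: int = 30
    stripe_size: int = 10          # drives per erasure set
    parity: int = 4                # parity shards per erasure set
    tors_per_rack: int = 2
    tor_ports: int = 32
    spines: int = 10
    spine_ports: int = 64
    server_psu_watts: int = 1600
    switch_watts: int = 500

    def drives(self) -> int:
        return self.racks * self.servers_per_rack * self.drives_per_server

    def raw_pib(self) -> float:
        return self.drives() * self.drive_tb * TB / PIB

    def usable_pib(self) -> float:
        # Only the data shards (stripe_size - parity) of each stripe hold user data.
        return self.raw_pib() * (self.stripe_size - self.parity) / self.stripe_size

    def erasure_sets(self) -> int:
        return self.drives() // self.stripe_size

    def fabric_fits(self) -> bool:
        # Assumption: one port per server on each TOR, one uplink per TOR to each spine.
        tor_ports_needed = self.servers_per_rack + self.spines
        spine_ports_needed = self.racks * self.tors_per_rack
        return tor_ports_needed <= self.tor_ports and spine_ports_needed <= self.spine_ports

    def rack_peak_watts(self) -> int:
        # Assumption: PSU rating as the per-server ceiling; management switch excluded.
        return (self.servers_per_rack * self.server_psu_watts
                + self.tors_per_rack * self.switch_watts)


def monthly_cost(capacity_pib: float, hw_per_tb: float = 1.50, sw_per_tb: float = 3.54) -> float:
    # Assumption: the per-TB/month prices apply to decimal TB of the given capacity.
    return capacity_pib * PIB / TB * (hw_per_tb + sw_per_tb)


pod = DataPod()
print(f"drives: {pod.drives()}, erasure sets: {pod.erasure_sets()}")
print(f"raw: {pod.raw_pib():.1f} PiB, usable: {pod.usable_pib():.1f} PiB")
print(f"objects stay readable with up to {pod.parity} of {pod.stripe_size} drives lost per set")
print(f"fabric port budget fits: {pod.fabric_fits()}")
print(f"peak rack draw: {pod.rack_peak_watts()} W")
print(f"100 PiB at the quoted prices: ${monthly_cost(100):,.0f} per month")
```

Under those assumptions, 330 servers with a 10-drive stripe and 4 parity shards leave usable capacity above the 100 PiB design unit, and the estimated peak rack draw lands inside the 17 kW - 20 kW power envelope in the rack table.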