Apache Iceberg seems to have taken the data world by storm. Initially incubated at Netflix by Ryan Blue, it was eventually transmitted to the Apache Software Foundation where it currently resides. At its core it is an open table format for at-scale analytic data sets (think hundreds of TBs to hundreds of PBs).
It is a multi-engine compatible format. What that means is that Spark, Trino, Flink, Presto, Hive, and Impala can all operate independently and simultaneously on the data set. It supports the lingua franca of data analysis, SQL, as well as key features like full schema evolution, hidden partitioning, time travel, and rollback and data compaction.
This post focuses on how Iceberg and MinIO complement each other and how various analytic frameworks (Spark, Flink, Trino, Dremio, and Snowflake) can leverage the two.