
The Ultimate Directory of Apache Iceberg Resources

by Alex Merced, October 8th, 2024

Too Long; Didn't Read

This article is a comprehensive directory of Apache Iceberg resources, including educational materials, tutorials, and hands-on exercises. Whether you're a beginner or an experienced data engineer, this guide will help you navigate the world of Apache Iceberg and its applications.

What is Apache Iceberg?

Apache Iceberg is an open-source data lakehouse table format. In other words, it is a standard for storing the metadata (schema, partitioning, and which data files belong to each snapshot of the table) that defines a group of files as a table. Any tool that supports the standard can use this metadata to read and write those files just like a table in a data warehouse, with the same features and ACID guarantees.
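To make the idea of table metadata more concrete, here is a minimal Python sketch built on loudly stated assumptions. It models a heavily simplified version of the layers Iceberg uses: immutable metadata versions that list snapshots, where each snapshot names the data files that make up the table at that moment, plus a catalog that holds a pointer to the current metadata version and moves it with a compare-and-swap. That pointer swap is roughly how Iceberg catalogs make commits atomic. Every class and function name here (`DataFile`, `Snapshot`, `TableMetadata`, `InMemoryCatalog`, `row_count`) is hypothetical and not part of Iceberg or PyIceberg, the S3 paths are made up, and real Iceberg keeps manifest lists and manifest files between snapshots and data files, which this sketch omits.

```python
# Hypothetical, heavily simplified model of Iceberg-style table metadata.
# None of these names come from Apache Iceberg or PyIceberg; real Iceberg also
# has manifest lists and manifest files between snapshots and data files.
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    """A data file (e.g. Parquet) plus a little per-file metadata."""
    path: str
    record_count: int


@dataclass(frozen=True)
class Snapshot:
    """The complete set of data files that make up the table at one point in time."""
    snapshot_id: int
    data_files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class TableMetadata:
    """One immutable version of the table's metadata."""
    version: int
    schema: Tuple[str, ...]
    snapshots: Tuple[Snapshot, ...] = ()

    def current_snapshot(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def append(self, new_files: List[DataFile]) -> "TableMetadata":
        """Build a NEW metadata version whose latest snapshot also includes new_files."""
        current = self.current_snapshot()
        existing = current.data_files if current else ()
        snap = Snapshot(len(self.snapshots) + 1, existing + tuple(new_files))
        return TableMetadata(self.version + 1, self.schema, self.snapshots + (snap,))


class CommitConflict(Exception):
    """Raised when another writer committed first; the caller should retry."""


class InMemoryCatalog:
    """Maps each table name to a pointer at its current metadata version."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMetadata] = {}
        self._lock = threading.Lock()

    def create(self, name: str, schema: Tuple[str, ...]) -> TableMetadata:
        with self._lock:
            self._tables[name] = TableMetadata(version=0, schema=schema)
            return self._tables[name]

    def load(self, name: str) -> TableMetadata:
        return self._tables[name]

    def commit(self, name: str, base: TableMetadata, new: TableMetadata) -> None:
        """Compare-and-swap: move the pointer only if it still points at `base`."""
        with self._lock:
            if self._tables[name] is not base:
                raise CommitConflict(f"{name} changed since version {base.version}")
            self._tables[name] = new


def row_count(meta: TableMetadata) -> int:
    """Readers find the table's files purely by following the metadata."""
    snap = meta.current_snapshot()
    return sum(f.record_count for f in snap.data_files) if snap else 0


# Usage: one successful commit, then a conflicting commit from a stale writer.
catalog = InMemoryCatalog()
base = catalog.create("db.events", schema=("id", "ts"))
v1 = base.append([DataFile("s3://bucket/events/a.parquet", 100)])
catalog.commit("db.events", base, v1)

try:  # this writer still holds the stale `base`, so its commit is rejected
    stale = base.append([DataFile("s3://bucket/events/b.parquet", 50)])
    catalog.commit("db.events", base, stale)
except CommitConflict as err:
    print("retry needed:", err)

print(row_count(catalog.load("db.events")))  # prints 100
```

The design choice worth noticing is that metadata versions are never modified in place: a writer builds a new version from the one it read, and swapping the catalog pointer is the single atomic step. Readers therefore see either the old table or the new one, and a writer that loses the race has to rebuild on top of the latest version and retry.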

Why Does it Matter?

  • Because your tables live in a separate storage layer, you can use all your favorite analytical tools on a single copy of your data.
  • Reducing the number of copies you need can lower the compute, storage, and network costs of your overall data platform.
  • Storing your data in a standard format reduces future migration costs when you change tooling or adopt new tools.

Who does Apache Iceberg benefit?

  • Data Engineers, since less data movement means fewer data pipelines to manage.
  • Data Analysts, since data reaches them sooner when fewer movements are needed to make it available. This is especially true when Iceberg is paired with the data virtualization in tools like Dremio, which offers lakehouse querying and federated querying (virtualization) on one platform.
  • Data Scientists, since they also get more immediate access to data when training their AI/ML models.
  • Data Leaders, since lower overall platform costs make it easier to fund other data initiatives.

Apache Iceberg Directory

Apache Iceberg Education

Here is a list of resources to help you learn Apache Iceberg:

Apache Iceberg Hands-on Tutorials

Here is a list of hands-on tutorials that will help you get started with Apache Iceberg:

Apache Iceberg's Architecture

Here is a list of resources to help you learn Apache Iceberg's architecture and internals:

Getting Data into Apache Iceberg

Here is a list of resources to help you get data into Apache Iceberg:

Apache Iceberg Migration

Here is a list of resources to help you migrate your data to Apache Iceberg:

Streaming with Apache Iceberg

Here is a list of resources to help you stream data into Apache Iceberg:

Partitioning with Apache Iceberg

Here is a list of resources to help you learn how to partition your data with Apache Iceberg:

Maintaining and Auditing Apache Iceberg Tables

Here is a list of resources to help you maintain and audit your Apache Iceberg tables:

Apache Iceberg Catalogs

Here is a list of resources to help you learn about Apache Iceberg Catalogs:

Querying Apache Iceberg Tables

Here is a list of resources to help you query your Apache Iceberg tables:

Hybrid Apache Iceberg Lakehouses

Here is a list of resources about implementing hybrid on-premises and cloud Apache Iceberg lakehouses:

Apache Iceberg and Other Formats

Here is a list of resources about Apache Iceberg and other formats (Apache Hudi, Apache Paimon, Delta Lake):

Python and Apache Iceberg

Here is a list of resources about Apache Iceberg and Python:

Governing Apache Iceberg Tables

Miscellaneous Apache Iceberg Resources

Here is a list of miscellaneous resources to help you learn Apache Iceberg: