Introduction

It has been a while since my last “What the heck is?” article, and I’ve recently seen some rapid growth from GlareDB and wanted to learn more. What really piqued my interest was the recent announcements of support for Apache Iceberg and a new hybrid execution model. So, what the heck is GlareDB? Let’s take a look!

Overview

GlareDB is an open-source project utilizing the DataFusion project, part of the Apache Arrow project. DataFusion is a fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format. It offers SQL and Dataframe APIs and built-in support for CSV, Parquet, JSON, and Avro. There are also Python bindings as well as extensive customization possibilities. GlareDB is adding many features on top of it, such as cloud storage and the aforementioned hybrid execution feature, providing a layer on top of various compute engines that can:

- Query local and remote files
- Query other databases and data sources
- Store data and queries (as views)
- Copy data from sources to destinations
- Interop with DataFrame libraries in Python
- Run one-off queries from the command line

They describe how it fits in the stack in this diagram:

[Diagram: where GlareDB fits in the stack]

It supports data located on GCS or S3, along with the following sources:

- BigQuery
- MongoDB (early release)
- MySQL
- Postgres
- Snowflake
- Preliminary Iceberg support
- Redshift (coming soon)
- ClickHouse (coming soon)

They are quickly adding support for various engines, so this list could be incomplete by the time you read this.

What can I do with it?

At first blush, you look at this and think, hey, this seems a lot like Trino in that it is a federated query engine. On second glance, it seems kind of like MotherDuck for a couple of reasons. The first is that, like DuckDB, GlareDB is a single, tight executable but written in Rust instead of C++. Second, they also support having this hybrid execution model (MotherDuck did it first), which I’ll cover shortly.
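To make the “federated” idea a bit more concrete, here is a minimal sketch of the manual work such an engine takes off your hands when a list of keys lives in a local file and the details live in a database. It does not use GlareDB at all: Python’s built-in sqlite3 stands in for the database, an in-memory string stands in for the CSV file, and the table and column names (users, local_ids) are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins: an in-memory SQLite table plays the database,
# and an in-memory string plays the local CSV file of keys.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)

local_csv = io.StringIO("id\n1\n3\n")

# Without a federated engine, the file has to be loaded into a table first...
db.execute("CREATE TEMP TABLE local_ids (id INTEGER)")
rows = [(int(r["id"]),) for r in csv.DictReader(local_csv)]
db.executemany("INSERT INTO local_ids VALUES (?)", rows)

# ...and only then can a single SQL statement span both sources.
enriched = db.execute(
    "SELECT u.user_id, u.first_name "
    "FROM users u JOIN local_ids l ON u.user_id = l.id "
    "ORDER BY u.user_id"
).fetchall()
print(enriched)
```

The load step in the middle is the part a federated engine is meant to remove: the file is referenced directly in the query instead of being staged in a table.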
Given that Trino is written in Java, there is a lot of Java ecosystem you need to deal with if you want to use it. Sure, there are pre-built Docker containers around that can shorten this path, but generally, if you are “just trying to do something,” then you have a heavy lift to install and set up Trino. With GlareDB, you can download and run a single executable or use their SaaS product, which looks like this when you first use it:

[Screenshot: the GlareDB SaaS product on first use]

Now to Hybrid Execution. I’ll paraphrase some of what GlareDB had to say in their blog post on the topic. Say you have a CSV list of user IDs that was extracted from your database by some other tool. Now, you want to enrich that data with some of the users’ demographic information from your database. We’ll say our table name is user_demo and our CSV file is user_id.csv, and our query would look something like this:

SELECT
   m.user_id,
   m.first_name,
   m.last_name,
   m.birth_date
FROM
   user_demo m
INNER JOIN '/user_id.csv' u ON m.user_id = u.id
GROUP BY m.user_id, m.first_name, m.last_name, m.birth_date;

Clearly, this is a simple example, but you could enhance it to get information out of other joined tables as well. You can also go in the other direction: you have a local file with a key field plus some data that doesn’t exist in the database, and you join it to a database table to bring the two together. This has the advantage of not having to go through the process of creating a new table and loading it for this ad-hoc report, thus saving a lot of time.

That’s all just meant to give you a quick tickle about what GlareDB can do and where it is at currently. The docs and blogs on their site are well done, making it pretty quick to jump in.

Summary

GlareDB is very interesting, and I appreciate how quickly they are iterating and updating the software. I need to spend some more time thinking about how it plays in the Trino, StarRocks, or DuckDB space. Between the speed and the federated queries, there are some exciting possibilities. I really like the new hybrid execution, which could shortcut work in various situations. Try out a free account yourself if you’d like to give it a spin at GlareDB.

You can read the other “What the heck” articles at these links:

- What The Heck Is DuckDB? (I was pretty out front on this one.)
- What the Heck Is Malloy? (I was out front on this one, too.)
- What the Heck is PRQL? (slower, but also growing)
