What the heck is PuppyGraph? That was the first thing I asked myself when I came across it in the summer of 2023. It was my involvement with Apache Iceberg that brought it to my attention; they wanted to add Iceberg support to PuppyGraph, but just what the heck is it?

This blog is going to be more of a hot take on PuppyGraph to get you thinking about how you might use it in your own projects. I have no affiliation with the company or project other than thinking it was pretty cool. Co-founder Weimo Liu recently (Feb 2024) gave a presentation at the Chill Data Summit that was interesting and well received, according to my friends who were there.

What is PuppyGraph?

Simply, PuppyGraph is a cloud-native graph data lakehouse providing a graph analytics engine for your data. They address graph scalability by auto-sharding the data so that compute and storage are separate, much like the lakehouse design. So, they provide a graph data warehouse, data lake, and multi-data models on a single copy of your data. That means you can do some pretty cool graphing on your data in any of the supported formats.

What can it connect to?

PuppyGraph has rapidly added support for various platforms, catalogs, and connection engines. Currently, we see:

Apache Iceberg
Apache Hudi
Delta Lake
MySQL
PostgreSQL
DuckDB
BigQuery
Redshift
LanceDB (coming soon)
JDBC Catalog
Data Lake Catalog
Hive Metastore
AWS Glue

Their SaaS interface also gives you direct access to both a Gremlin and a Cypher console to perform graph queries, in addition to a graph notebook, which uses Jupyter.

Using PuppyGraph

A Docker container is provided so you can get started on a local machine. You'll need a schema in JSON format that describes your data layout to PuppyGraph. Once that schema is ingested and verified, away you go.

The integrated graph browser is pretty nifty. You can easily zoom in and out to see the clustering and attributes, in addition to running queries.
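To make the JSON schema step a little more concrete, here is a minimal sketch in Python of what such a mapping could look like. Everything in it is an assumption for illustration: the key names (catalogs, vertices, edges), the table names, and the find_dangling_edges helper are hypothetical and are not taken from PuppyGraph's documentation. The idea it shows is only the general one: tables that already exist are declared as vertices and edges, and nothing is copied.

```python
import json

# HYPOTHETICAL schema layout for illustration only. The key names below
# ("catalogs", "vertices", "edges", ...) are assumptions, not PuppyGraph's
# documented format; check the official docs for the real structure.
schema = {
    "catalogs": [{"name": "lake", "type": "iceberg"}],
    "vertices": [
        {"label": "person", "table": "lake.people", "id": "person_id"},
        {"label": "company", "table": "lake.companies", "id": "company_id"},
    ],
    "edges": [
        {
            "label": "works_at",
            "table": "lake.employment",
            "from": {"vertex": "person", "column": "person_id"},
            "to": {"vertex": "company", "column": "company_id"},
        },
    ],
}


def find_dangling_edges(schema):
    """Return (edge label, end, vertex label) for edge ends with no matching vertex."""
    known = {v["label"] for v in schema["vertices"]}
    problems = []
    for edge in schema["edges"]:
        for end in ("from", "to"):
            target = edge[end]["vertex"]
            if target not in known:
                problems.append((edge["label"], end, target))
    return problems


if __name__ == "__main__":
    # Print the JSON you would hand to the tool, then a quick sanity check.
    print(json.dumps(schema, indent=2))
    print("dangling edges:", find_dangling_edges(schema))
```

A quick local check like this can catch an edge that points at a vertex label you never defined before you hand the file to the real tool for verification.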
Zooming in further shows more of the details, and clicking on a node brings up a pop-up with its properties. This allows you to explore different vertices and edges easily. Static screenshots don't really convey how fast it is or how much fun it is to bounce around your data. I should have used some genealogical data for fun.

Because they use the Gremlin and Cypher query languages, third-party UI tools that speak those languages should also be compatible. A real advantage here is that PuppyGraph works on the data where it lives and doesn't make you copy it elsewhere.

Without going into the particulars of a specific platform, this gives you a general idea of what features and functions are available.

Summary

Certainly, graph databases and their representation don't apply as generally as a structured database does, but we are seeing more and more how these kinds of data representations are being used to model the real world.

I didn't see any indication that this is an open-source project, and I didn't find it on GitHub. There is no mention of pricing, so I'm not sure where they are going with all of this. The documentation isn't amazing, but it seems to be enough to get started and try it out.

Overall, this is a fun project to play with. I need to percolate on it more to see where I might use it, but I can envision some interesting use cases combining it with other self-contained projects like DuckDB and LanceDB.

Check out my other What the Heck is… articles:

What The Heck Is DuckDB?
What the Heck Is Malloy?
What the Heck is PRQL?
What the Heck is GlareDB?
What the Heck is SeaTunnel?
What the Heck is LanceDB?
What the heck is SDF?
What the Heck is Paimon?
What the Heck is Proton?


