What the heck is
This post is more of a hot take on PuppyGraph, meant to get you thinking about how you might use it in your own projects. I have no affiliation with the company or project other than thinking it was pretty cool.
Simply put, PuppyGraph is a cloud-native graph data lakehouse that provides a graph analytics engine for your data. They address graph scalability by auto-sharding data and keeping compute and storage separate, much like the lakehouse design. The result is a graph data warehouse, a data lake, and multiple data models on a single copy of your data. That means you can do some pretty cool graphing on your data as long as it is in one of the supported formats.
PuppyGraph has rapidly added support for various platforms, catalogs, and connection engines. Currently, we see:
Their SaaS interface also gives you direct access to both a Gremlin and Cypher console to perform graph queries, in addition to a graph notebook, which uses Jupyter.
A Docker container is provided so you can get started on a local machine. You’ll need a schema, written in JSON, that describes your data layout to PuppyGraph. Once that schema is ingested and verified, away you go.
The integrated graph browser is pretty nifty. You can easily zoom in and out to see clustering and attributes, and you can run queries from the same view.
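To make the schema step a little more concrete, here is a minimal sketch of what "describe your data layout in JSON" could look like. Every key name in it (`vertices`, `edges`, `table`, `id_column`, `from`, `to`) and every table name is made up for illustration; this is not PuppyGraph's documented schema format, so take the real field names from their docs. The small `check_schema` helper is also my own: a local sanity check that each edge points at vertex labels you actually defined, before you hand anything to the tool.

```python
import json

# Hypothetical schema layout: all key and table names below are illustrative
# assumptions, NOT PuppyGraph's documented schema format.
schema = {
    "vertices": [
        {"label": "person", "table": "people", "id_column": "person_id"},
        {"label": "company", "table": "companies", "id_column": "company_id"},
    ],
    "edges": [
        {"label": "works_at", "table": "employment", "from": "person", "to": "company"},
    ],
}


def check_schema(schema):
    """Return a list of problems; an empty list means the sketch is self-consistent."""
    labels = {vertex["label"] for vertex in schema["vertices"]}
    problems = []
    for edge in schema["edges"]:
        # Each end of an edge must name a vertex label defined above.
        for end in ("from", "to"):
            if edge[end] not in labels:
                problems.append(
                    f"edge {edge['label']!r}: unknown {end} vertex {edge[end]!r}"
                )
    return problems


# Serialize to the JSON text you would save to a file.
schema_json = json.dumps(schema, indent=2)
```

Catching a dangling edge reference locally is cheaper than finding out about it when the schema is rejected on upload.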
Zooming in further, we can see more of the details:
Clicking on a node will give us a pop-up of details:
This allows you to explore different vertices and edges easily. These static pictures don’t really convey how fast it is or how much fun it is to bounce around your data. I should have used some genealogical data for fun.
Because they support the Gremlin and Cypher query languages, third-party tools that speak either language should be compatible as well. A real advantage here is that PuppyGraph works on the data where it lives and doesn’t make you copy it elsewhere. Without going into the particulars of a specific platform, this gives you a general idea of what features and functions are available.
Certainly, graph databases and their representation aren’t as generally applicable as a structured database, but we are seeing more and more of these data representations being used to model the real world. As far as I can tell, this isn’t an open-source project, and I didn’t find it on GitHub. I found no mention of pricing, so I’m not sure where they are going with all of this. The documentation isn’t amazing, but it seems to be enough to get started and try it out. Overall, this is a fun project to play with. I need to percolate on it more to see where I might use it, but I can envision some interesting use cases combining it with other self-contained projects like DuckDB and LanceDB.
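As a rough illustration of the "any Gremlin client" point, here is a small sketch using the open-source `gremlinpython` driver. The vertex label, property, and edge names are hypothetical, and the endpoint URL in the usage note is an assumption about a local setup rather than something I verified against PuppyGraph. The query builder is plain string formatting, so it only makes sense for trusted, hard-coded values.

```python
def neighbors_query(label, key, value, edge_label, limit=10):
    """Build a Gremlin traversal string: find matching vertices, follow one edge type."""
    # Plain string formatting: fine for a sketch with trusted literals,
    # not for user-supplied input.
    return (
        f"g.V().hasLabel('{label}').has('{key}', '{value}')"
        f".out('{edge_label}').limit({limit}).valueMap()"
    )


def run_query(url, query):
    """Submit a Gremlin query string to a Gremlin Server-compatible endpoint."""
    # Imported lazily so the query builder works without the driver installed.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url, "g")
    try:
        return gremlin_client.submit(query).all().result()
    finally:
        gremlin_client.close()


# Hypothetical labels and properties, matching nothing in particular.
query = neighbors_query("person", "name", "Ada", "works_at", limit=5)
```

With a server running, something like `run_query("ws://localhost:8182/gremlin", query)` would return the matching vertices' properties; the host, port, and path there are assumed, so check them against your own deployment.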
Check out my other What the Heck is… articles at the links below: