paint-brush
Using dbt with Materialize and Redpandaby@bobbyiliev
162 reads

Using dbt with Materialize and Redpanda

by Bobby IlievJanuary 10th, 2022
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

This is a self-contained demo using [Materialize] and dbt. This demo shows you how to use [DBt] together with Materialize. The reviews and users will be stored in a database. We use dbt to model our business logic, like getting a list of important users that left bad reviews. We will then explore how to potentially reach out to the flagged users and improve our website experience. This is just a shortcut to a Docker container with a compatible CLI pre-installed.

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Using dbt with Materialize and Redpanda
Bobby Iliev HackerNoon profile picture

Introduction

This is a self-contained demo using Materialize.


This demo shows you how to use dbt together with Materialize.


For this demo, we are going to monitor the reviews left by users on a demo website, and use dbt to model our business logic, like getting a list of important users that left bad reviews. We will then explore how to use this data to potentially reach out to the flagged users and improve our website experience.

How to use dbt with Materialize

Prerequisites

Before you get started, you need to make sure that you have Docker and Docker Compose installed.

You can follow the steps here on how to install Docker:

Installing Docker


Also, you would need to make sure that you have dbt (v0.18.1+) installed:

Installing dbt


You can find the files for this demo in this GitHub repository here.

Overview

As shown in the diagram above, we will have the following components:

  • A mock service to continually generate reviews and users.
  • The reviews and the users will be stored in a MySQL database.
  • As the database writes occur, Debezium streams the changes out of MySQL to a Redpanda topic.
  • We then ingest this Redpanda topic into Materialize directly.
  • After that, we use dbt to define transformations on the data and create a model that lists any VIP users that left bad reviews.
  • You could, later on, visualize the data in a BI tool like Metabase.

As a side note here, you would be perfectly fine using Kafka instead of Redpanda. I just like the simplicity that Redpanda brings to the table, as you can run a single Redpanda instance instead of all of the Kafka components.

Running the demo

First, start by cloning the repository:

git clone https://github.com/bobbyiliev/materialize-tutorials.git

After that, you can access the directory:

cd materialize-tutorials/mz-user-reviews-dbt-demo

Let's start by running the Redpanda container:

docker-compose up -d redpanda

Build the images:

docker-compose build

Then pull all of the other Docker images:

docker-compose pull

Finally, start all of the services:

docker-compose up -d

In order to launch the Materialize CLI, you can run the following command:

docker-compose run mzcli

This is just a shortcut to a Docker container with a compatible CLI pre-installed; if you already have psql installed, you could instead connect to the running Materialize instance using that: psql -U materialize -h localhost -p 6875 materialize.

As soon as the demo is running, the mock service will start generating reviews and users.

Prepare DBT Configuration

First, we will need to install the dbt-materialize plugin:

python3 -m venv dbt-venv
source dbt-venv/bin/activate
pip install dbt-materialize

After that, with your favorite text editor, open the ~/.dbt/project.yml file and add the following lines:

user_reviews:
  outputs:
    dev:
      type: materialize
      threads: 1
      host: localhost
      port: 6875
      user: materialize
      pass: pass
      dbname: materialize
      schema: analytics

  target: dev

After that, to make sure that the connection to the Materialize container is working, run:

dbt debug

Finally, we can use dbt to create materialized views on top of the 3 Redpanda/Kafka topics. To do so just run the following dbt command:

dbt run

This command generates executable SQL from our model files (found in the models directory of this project) and executes that SQL against the target database, creating our materialized views.

Note: If you installed dbt-materialize in a virtual environment, make sure it's activated. If you don't have it installed, please revisit the setup above.

Finally, you can run your dbt tests:

dbt test

Verify the Materialized Views and Sources are created

Congratulations! You just used dbt to create materialized views in Materialize.

You can check the columns of the reviews source by running the following statement:

SHOW COLUMNS FROM analytics.reviews_raw;

You'll see that, as Materialize is pulling the message schema from the Redpanda registry, it knows the column types to use for each attribute:

     name      | nullable |   type
---------------+----------+-----------
 id            | f        | bigint
 user_id       | t        | bigint
 review_text   | t        | text
 review_rating | t        | integer
 created_at    | t        | text
 updated_at    | t        | timestamp

You can verify the views were created from your psql shell connected to Materialize:

SHOW VIEWS FROM analytics;

Output:

        name
--------------------
 badreviews
 vipusers
 vipusersbadreviews

You can also verify the data is being pulled from Redpanda by running the following query a few times:

SELECT COUNT(*) FROM analytics.vipusersbadreviews;

You will be able to see that the result changes each time you run the query, meaning that the data is being incrementally updated without you having to run dbt run again.

Generate the DBT Docs

Once we have our materialized views created, we can generate the dbt docs. To do so, run the following command:

dbt docs generate

After that you can serve the docs by running the following command:

dbt docs serve

Then visit the docs at http://localhost:8080/dbt/docs/. There, you will have a list of all the views that were created and you can click on any of them to see the SQL that was generated. You would also see some nice Lineage Graphs that show the relationships between the views:

dbt Lineage Graph

Metabase

In order to access the Metabase instance, visit http://localhost:3030 if you are running the demo locally or http://your_server_ip:3030 if you are running the demo on a server. Then follow the steps to complete the Metabase setup.

Materialize integrates natively with Metabase using the official PostgreSQL connector. To connect to your Materialize database, specify the following connection properties:

Field

Value

Database

PostgreSQL

Name

user_reviews

Host

materialized

Port

6875

Database name

materialize

Database username

materialize

Database password

Leave empty

Once ready, you will be able to visualize your data just as you would with a standard PostgreSQL database.

Stopping the Demo

To stop all of the services, run the following command:

docker-compose down

Conclusion

As you can see, this is a barebones example of how to use Materialize together with dbt. You can use Materialize to ingest data from a variety of sources and then stream it to a variety of destinations.

To learn more about dbt and Materialize, check out the documentation here:

Also make sure to check out this video here on how to go from Batch to Streaming in One Easy Step (w/ Emily Hawkins & Arjun Narayan):

https://www.youtube.com/watch?v=_oVF217KPvY&ab_channel=dbt

Helpful resources:

  • CREATE SOURCE
  • CREATE MATERIALIZED VIEW
  • SELECT