Leveling Up Your dbt Tests in 7 Steps

Written by lizbigeye | Published 2022/09/13
Tech Story Tags: data-reliability | dbtesting | data-testing | data-observability | dbt-test | coding-skills | data-pipeline | data-quality

TL;DR: dbt tests are great, but sometimes they might not go far enough. When and how can you level up your dbt testing capabilities?

Dbt is a wildly popular tool. When it comes to the data pipeline, companies do (and should!) use dbt tests to mitigate problems.

Dbt tests are great for organizations that want to improve data quality and reliability. But sometimes, dbt tests might not go far enough. Testing has inherent limitations. For example, tests only provide information about the areas you actually test. But what about the areas you can’t cover because of resource or time constraints?

When an organization gets big enough, it might need to fold in some additional data observability. In this post, we’ll talk about 7 ways to level up your dbt tests, and some indicators that it’s time to move beyond tests alone.

1. Get a handle on the different dbt test types

When it comes to dbt tests, you have two options: data tests and schema tests.

Data tests are one-off queries that you write against a specific model. They live in the “tests” folder of your dbt project, and a test fails if its query returns any rows.
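For example, a data test might assert that an orders model never contains a negative total. A minimal sketch (the model and column names here are made up):

  -- tests/assert_no_negative_order_totals.sql
  -- dbt fails this test if the query returns any rows
  select
      order_id,
      order_total
  from {{ ref('orders') }}
  where order_total < 0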

Schema tests are generic. They can be configured and applied to multiple models.

Out of the box, dbt has four schema tests that can be used immediately: unique, not_null, accepted_values, and relationships. You can also write custom schema tests as macros prefixed with test_ and stored in the macros folder.
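Here’s a minimal schema.yml sketch applying the built-in tests (model and column names are illustrative):

  # models/schema.yml
  version: 2

  models:
    - name: orders
      columns:
        - name: order_id
          tests:
            - unique
            - not_null
        - name: status
          tests:
            - accepted_values:
                values: ['placed', 'shipped', 'returned']
        - name: customer_id
          tests:
            - relationships:
                to: ref('customers')
                field: id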

If you have one model that you want to validate in a specific way, you can simply write a data test in the tests folder. But if you find that you’re copying and pasting that query and configuring it across multiple models, you might want to transition your data test into a schema test.
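As a sketch, the one-off data test above could be generalized into a custom schema test that accepts any model and column, following the macro convention described above (names are illustrative):

  -- macros/test_no_negative_values.sql
  {% macro test_no_negative_values(model, column_name) %}
      -- Fails if any rows have a negative value in the given column
      select *
      from {{ model }}
      where {{ column_name }} < 0
  {% endmacro %}

Once defined, it can be attached to any column in schema.yml the same way the built-in tests are:

  # models/schema.yml (excerpt)
        - name: order_total
          tests:
            - no_negative_values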

2. Use plugins and packages

Level up your dbt with integrations, like:

OpenLineage - an open platform for data lineage collection and analysis. It tracks metadata about datasets, jobs, and runs. OpenLineage consumes data produced by dbt runs.

dbt_expectations - a dbt package that lets you deploy Great Expectations-style tests in your data warehouse directly from dbt (a short example follows this list)

Unit tests - small test cases for macros that verify your transformation logic works as expected
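For instance, once dbt_expectations is added to packages.yml and installed with dbt deps, its tests can be applied in schema.yml like any other schema test. A sketch, assuming illustrative model names (check the dbt package hub for the current version range):

  # packages.yml
  packages:
    - package: calogica/dbt_expectations
      version: [">=0.5.0", "<0.6.0"]

  # models/schema.yml
  version: 2

  models:
    - name: orders
      tests:
        - dbt_expectations.expect_table_row_count_to_be_between:
            min_value: 1
      columns:
        - name: order_total
          tests:
            - dbt_expectations.expect_column_values_to_be_between:
                min_value: 0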

3. Schedule dbt tests for every dbt run

For daily dbt jobs, you likely run the associated tests on the same schedule. You can configure your dbt job to run multiple commands, including runs, tests, source freshness checks, and more. You can also set up automated email and Slack notifications in case of failure.
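A sketch of such a job’s command sequence (exact selection and ordering are up to you):

  dbt source freshness
  dbt run
  dbt test

  # or run and test each model in dependency order with a single command:
  dbt build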

4. Run dbt tests on a model change / new PR

When a PR containing a migration or model change is opened in your repo, use dbt tests to verify the changes won’t break your dbt models before merging the PR.

You can do this with a blue-green deployment of your data models. Your CI/CD tool should run the dbt run and dbt test commands against a staging database environment or a staging schema within the production environment.

If the tests pass, continue on with dbt run against the production environment or production schema.

Also, make sure that your staging environment closely mirrors production. That way, the initial staging dbt test results are an accurate proxy for how production will play out.
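A rough outline of that CI step, assuming separate staging and prod targets are defined in profiles.yml (the target names here are hypothetical):

  # Build and test against the staging target first
  dbt run --target staging
  dbt test --target staging

  # Only if both commands succeed, promote to production
  dbt run --target prod
  dbt test --target prod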

5. Run dbt tests on your source models

To catch potential problems as early as possible, run “dbt test” against your source models first:

  1. Run dbt tests against the source
  2. Run the dbt job
  3. Test the resulting outputs

In other words:

dbt test --models source:[sources]

dbt run -m [models]

dbt test -m [models]
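The source tests themselves are declared next to the source definition. A minimal sketch with made-up source and table names:

  # models/sources.yml
  version: 2

  sources:
    - name: raw_shop
      tables:
        - name: orders
          columns:
            - name: order_id
              tests:
                - unique
                - not_null

With that in place, dbt test --models source:raw_shop runs only those source-level tests before anything downstream is built.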

6. Distribute dbt test results

Set up an interactive way to display and visualize dbt test results. That way, you’ll always be aware when something in your dbt transformations goes wrong.

Several tools on the market can help you with these displays. For example, re_data and elementary do these visualizations for you. Or, you can create a custom visualization.
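As one possible starting point for a custom view, the run_results.json artifact that dbt writes to the target folder after each invocation can be summarized and forwarded to your reporting channel of choice. A minimal sketch using jq (the alerting side is left out):

  # After a `dbt test` run, list every test that did not pass
  jq -r '.results[] | select(.status != "pass") | "\(.status)\t\(.unique_id)"' target/run_results.json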

For example, the team at Snapcommerce leveraged dbt artifacts and the query log to build reporting and alerts on dbt model/job runs and performance. This may also be a good point to begin considering a more generic data observability tool like Bigeye.

7. Add data observability into your pipeline

Testing and data observability have the same end goal - detecting problems. However, they approach that goal differently. Testing is opinionated - it originates with a human expressing a condition they think should be true. Observability is neutral - it asks, “Do we know the state of the data system at all times? How has it changed?” Much like a dashboard on a vehicle, data observability provides insight into how fast you’re going, whether the engine light is on, and whether you’re due for an oil change.

In conclusion: Optimize your dbt testing sophistication

Data engineers will go to some trouble to bridge the gap between dbt tests and complete data observability. The result is often a hacked-together setup that leaves out large parts of the data stack that sit outside the realm of dbt.

As organizations scale, the opportunities to test are effectively infinite. You can only write tests for so many things, but you can layer data observability on top of your tests to catch the things you don’t want to (or can’t) write tests for.


Written by lizbigeye | Director of Content Marketing, Bigeye
Published by HackerNoon on 2022/09/13