Too Long; Didn't Read
Data scientists often spend a lot of time waiting for data to arrive. Instead of waiting, you can build a test dataset modeled on the schema of your source data. Because you construct it yourself, you also know exactly what output it should produce once it has been processed through your modeling pipeline. The test can be automated and is cheap to run (as long as you keep the test dataset small), so you get a free integration test of your full pipeline. This is especially valuable when you have a complex pipeline that performs multiple data transformations before applying the model.
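The idea can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation. It assumes a toy pipeline with two transformations and a rule-based stand-in for the model, and the schema, column names, and functions (`validate_schema`, `clean`, `add_features`, `predict`, `run_pipeline`) are all hypothetical. It uses only the standard library, and the `test_` function can also be collected by a runner such as pytest.

```python
# Hypothetical schema standing in for the schema of your real source data.
SCHEMA = {"customer_id": int, "age": int, "monthly_spend": float}


def validate_schema(rows):
    """Fail fast if any row deviates from the expected columns or types."""
    for row in rows:
        if set(row) != set(SCHEMA):
            raise ValueError(f"unexpected columns: {sorted(row)}")
        for col, typ in SCHEMA.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"{col} should be {typ.__name__}")
    return rows


def clean(rows):
    """Transformation 1 (hypothetical): drop rows with a negative spend."""
    return [r for r in rows if r["monthly_spend"] >= 0]


def add_features(rows):
    """Transformation 2 (hypothetical): derive an annual spend feature."""
    return [{**r, "annual_spend": r["monthly_spend"] * 12} for r in rows]


def predict(rows):
    """Stand-in 'model': flag customers whose annual spend exceeds 1000."""
    return {r["customer_id"]: r["annual_spend"] > 1000 for r in rows}


def run_pipeline(rows):
    """Full pipeline: schema check -> transformations -> model."""
    return predict(add_features(clean(validate_schema(rows))))


# Tiny hand-built test dataset that follows SCHEMA.
TEST_ROWS = [
    {"customer_id": 1, "age": 34, "monthly_spend": 50.0},   # 600/yr, not flagged
    {"customer_id": 2, "age": 51, "monthly_spend": 120.0},  # 1440/yr, flagged
    {"customer_id": 3, "age": 28, "monthly_spend": -10.0},  # removed by clean()
]

# Expected output worked out by hand from the rows above.
EXPECTED = {1: False, 2: True}


def test_pipeline_end_to_end():
    """Integration test: the whole pipeline must reproduce EXPECTED exactly."""
    assert run_pipeline(TEST_ROWS) == EXPECTED


if __name__ == "__main__":
    test_pipeline_end_to_end()
    print("pipeline integration test passed")
```

Because the dataset has only three rows, the expected output can be checked by hand, and each row is chosen to exercise a specific step (the third row, for example, only passes the test if the cleaning step actually runs). If a change anywhere in the chain alters the result, this single assertion fails.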