How to add variety to your tests without compromising on reproducible outcomes
Tests are important. Tests are good. Tests are safety. Tests can be fooled.
Even with the best of intentions, we can sometimes end up with false positive tests. We write the test first (Red), do the simplest possible thing to make it pass (Green), and then someone walks in, lunchtime happens, we lose focus and forget to refactor.
Following TDD principles closely, we could end up with something like this:
Even though our test wants to assert that a known input produces a known output, all it actually verifies is that the function returns “I’m a teapot”.
To address this issue, we have to add variance to the tests to make sure that multiple inputs yield the correct outputs. Traditionally this is done by triangulation, which forces us to write increasingly abstract solutions. This works, but with a finite number of inputs there is always a way to game the system.
I don’t want to go into the intricacies of writing randomised tests, that topic deserves its own article which I’ll write if there’s a demand for it. For now, let’s consider the following:
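The snippet in question is not embedded in this version of the article, so here is a minimal sketch of the idea. The function under test (`add`) is made up for illustration, and `Math.random` stands in for ChanceJS, which the article uses:

```javascript
// Hypothetical function under test.
function add(a, b) {
  return a + b;
}

// Derive the expected output from randomly generated inputs instead of
// hard-coding a single known pair. A faked return value can no longer
// satisfy every run.
const a = Math.floor(Math.random() * 1000);
const b = Math.floor(Math.random() * 1000);
const expected = a + b;

if (add(a, b) !== expected) {
  throw new Error(`add(${a}, ${b}) should have been ${expected}`);
}
```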
Here we generate test data with a random input and the corresponding expected output. This way, every single test execution works with unpredictable data, so faking the response becomes nearly impossible.
We’re using the ChanceJS library to get random stuff, because it’s amazing at doing exactly that.
Now here’s the kicker. Imagine you have hundreds of tests randomising data left and right. Your variance is huge and over the months, you’ve tested your application with all sorts of values that you probably didn’t even think of before. At one point, something will fail, and it will fail hard.
Your first reaction would be to replay the tests locally to see if you could reproduce the outcome and figure out what went wrong.
But how do you do that, when all your values are completely random and change every time?
Unless you’re building regulated gambling software or other highly specialised systems, most of the randomisers you’ll end up working with will be pseudo-random generators. The defining characteristic of such a generator is that the values it produces are completely determined by an initial value, called a seed. This means that if you ask for 5 random numbers in a sequence, you will get the exact same 5 numbers every time you use the same seed.
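You can see this determinism in action without any library. Here is a tiny seeded PRNG (mulberry32, a well-known 32-bit generator, shown here purely as an illustration); ChanceJS behaves the same way with its seed:

```javascript
// mulberry32: a minimal seeded pseudo-random generator.
// Every value it produces is fully determined by the initial seed.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators primed with the same seed...
const first = mulberry32(42);
const second = mulberry32(42);

// ...produce identical sequences.
const sequenceA = [first(), first(), first(), first(), first()];
const sequenceB = [second(), second(), second(), second(), second()];

console.log(JSON.stringify(sequenceA) === JSON.stringify(sequenceB)); // true
```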
Companies that need something more random than that have been getting quite creative with their solutions, which include pointing cameras at an array of lava lamps. https://www.youtube.com/watch?v=1cUUfMeOijg
Fortunately for us, we’re completely content with not being securely random, and we can turn this to our advantage. As we’ve learned, given the same seed, we’ll always get the same random values. This means that if we can find a way to expose and control our test seed, we can use it to reproduce a failing test anywhere, any time we’d like. Let’s modify our test to deal with a seed.
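A sketch of what that seed handling might look like. The environment variable name `CHANCE_SEED` follows the article; the ChanceJS lines are shown commented out so the sketch has no external dependency:

```javascript
// Take the seed from the environment if one was provided,
// otherwise create a fresh one.
const seed = process.env.CHANCE_SEED || Math.random().toString(36).slice(2);

// Always print it, so a failing run can be replayed with the same seed later.
console.log(`Using seed: ${seed}`);

// With ChanceJS, the generator is then primed with that seed:
// const Chance = require("chance");
// const chance = new Chance(seed);
```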
Here you can see that we initialise our random generator with a specific seed, which we either get from an environment variable or… randomly create. It’s that simple in this isolated case.
Implementing it at the project level
Having control over the randomness in one file is nice, but it’s definitely not enough if you’re creating an application that actually does something. There are many ways to make this work from this point on; allow me to share our solution, which uses Jest’s globalSetup option paired with environment variables.
We want to achieve two things:
- Create a new seed if none is specified and make it available for all the tests
- Use an existing seed if the user specifies it
I chose to share the seed via an environment variable because I believe it’s the most CI-friendly approach and the easiest to manage from the command line.
This little snippet will run every time you run Jest, before your tests execute. All you have to do to wire it up is tell Jest in your package.json to use this testSetup.js.
And that’s it.
Reproducing the tests
From now on, every test run will print a message to your console letting you know what the seed was. You can copy that seed from your CircleCI/Jenkins/Travis/GitLab/etc. build log and run your tests locally with
$ CHANCE_SEED=534a873a618e4e317060f9bc29f9115ad156168b jest
This sets the CHANCE_SEED environment variable to the specified value for the Jest run. You can also use this to keep the test data the same across your build pipelines, if you have multi-stage testing for example.
Isn’t there a simpler way?
As I was writing this article, I realised that this whole thing could be packaged up and abstracted away from the everyday user, so I went ahead and made a package.
Helper for reproducible random tests with Jest and ChanceJS: meza/jest-chance (github.com)
This does exactly what we’ve been discussing in the article, but in a no-brainer way. It even makes using Chance a bit simpler, as you won’t need to instantiate it manually anymore.
This will give you a primed Chance object with the global seed so that you don’t have to remember to add the seed logic to every test you need random data in.
What do you think? Do you use randomised test data? I’d love to hear what you’ve learned from the experience.