If you come from OO and have ever tested any piece of software, chances are you’ve employed example-based testing. In other words, the function under test is called with some predefined input and the return value is checked against some predefined expectation. For instance, the function sum is called with 1 and 2 and the expected result is 3.
There are problems with this approach though. Firstly, only a limited number of examples can be checked. Secondly, it’s a human being deciding which input-output pairs to verify. This, unfortunately, reflects the developer’s bias. Therefore there’s a big chance some critical input is left unchecked.
But how is it possible to have expectations on the return value of a function if the input is not chosen by the developer?
Turns out, it’s possible to derive invariants (properties) out of the code that are always true, no matter what the input is. If you mix that with random value generators you get property-based testing.
In the case of sum, we would use a number generator to produce the input. Then, we would derive some properties to test such as commutativity (i.e. x + y == y + x) and associativity (i.e. (x + y) + z == x + (y + z)). Lastly, the property-based testing framework would generate some sets of x, y and z and check multiple times if the properties hold.
In the presence of a failure, we would get back the first set of x, y and z for which the property didn’t hold.
The sum one is a simple example. But we could imagine something more complex that involved other function calls and less trivial input. In that case, just getting the input set that failed might not be enough.
That’s why most property-based testing frameworks provide a feature called shrinking. In other words, after a failure, the framework tries to shrink the input that made the property fail to its simplest form by removing or simplifying input data.
Let’s apply property-based testing to one of those stupid examples you are never going to see in real life. I’m going to use JavaScript and JSVerify.
Let’s say we have the following code which needs tests:
const div = (dividend, divisor) => dividend / divisor
Easy, right? The following example-based tests are green so we can call it a day!
const assert = require('assert')

describe('div', () => {
  it('with natural numbers', () => {
    const expected = 2
    const actual = div(6, 3)
    assert.strictEqual(actual, expected)
  })

  it('with decimal numbers', () => {
    const expected = 2
    const actual = div(6.3, 3.15)
    assert.strictEqual(actual, expected)
  })
})
Well, not really. Let’s see what happens with property-based tests.
Firstly, as the generator we can use jsc.nat which generates natural numbers.
Secondly, as the property let’s just check the “right-distributive” one (i.e. (n1 + n2) / n3 == (n1 / n3) + (n2 / n3)).
const jsc = require('jsverify')

describe('div', () => {
  const naturalNumber = jsc.nat

  jsc.property(
    'is right-distributive',
    naturalNumber, naturalNumber, naturalNumber,
    (n1, n2, n3) => div(n1 + n2, n3) === div(n1, n3) + div(n2, n3)
  )
})
BOOM!
Error: Failed after 4 tests and 4 shrinks. rngState: 08b51479f83d6a20ec; Counterexample: 0; 0; 0;
With n1 = 0, n2 = 0 and n3 = 0 something went wrong. From the node REPL:
div(0 + 0, 0) === div(0, 0) + div(0, 0) // false
div(0 + 0, 0) // NaN
div(0, 0) + div(0, 0) // NaN
NaN === NaN // false
JavaScript, right? If we run the property-based test again:
Error: Failed after 4 tests and 4 shrinks. rngState: 08b51479f83d6a20ec; Counterexample: 2; 32; 3;
BOOM AGAIN. But this time it’s a different failure. From the node REPL:
div(2 + 32, 3) === div(2, 3) + div(32, 3) // false
div(2 + 32, 3) // 11.333333333333334
div(2, 3) + div(32, 3) // 11.333333333333332
JavaScript and floating point arithmetic, right?
Now, if property-based testing found 2 bugs in 2 runs in
const div = (dividend, divisor) => dividend / divisor
imagine what else it can find in more complex code.
Property-based testing takes away bias. That way it enables discovering bugs the developer didn’t think to test for.
Also, it forces you to consider the code from yet another point of view, which is a good thing.
At the same time, properties are somewhat more abstract than examples (e.g. commutativity in sum vs sum(1, 2) == 3). That’s why mixing the two styles is probably the best idea.
Hungry for more? Check out how I use property-based TDD in the follow-up post.
[paper] The Practice of Theories: Adding “For-all” Statements to “There-Exists” Tests by David Saff and Marat Boshernitsan