\ *This second post in the series goes through the design and implementation of the original property-based testing library, QuickCheck. The first and introductory post is [Property-based testing #1: What is it anyway?](https://getcode.substack.com/p/property-based-testing-1-what-is?r=1dboko&s=w&utm_campaign=post&utm_medium=web) Even if you already know what property-based testing is, it's worth familiarizing yourself with the example.* \ *The complete code can be found on [GitHub](https://github.com/kurtschelfthout/substack-pbt) - in particular [example.py](https://github.com/kurtschelfthout/substack-pbt/blob/master/example.py) and [vintage.py](https://github.com/kurtschelfthout/substack-pbt/blob/master/vintage.py).* \ Last time we looked at why you'd want to write property-based tests, and introduced the basic functionality that a property-based testing library should provide. Now, we'll dive into how a property-based testing library with random generation is implemented - and more in particular the approach set out by the OG, QuickCheck. We'll create a mini reference implementation in just a couple pages of code (including comments and examples)! \ At its core, any modern PBT library provides three related modules. \ The first allows defining and combining generators - a few functions to create generators for primitive types, ways to skew or restrict generated values, and combine them to generate custom types. For example, to generate the `age` field of `Person` objects, you'd want to restrict to positive values up to say 100. To write a generator for `Person` objects, you'd want to combine the generator for `age` and the generator for names. I'll call this the **generator** module. \ The second allows defining properties, which means specifying some parametrized test, and specifying which generator you'd like to use for it. It also includes ways to observe and log the distribution of generated values so you can convince yourself the generated values are sensible. I'll call this the **property** module. \ The third and last allows running a series of tests based on some configuration, e.g. the number of test cases to generate, how long the test should be running, and so on. I'll call this the **runner** module. \ As a sneak peek ahead, at the end of this post we'll be able to define generators, and run tests for the `sort_by_age` property we discussed in the last post. Recall that `Person` is defined as: ```python @dataclass(frozen=True) class Person: name: str age: int ``` \ We'll be able to create a generator for random `Person` objects as follows: ```python ages = int_between(0,100) letters = map(chr, int_between(ord('a'), ord('z'))) simple_names = map("".join, list_of_length(6, letters)) persons = mapN(Person, (simple_names, ages)) lists_of_person = list_of(persons) ``` \ These are five random generators for random ages, lowercase letters, simple names consisting of 6 random lowercase letters, person objects and lists of person objects respectively. We'll implement and discuss the `int_between`, `map`, `mapN` and `list_of` functions below - they're part of the generator module of our library. \ We'll also be able to define properties though the `for_all` function: ```python def is_valid(persons_in: list[Person], persons_out: list[Person]) -> bool: same_length = len(persons_in) == len(persons_out) sorted = all(persons_out[i].age <= persons_out[i + 1].age for i in range(len(persons_out)-1)) unchanged = { p.name for p in persons_in } == { p.name for p in persons_out } return same_length and sorted and unchanged prop_sort_by_age = for_all( lists_of_person, lambda persons_in: is_valid(persons_in, sort_by_age(persons_in)) ) ``` \ Finally, through the `test` function, we'll check that the property holds for 100 randomly generated lists: ```python > test(prop_sort_by_age) Success: 100 tests passed. ``` ## All things random Let's first focus on how to create a generator API - as we'll see the two other modules are entirely based on it. \ The first question is - why bother at all and not just use the built-in random API? Most languages have one, here's Python's [random module](https://docs.python.org/3/library/random.html). \ The main reason is abstraction - at this point it may look like premature abstraction, but in the following posts we'll change the implementation of generators quite a few times, to add features and change their behaviour, and we won't have to change the users of the API. This is powerful, both from the user and the maintainer's perspective - the maintainer has considerable freedom to improve, without bothering users too much. One straightforward example is changing to a different pseudo-random number generator that is faster or has better properties. \ Another reason is that typical built-in random APIs are quite poor - Python's has more features than some other languages, but still, writing the above `Person` generator would require quite a bit of plumbing. \ Without further ado, let's first take a stab at what the representation of a generator should be. Since a generator is something that when asked, can produce a new random value, the simplest representation is a function without arguments that returns a value. To make the code more readable and safe, we wrap it in a small class: ```python class Random(Generic[Value]): def __init__(self, generate: Callable[[], Value]): self._generate = generate def generate(self) -> Value: return self._generate() ``` \ Pretty straightforward - `Random` is a class that just wraps a function that generates a value of some generic type. To make what follows visible, let's first implement a simple function to sample the actual values from a generator - even though we don't have any to sample yet! ```python def sample(gen: Random[T]) -> list[T]: return [gen.generate() for _ in range(5)] ``` `sample` takes a `Random` instance and calls `generate()` on it 5 times, collecting the results in a list. With that in place, we'll now build up increasingly powerful functions that allow us to create and combine `Random` generators, so we can ultimately generate a list of `Person` values - which is what we need to be able to check the properties of `sort_by_age`. \ Let's write the simplest generator possible - one that always generates the same value. *For each new function, I'll first give the implementation with a usage example. Then I'll show the output of sampling the generator.* ```python def constant(value:T) -> Random[T]: return Random(lambda: value) pie = constant(math.pi) ``` \ `constant` takes some value as argument, and creates a `Random` instance which always generates that value: ```python > sample(pie) [3.141592653589793, 3.141592653589793, 3.141592653589793, 3.141592653589793, 3.141592653589793] ``` \ Amazing. Now slightly more exciting. ```python def int_between(low: int, high: int) -> Random[int]: return Random(lambda: random.randint(low, high)) ages = int_between(0,100) ``` \ `int_between` creates generators for integers within a range. It's a straightforward wrapper around Python's `random.randint`. Let's use it to generate reasonable ages for people: ```python > sample(ages) [37, 15, 1, 62, 21] ``` \ That's one field done, now how about generating names for `Person`? Generating random letters seems like a good start. Since we already know how to generate integers, and it's straightforward to turn an integer into a letter, we should be able to build on `int_between`. It'll be generally useful to be able to convert, or map, generated values to another type. This idea is analogous to using the built-in function [map](https://docs.python.org/3/library/functions.html#map) for iterables. So let's add `map` to our arsenal, and use it to write a lowercase letter generator. ```python def map(f: Callable[[T], U], gen: Random[T]) -> Random[U]: return Random(lambda: f(gen.generate())) letters = map(chr, int_between(ord('a'), ord('z'))) ``` \ `map` takes a function and a `Random` instance, and then applies the function to each value of the underlying generator. ```python > sample(letters) ['t', 'm', 's', 'u', 'c'] ``` \ That's great, but a single letter doesn't make for a believable name. We could map again and repeat the same letter a number of times, but I think you'll agree that `zzzzz` is not exactly an interesting name either. What we'd like to have is a method that runs a given generator N times, and combines the result. \ The answer: `mapN. mapN` applies a function to `N` generators, like `map` applies a function to a single generator (so it's actually more general than what we need for this particular example). A related function you may be more familiar with is `zip`, which combines the inputs into a tuple. ```python def mapN(f: Callable[...,T], gens: Iterable[Random[Any]]) -> Random[T]: return Random(lambda: f(*[gen.generate() for gen in gens])) ``` \ Given `mapN` we can write a `list_of_length` function, which generates a list of a given length, where each element is generated from a given `Random` generator: ```python def list_of_length(l: int, gen: Random[T]) -> Random[list[T]]: gen_of_list = mapN(lambda *args: list(args), [gen] * l) return gen_of_list ``` \ With all that we're finally able to write a somewhat reasonable `Person` generator - thanks to `mapN` again, now used in its generality: ```python simple_names = map("".join, list_of_length(6, letters)) persons = mapN(Person, (simple_names, ages)) > sample(persons) [Person(name='vuaufc', age=38), Person(name='kmvmpy', age=34), Person(name='wiuoph', age=34), Person(name='cvuxbu', age=72), Person(name='fykqmn', age=33)] ``` Pleased to meet you, Mr vuaufc! \ Okay, perhaps these are not the most believable names - but they will do for our purposes, which is to eventually test `sort_by_age`. \ As the final piece of the puzzle, we'd like to generate lists of `Person`. For the moment, we can only generate lists of a given length, using `list_of_length`. Without breaking abstraction (e.g. by using `random` directly) we can't yet generate a list of random length with random elements. That's because we have no way to write a generator that's dependent on the random value generated by another generator - we simply have no way to put them together. Try to write it with only the functions so far to convince yourself, if you like. \ This is where the following function helps: ```python def bind(f: Callable[[T], Random[U]], gen: Random[T]) -> Random[U]: return Random(lambda: f(gen.generate()).generate()) ``` \ Before going into the details of its implementation, here's how you use it: ```python def list_of(gen: Random[T]) -> Random[list[T]]: length = int_between(0, 10) return bind(lambda l: list_of_length(l, gen), length) lists_of_person = list_of(persons) > sample(lists_of_person) [[Person(name='rsgiud', age=81), Person(name='ywcelx', age=55)], [Person(name='bskmxa', age=99), Person(name='jaonqc', age=55), Person(name='igelts', age=67)], [Person(name='yuhtkg', age=78)], [Person(name='coessh', age=43), Person(name='fpjnqi', age=18), Person(name='fjojvw', age=86), Person(name='tbneue', age=97), Person(name='ejirnj', age=76)], [Person(name='wopfbo', age=22), Person(name='yjgfkf', age=39), Person(name='xsslyn', age=34), Person(name='kjpqus', age=12), Person(name='cjlldt', age=24), Person(name='bqthxy', age=29), Person(name='ttqlhx', age=20), Person(name='wyumfv', age=99), Person(name='ucdeqz', age=14), Person(name='eqrxir', age=41)]] ``` Note there are 5 lists that each have a different, randomly chosen length. \ How does `bind` work? It first runs the given generator, and feeds the resulting value into `f`. `f` takes the value as an argument and returns a new generator, which is then run in turn. The fact that the user can write the `f` function and is given a value from `gen`, allows expressing the dependence we need - we can now generate a random list length, and then use `bind` to plumb that into a generator of a random list with fixed length. \ Before taking a step back to look at what we've got in our generator module - there's one implementation subtlety in `bind` that is worth mentioning. It's tempting to think that the implementation can be simplified to: ```python def bad_bind(f: Callable[[T], Random[U]], gen: Random[T]) -> Random[U]: # let's get rid of the lambda return Random(f(gen.generate()).generate) ``` Unfortunately that doesn't work - it only generates from `gen` and applies `f` once, which means in the example it would create a generator that chooses a random length *once*, and then forever returns random lists of that length. ## It's maps all the way down With a small but beautiful arsenal of functions - `int_between`, `constant`, `map`, `mapN` and `bind`, we can generate the list of person objects we need. With small additions like `choice` (which you can find in the [source code](https://github.com/kurtschelfthout/substack-pbt/blob/master/vintage.py#L75), implemented in terms of `bind`) and `filter` this handful of functions is the sufficient and necessary core of any generator module. \ As we touched on in the last post, these functions are sometimes called "combinators" because they are designed to be combined without limit. All of the functions return a `Random` object, and so can be used again as an argument of a further combinator - another way of saying that is that they are infinitely composable. That makes it a beautiful and powerful approach to API design. This may have all felt quite familiar if you've used collections APIs before - most of the generator functions have corresponding built-in python functions that work on collections (iterables, more precisely). * `int_between` is like `range` * `map` is like `map` * `mapN` is like `zip` + `map` * `bind` is like a nested `for` loop in a list comprehension \ This is perhaps not a surprise - generators are infinite streams of random values, and so any API that can work with a container of values of some sort can support similar operations. If Python would allow changing how [generator expressions](https://peps.python.org/pep-0289/) are processed, we'd be able to write random generators like: ```python # not actually Python lists_of_person = gen ( pl for length in int_between(0, 10) for pl in list_of_length(length, persons) ) ``` Which can be automatically translated into the version with `bind` (which actually works!) above - illustrating that `for x in generator ...` is equivalent to `bind(lambda x: ..., generator)`. \ An interesting aspect is that the sequence of functions `map`, `mapN` and `bind` gives the user an increasing amount of expressive power. One way to understand this is that `map` and `mapN` can be implemented in terms of `bind`, but `bind` can not be implemented in terms of `map` and `mapN`. The same is true for `map` vs `mapN` - if you have `mapN` you have `map`, but not vice versa. \ Why not just implement `bind` then? Indeed we could do that without any loss of functionality or expressivity. The flip side is that `bind` exposes strictly less information to the implementation and so `bind` can make fewer assumptions about what's intended compared to `map` and `mapN`. This is reflected in the relative complexity of their implementations. As a result typically `bind` is less performant than `mapN` which is less performant than `map` - if you can write it with `map`, don't use `bind` to give the implementation more to work with. \ This pattern of functions - `constant`, `map`, `mapN` and `bind` - returns time and time again in various guises in collection libraries, streams, reactive libraries and the like. In languages with expressive type systems like Haskell explicit abstractions or interfaces exist to encapsulate them - `Functor`, `Applicative` and `Monad` respectively. The concepts also have a deep connection to category theory, which is a branch of mathematics that Eric Meijer describes as "[the Mathematicians' interpretation of interface-based design](https://www.youtube.com/watch?v=JMP6gI5mLHc)". \ We can only scratch the surface of that here - hopefully you now have an intuitive idea of what these "API design patterns" entail with some leads to learn more if you're interested. Many libraries besides PBT use these patterns and recognizing them makes it easier to get started. As a library designer, it's definitely useful to know and learn when these patterns can be used - they're a valuable tool in your toolbox. \ If you've made it this far, now would be a good time to reward yourself with a slice of pecan pie! [ ](https://images.unsplash.com/photo-1619051805375-7b83440bb11b?crop=entropy&cs=tinysrgb&fit=max&fm=jpg&ixid=MnwzMDAzMzh8MHwxfHNlYXJjaHwxfHxwZWNhbiUyMHBpZXxlbnwwfHx8fDE2NTIyNTc3Nzk&ixlib=rb-1.2.1&q=80&w=1080) ## Let me write some tests already Now that we can write random generators, let's go the final distance and write a few functions to define properties and run some tests. \ First, we need a way for the user to write a property. A property is just a combination of a generator and an assertion. We'll run the assertion 100 times with randomly generated values. The function to connect a generator to an assertion is typically called "for all", because this is reminiscent of the logical for all quantifier. In prose, we want to write "For all lists of person, the result of sort by age is valid." In some sense "for all" is a misnomer, as of course we won't really be testing or proving anything for *all* values, but let's stick with it anyway. \ For simplicity we'll write assertions as functions that return `True` if the assertion holds, `False` if not. More traditionally you'd use `assert` and fail the test on exception, but doing it this way allows us to build up the implementation in an easier to understand way. \ Our first attempt at defining a property, is that it's a boolean generator indicating whether the test has failed or not. That means we don't even need a new class - we can just use `Random[bool]`: ```python Property1 = Random[bool] def for_all_1(gen: Random[T], property: Callable[[T], bool]) -> Property1: return map(property, gen) sort_by_age_1 = for_all_1(lists_of_person, lambda persons_in: is_valid(persons_in, sort_by_age(persons_in))) # wrong_sort_by_age is sort_by_age with a bug, # to show what a failing test looks like. wrong_sort_by_age_1 = for_all_1(lists_of_person, lambda persons_in: is_valid(persons_in, wrong_sort_by_age(persons_in))) ``` \ Our first attempt at implementing `for_all` is just a `map`, applying the assertion to each generated element. Since a property created by `for_all` is just a `Random` instance, we can `sample` it: ```python > sample(sort_by_age_1) [True, True, True, True, True] > sample(wrong_sort_by_age_1) [False, True, False, False, False] ``` \ That's not user-friendly output though. In the actual `test` function, we `generate` 100 values - if they're all `True`, our test has succeeded: ```python def test_1(property: Property1): for test_number in range(100): if not property.generate(): print(f"Fail: at test {test_number}.") return print("Success: 100 tests passed.") ``` \ In action: ```python > test_1(sort_by_age_1) Success: 100 tests passed. > test_1(wrong_sort_by_age_1) Fail: at test 0. ``` \ Okay that seems to work, but it doesn't allow writing properties that draw from two or more generators - it would be nice if we could nest `for_all`. Here's `for_all_2` which allows that: ```python def for_all_2(gen: Random[T], property: Callable[[T], Union[Property1,bool]]) -> Property1: def property_wrapper(value: T) -> Property1: outcome = property(value) if isinstance(outcome, bool): return constant(outcome) else: return outcome return bind(property_wrapper, gen) sum_of_list_2 = for_all_2( list_of(int_between(-10,10)), lambda l: for_all_2(int_between(-10,10), lambda i: sum(e+i for e in l) == sum(l) + len(l) * i)) ``` \ We allow the property function to return either a bool value (in which case it's the innermost `for_all`) as before, or a `Property1` (in which case it's one of the outer `for all` functions). Because we may now get a `Property1` back, which is a `Random[bool]`, it's not enough to use `map`: we have to upgrade to `bind` in the implementation of `for_all_2`. \ As a final improvement let's print the generated arguments if the test fails. We'll use the same trick as before by piggy-backing on `Random`, but instead of `bool` use a richer type `TestResult`: ```python @dataclass(frozen=True) class TestResult: is_success: bool arguments: Tuple[Any,...] Property = Random[TestResult] def for_all(gen: Random[T], property: Callable[[T], Union[Property,bool]]) -> Property: def property_wrapper(value: T) -> Property: outcome = property(value) if isinstance(outcome, bool): return constant(TestResult(is_success=outcome, arguments=(value,))) else: return map(lambda inner_out: replace(inner_out, arguments=(value,) + inner_out.arguments), outcome) return bind(property_wrapper, gen) def test(property: Property): for test_number in range(100): result = property.generate() if not result.is_success: print(f"Fail: at test {test_number} with arguments {result.arguments}.") return print("Success: 100 tests passed.") ``` \ This is more user-friendly when the test fails: ```python > test(prop_wrong_sort_by_age) Fail: at test 0 with arguments ([Person(name='qqdkng', age=31), Person(name='lcqyva', age=73)],). ``` \ At least it now tells us which arguments are a failing input! With that it's straightforward to reproduce and debug the error. \ Note that for mutable values, capturing the arguments in this way can be problematic: `TestResult.arguments` captures a reference to the generated value, but if the assertion or the function under test changes it, we'll print the value after the change. For a pure functional language like Haskell that can't happen, but for Python and others it's definitely a problem. One neat way to deal with this is to store the pseudo-random number generator's seed instead of the generated values, and then re-generate the value from scratch with that seed. We'll come back to issues around mutability in the next posts. \ That's as far as I want to take this initial implementation. Let's take stock of what we have, what's missing, and what we'll cover in the next post. ## Finally The 80/20 rule applies here - this code, which altogether is just a couple pages, represents 80% of the functionality of a property-based testing library. A lot of the work goes into writing and maintaining a wide range of useful generators for commonly used types. I've definitely simplified and omitted a number of aspects though, only some of which we'll come back to. * The property and test "modules" here only have one function each (`for_all` and `test` respectively). Real PBT libraries have more functions to gather statistics and info on generated values, and this is implemented by adding more fields on `TestResult` to collect them. For example, we might print information like "20% of the generated lists were empty, generating values took 12% of the test time" and so on. Also there will be a bunch more options to configure the test run in various ways. * QuickCheck was written in Haskell which is a pure language and so QuickCheck's `Random.generate` needs to include the pseudo-random number generator's seed value as input. In the implementation above, that seed is hidden away in the `random` module, and updated as values are generated. * QuickCheck's `generate` also includes an integer `size` argument. This size is used to restrict the "size" of the generated values - which is not unambiguously defined, but it's typically used to make sure that generators for recursive types (like trees) don't end up generating huge values, by e.g. interpreting the given size as the maximum depth of the tree. We will come back to the notion of size when we discuss shrinking in the next few posts. \ That's it for this post, a lot of ground covered! The next post will discuss shrinking - which is an area where most experimentation has happened since QuickCheck was introduced. \ If you feel like it, please share this post, comment below or discuss on Twitter [@kurt2001](https://twitter.com/kurt2001). Until next time! --- Also Published [Here](https://getcode.substack.com/p/-property-based-testing-2-the-essentials)