This is the sixth post in a series about property-based testing. This post describes "random shrinking", the third and last implementation of shrinking I know of. It keeps all the advantages of internal shrinking, which we discussed in part 5, and is much simpler. Previous posts are: What is Property-based Testing? Essentials of Vintage QuickCheck Shrinking For Easier Debugging Unifying Random Generation and Shrinking Shrinking Choices, Shrinking Values The last posts discussed two different approaches to shrinking failing test cases, aiming to make the failing example easier to understand and debug. The first approach, direct shrinking, makes values smaller by directly changing them. The second approach, internal shrinking, instead tries to change the sequence of choices that are made during random generation. These choices are like the DNA of a generated value, and choices can be edited to make the resulting value smaller. When we discussed the advantages of internal shrinking vs direct shrinking, we also noted that editing the choices can be tricky. Significant engineering effort is needed to make it work well. What if we could have our cake and eat it too - an approach to shrinking that keeps all the advantages of internal shrinking, but is much simpler to implement? The unreasonable effectiveness of randomness The idea is simple: instead of coming up with a deliberate algorithm to make values smaller, we just - in technical terms - throw shit to the wall and see what sticks. That is, we're already counting on random generation to find bugs, let's also randomly try to find smaller values that still make our test fail. Unlike the shrinking strategies we've seen so far, which produce a limited number of smaller values to try, this new approach can keep trying forever. If the random generator is "random enough", a good shrink should show up at some point. In practice, that can take too long, especially if the test itself is slow. However, we can avoid running a test, or even generating an actual value, by calculating the size of the value that we're about to generate beforehand. We'll keep the size of the smallest value that fails the test so far, and then randomly generate another and estimate its size while we generate. If that size is greater than that of the current smallest value, we're not interested in running the test with it or creating the rest of the value. Instead, we'll make the generator short-circuit - basically abort and move on to the next value. This avoids a lot of useless work. I learned of this beautiful idea via by Anthony Lloyd, and to the best of my knowledge, he is also the inventor of it. As we'll see it's simple to implement - the shortest reference implementation of all - and it's effective. As with previous implementations, it's worth noting that CsCheck's approach is slightly more complex than what I'll describe here - I'm just capturing the main idea. CsCheck Matters of size The implementation contains all the same concepts as the previous posts. We make two modifications to . In addition to generating a random value, we also return the size of the generated value. For our purposes, the size is simply represented by a positive integer. The smaller the integer, the smaller the size. Additionally, to enable short-circuiting, the generator takes in the current minimal size, . Random min_size Size = int class SizeExceeded(Exception): pass class Random(Generic[T]): def __init__ (self, generator: Callable[[Optional[Size]], Tuple[T, Size]]): self._generator = generator def generate(self, min_size: Optional[Size] = None) -> Tuple[T, Size]: return self._generator(min_size) If is , we're in the random generation phase of the test run - meaning we haven't found a failing test yet and so the generator should never short-circuit. If is a value, we are shrinking, and is the current smallest value for which the test fails. If a generator exceeds that size, there's no point in going on, and the generator should throw to short-circuit. Don't worry if that doesn't make sense yet - examples below. min_size None min_size Size min_size SizeExceeded Beginning with , as usual: constant def constant(value:T) -> Random[T]: return Random(lambda: (value, 0)) A constant value has a size 0 - no amount of random generation is going to make the constant value change, so if we have found a value with size 0 we can just stop shrinking. Our old friend is a bit more tricky. int_between def int_between(low: int, high: int) -> Random[int]: def zig_zag(i: int): if i < 0: return -2*i - 1 else: return 2*i def generator(min_size: Optional[Size]): value = random.randint(low, high) size = zig_zag(value) # short circuit - implementation below dec_size(min_size, size) return value, size return Random(generator) uses so-called ZigZag encoding to calculate the size of an int. ZigZag encoding is also used in . The gist of it is that each signed integer gets a positive size, by zigzagging between negative and positive integers. Unlike the implementation here, with a limited type like int32 or int64, you only need a few bit shifts and exclusive or to compute the size, which makes it pretty fast in practice. The idea is that integers with greater absolute value have greater size. int_between Protocol Buffers I've also introduced a method (for "decrease size"), which makes sure we short-circuit generation as soon as possible. dec_size def dec_size(min_size: Optional[Size], decrease: Size) -> Optional[Size]: if min_size is None: return None smaller = min_size - decrease if smaller < 0: raise SizeExceeded() return smaller If is not , meaning we're currently shrinking, subtracts the size of the currently generated value from the current minimum size, and checks if that is still positive. If it's not, it throws to indicate that the value we're about to generate is greater than the best, minimum size - thereby short-circuiting. The less work we do, the more values we can generate in a shorter time, and the more opportunities we have for finding smaller values. We're not yet using the result of but we will do so soon. min_size None dec_size SizeExceeded already dec_size is uninteresting - it just maintains the size: map def map(func: Callable[[T], U], gen: Random[T]) -> Random[U]: def generator(): result, size = gen.generate() return func(result), size return Random(generator) The size of is the sum of the sizes of all its inputs: mapN def mapN(func: Callable[...,T], gens: Iterable[Random[Any]]) -> Random[T]: def generator(min_size: Optional[Size]): results: list[Any] = [] size_acc = 0 for gen in gens: result, size = gen.generate(min_size) min_size = dec_size(min_size, size) results.append(result) size_acc += size return func(*results), size_acc return Random(generator) Both and assume that the size of the output value decreases as the size of the input values decreases. Reasonable, though it's easy to imagine a counterexample. Here's where the result of is used: as values for the inputs are generated, subtracts their size from . This ensures that, if say we have 10 inputs, and we already exceed the by the third generator, we short-circuit as soon as possible. map mapN dec_size mapN min_size min_size Finally, : bind def bind(func: Callable[[T], Random[U]], gen: Random[T]) -> Random[U]: def generator(min_size: Optional[Size]): result,size_outer = gen.generate(min_size) min_size = dec_size(min_size, size_outer) result,size_inner = func(result).generate(min_size) size = size_inner+size_outer return result, size return Random(generator) The size of is interpreted as the sum of the size of both inner and outer generators. bind And that is pretty much it! We can literally reuse our implementation of and , which I've repeated here as a reminder. Property for_all @dataclass(frozen=True) class TestResult: is_success: bool arguments: Tuple[Any,...] Property = Gen[TestResult] def for_all(gen: Gen[T], property: Callable[[T], Union[Property,bool]]) -> Property: def property_wrapper(value: T) -> Property: outcome = property(value) if isinstance(outcome, bool): return constant( TestResult( is_success=outcome, arguments=(value,) ) ) else: return map( lambda inner_out: replace( inner_out, arguments=(value,) + inner_out.arguments ), outcome ) return bind(property_wrapper, gen) def test(property: Property): def find_smaller(min_result: TestResult, min_size: Size): ... # the only new bit for test_number in range(100): result, size = property.generate() if not result.is_success: print(f"Fail: at test {test_number} with arguments {result.arguments}.") find_smaller(result, size) return print("Success: 100 tests passed.") Now, is new but is hardly difficult. The idea is to keep generating random values until we find one that is both smaller than the current smallest known size, and that still fails the test: find_smaller def find_smaller(min_result: TestResult, min_size: Size): skipped, not_shrunk, shrunk = 0, 0, 0 while skipped + not_shrunk + shrunk <= 100_000 and min_size > 0: try: result, size = property.generate(min_size) if size >= min_size: skipped += 1 elif not result.is_success: shrunk += 1 min_result, min_size = result, size else: not_shrunk += 1 except SizeExceeded: skipped += 1 print(f"Shrinking: gave up at {min_result.arguments}") print(f"{skipped=} {not_shrunk=} {shrunk=} {min_size=}") We stop shrinking after trying 100,000 values, or if we somehow found the smallest possible size of 0. For each try, there are three possible outcomes: "skipped". This means that during or after the generation of the value, we detected that its size is greater than our current minimum size. "shrunk". Successful shrink - we've found a smaller value that still fails the test. "not_shrunk". Unsuccessful shrink - we've found a smaller value, but it passes the test. Before we check how that works out in practice, let's take a break and have a healthy snack. Putting it to the test Now the question is - does this even work? It seems naively simple. Only one way to find out - using our canonical example. Note once again we didn't have to change generators or test code, so I won't repeat that code here. prop_wrong_sort_by_age > test(prop_wrong_sort_by_age) Fail: at test 0 with arguments ([Person(name='pjahgc', age=27), Person(name='gndrlt', age=70), Person(name='qukflk', age=79), Person(name='dknczu', age=2), Person(name='jqtqgr', age=8), Person(name='xlcxhk', age=22), Person(name='wotanl', age=5), Person(name='nxkupy', age=99), Person(name='pxngky', age=31)],). Shrinking: gave up at arguments ([Person(name='etkdgb', age=8), Person(name='haagjh', age=5)],) skipped=81616 not_shrunk=18373 shrunk=12 min_size=2502 > test(prop_wrong_sort_by_age) Fail: at test 0 with arguments ([Person(name='cigepg', age=69), Person(name='lqkqmp', age=100), Person(name='nlgbbl', age=33), Person(name='xrnzfq', age=76), Person(name='ujjnfz', age=34), Person(name='xlcvxf', age=19)],). Shrinking: gave up at arguments ([Person(name='adcyxh', age=1), Person(name='hrclbb', age=0)],) skipped=81931 not_shrunk=18055 shrunk=15 min_size=2530 Each of these examples took a couple of seconds to generate. The second example is one of the best results I've seen, the first is more typical. A few notes:' It's alive! I've only tried a dozen or so times, but it always manages to shrink to a list of two s, with relatively low ages. Person This approach has more problems with shrinking lists of letters - it's not even clear that there is a pseudo-random seed that generates the letters "aaaaaa" and "aaaaaab" in succession, which is what would have to happen for those "minimal" names to show up. The stats in the last line of each example are typical. happens about 80% of the time, about 20% of the time. The number of values is almost insignificant! These numbers are similar if I try 1,000,000 times instead of 100,000 times. In fact, the number of successful shrinks remains around 10-15 even with 10x more tries. This indicates that for this case at least, it's unlikely to help much if we keep shrinking for longer. skipped not_shrunk shrunk Picking a winner This third and last approach to shrinking sounds almost too simple to work. But it does, and pretty well at that. We've removed all human cleverness from the shrinking process, and as a result, we gain a straightforward implementation, while keeping an integrated generation and shrinking API. Shrinking works just as well for mutable as for immutable values, and we can reproduce any run with just a single seed value for the pseudo-random number generator (typically a couple values). int64 Now, I've emphasized that this approach is by far the simplest to implement, and simplicity is its own reward. But does it also buy us anything? Yes, it does - besides having all the advantages of internal shrinking, it's also trivial to parallelize. All you need is a pseudo-random number generator that can generate several separate streams of random numbers. Secondly, if you have the code and the random seed, it's easy to pause and resume the shrinking process at a later time. After all that, you may be left with a question: what's the best-shrinking strategy? As usual, I would say "it depends", and rephrase the question: if you're writing a property-based testing library from scratch, which approach should you choose? The answer to that question is random shrinking, in my view. Faced with such decisions, it's tempting to find the "best" solution, by some measure. Instead, I think it's better to find the solution that gets people something useful, in the least amount of time, and with the highest optionality - meaning, while keeping as many options open as possible. In this case, random shrinking is certainly going to produce something useful (i.e. it will make counterexamples smaller). It is by far the simplest and thus fastest solution to implement. And finally, we can move to another approach without bothering our users much, which keeps our options open. Also, it's feasible to tack on any of the other shrinking methods random shrinking: for example, we can try to find the smallest example using random shrinking, and then try and make that value even smaller by any of the two other methods. after Finally Phew! When I started this series I didn't think I'd get to six posts on property-based testing, and the worst is that I'm not even done - for example, there are approaches that allow exhaustive generation of values as input to a property-based test. That said, I'm going to take a break (of currently unknown duration) from writing about property-based testing because I have a bunch of ideas for other topics I'm excited to write about. The complete code for this post can be found on - in particular and . GitHub example.py random_based.py Also published . here
Share Your Thoughts