This is the fifth post in a series about property-based testing. This post describes "internal shrinking", a different implementation of shrinking that has some interesting advantages. Previous posts are: What is Property-based Testing? Essentials of Vintage QuickCheck Shrinking For Easier Debugging Unifying Random Generation and Shrinking The complete code for this post can be found on - in particular and . GitHub example.py internal_shrink.py In the last two posts we tried shrinking values by directly changing values: we assumed an oracle that comes up with smaller candidate values, and search those in the hope to find some that still make the test fail. We touched on one annoying problem with this approach: it doesn't deal well with mutable objects, because captures the randomly generated object as its root, and then lazily comes up with smaller candidates. If the object is mutated in between capture and finding a smaller candidate, things go sideways. We can try to work around this problem by asking users to clone mutable values, but we can do better - by changing our approach. CandidateTree To introduce the main idea behind the new approach, we'll need some context - another perspective on how random generation works. It's all about making choices Let's check where in our mini-PBT library we make random choices. All we need to do is look up where we use Python's . There's only one place where it's called: random module def int_between(low: int, high: int) -> Random[int]:
    return Random(lambda: random.randint(low, high)) (You may remember in the last post this was called to disambiguate with . However, thanks to the implementation in this post, we won't need the functions anymore, so I've omitted the prefix.) random_int_between tree_int_between tree_ random_ When we generate a complex value like a list of objects, ultimately chooses the length of the list, the letters of the persons' names and their age. The code of the generators makes that easy to see: Person int_between ages = int_between(0,100)
letters = map(chr, int_between(ord('a'), ord('z')))
simple_names = map("".join, list_of_length(6, letters))
persons = mapN(lambda a: Person(*a), (simple_names, ages))
lists_of_person = list_of(persons) Reading bottom-up, i.e. in execution order, random choices are: chooses a random length of the list. list_of generates the first . It first generates a by randomly choosing 6 letters. Each letter is chosen by choosing a random integer, and then mapping that to a letter. persons Person simple_name A random age is chosen. Repeat 2 and 3 for each element of the list. All told we make 7 choices per , and one choice to figure out how many objects we should generate. Person Person Exactly how many choices there are is not important - the point is that we can think of creating a list of objects as a sequence of random choices, which are represented by a sequence of integers generated by . The process of generating any random value is then a function that takes a sequence of choices as input, and creates a corresponding value as output. Person int_between We can use this insight to reproduce any randomly generated value. While generating a random value we record the sequence of choices the generator makes. To re-create the value we simply run the generator again, but we replay the same choices we made randomly before. You can think of the sequence of choices as the DNA of the generated value: choice sequence and generator uniquely determine the value. Hopefully you're with me so far. The last step is to ask: is it possible to manipulate the choice sequence to make the generated value smaller? If we could do that, we'd effectively have a way to shrink. Looking through the example, this seems possible. For example, if we make the choice of list length smaller, we'd generate shorter lists. If we make the choice of letter smaller, we'd generated smaller letters and names. We can also try more interesting changes to the choice sequence - for example we could generate a list of the same elements but in a different order by moving the sub-sequences that determine each person around. In principle, any change directly to the value can also be made by by changing the choice sequence instead. It turns out this approach works well in practice. This method of shrinking was by David R. MacIver for the Python property-based testing library . He calls it "internal shrinking" as opposed to "external shrinking" which we've been discussing in the last two posts. invented Hypothesis Internal/external is relative to the random generation process: with external shrinking, we make some random choices and generate a value, which is then made smaller by directly manipulating the value. Shrinking is external to random generation. With internal shrinking on the other hand, we manipulate the sequence of random choices to indirectly get smaller values. Shrinking is internal to the random generation process, and runs the exact same generator code, just with a different choice sequence. A simplified implementation Let's look at this idea more concretely. We're not changing the way values are initially randomly generated, but we do need to somehow remember the sequence of choices that lead to that value, as that sequence is what we'll be using for shrinking. We'll start with wrapping the call to so we can record the choice sequence. I've chosen to write a small class for this wrapper. There is more to it than I'm showing here, but we'll go piecemeal. random.randint class ChoiceSeq:
    def __init__(self) -> None:
        self.history: list[int] = []

    def randint(self, low: int, high: int) -> int:
        result = random.randint(low, high)
        self.history.append(result)
        return result makes a random choice, and keeps the results. Now we need to change the implementation of to use this class instead of directly. To do that we need to change to pass an instance of around so we can use it in : ChoiceSeq int_between random.randint Random ChoiceSeq int_between class Random(Generic[T]):
    def __init__(self, generator: Callable[[ChoiceSeq], T]):
        self._generator = generator

    def generate(self, choose: ChoiceSeq) -> T:
        return self._generator(choose)

def int_between(low: int, high: int) -> Random[int]:
    return Random(lambda choose: choose.randint(low, high)) Other functions are essentially unchanged, e.g. here is : map def map(func: Callable[[T], U], gen: Random[T]) -> Random[U]:
    return Random(lambda choose: func(gen.generate(choose))) Since doesn't have to make a new choice, it can just pass the through. The same is true for and . map ChoiceSeq mapN bind The types are also unchanged from before, as well as the implementation of : for_all Gen = Random[T]

Property = Gen[TestResult]

def for_all(
        gen: Gen[T], 
        property: Callable[[T], Union[Property,bool]]
    ) -> Property:
    ... # unchanged from last post However, is where it gets interesting: test def test(property: Property):
    def do_shrink(choices: ChoiceSeq) -> None:
        ...

    for test_number in range(100):
        choices = ChoiceSeq()
        result = property.generate(choices)
        if not result.is_success:
            print(
                f"Fail: at test {test_number} with " 
                f"arguments {result.arguments}."
            )
            do_shrink(choices)
            return
    print("Success: 100 tests passed.") We start out much the same as always: by generating up to 100 random values. However, thanks to the class, we've now got a record of all the random choices that were made during generation of each value. When all tests pass, we don't do anything with the choices, although you could imagine storing them in a database, as a reproducible log of the executed tests - at least provided the generators don't change too much! ChoiceSeq For this post, we only care about . Given a choice sequence, we now need to figure out how we're going to change it to make the resulting value smaller. do_shrink In a real implementation, this is a decent engineering challenge to do well. I'll just demonstrate the principle, by re-using our list shrinking function from a couple posts ago: def shrink_candidates(choices: ChoiceSeq) -> Iterable[ChoiceSeq]:
    # this is part of the list shrinker from vintage.py!
    for i,elem in enumerate(choices.history):
        for smaller_elem in shrink_int(elem):
            smaller_history = list(choices.history)
            smaller_history[i] = smaller_elem
            yield ChoiceSeq(smaller_history) The idea is that we take the history out of , which is a list of , and create a new list with one of the elements made smaller using . We'll need to change as well, which we'll do later. As far as goes, an initial attempt looks as follows: ChoiceSeq int shrink_int ChoiceSeq do_shrink def do_shrink(choices: ChoiceSeq) -> None:
    for smaller_choice in shrink_candidates(choices):
        result = property.generate(smaller_choice)
        if not result.is_success:
            print(
                "Shrinking: found smaller arguments " 
                f"{result.arguments}"
            )
            do_shrink(smaller_choice)
            break
    else:
        print(f"Shrinking: gave up") takes the initial failing , generates smaller values via and recursively shrinks if it finds a successful shrink. do_shrink ChoiceSeq shrink_candidates What is going on in ? In that call, we want the to replay the choices that were made previously, instead of randomly making new choices and appending them. We need to add replay behavior to : property.generate(smaller_choice) ChoiceSeq ChoiceSeq class ChoiceSeq:
    def __init__(self, history: Optional[list[int]] = None):
        if history is None:
            self._replaying: Optional[int] = None
            self.history: list[int] = []
        else:
            self._replaying = 0
            self.history = history


    def randint(self, low: int, high: int) -> int:
        if self._replaying is None:
            # recording
            result = random.randint(low, high)
            self.history.append(result)
            return result
        else:
            # replaying
            if self._replaying >= len(self.history):
                raise InvalidReplay()
            value = self.history[self._replaying]
            self._replaying += 1
            if value < low or value > high:
                raise InvalidReplay()
            return value

    def replay(self) -> None:
        self._replaying = 0

    def replayed_prefix(self) -> ChoiceSeq:
        if self._replaying is None:
            raise InvalidOperation()
        return ChoiceSeq(self.history[:self._replaying]) There's a lot going on here so bear with me. now has two modes: record or replay. When created without a argument, it is in recording mode - any call to generates a new random int and appends it to the list, as above. However, if is called explicitly, or a parameter is given to , then replays the history of choices without any random choices. In other words, creating a new and passing that to exactly re-creates the original value. The premise of internal shrinking is that we can generate smaller values by making the list of choices smaller. ChoiceSeq history randint replay history __init__ ChoiceSeq ChoiceSeq(choices.history) generate There is however a snag - by changing choices earlier in the sequence, we'll influence the behavior of the generator. For example, the generator may need more history than the initial randomly generated value that was recorded, or an integer in the history is out of the desired bounds. In those cases, we fail with - later on, we'll react to by declaring the resulting value as an unsuccessful shrink. Not that if this happens, we bail out quickly, in most cases before we even run the test. InvalidReplay InvalidReplay Vice versa, thanks to our devious manipulations, we'll hopefully use fewer choices - that indicates we've actually managed to make the value smaller. For example, if we make the length of a list smaller, then it's likely we won't use the the last part of the history. The method extracts the part of the choice history that is used so far, so we can keep shrinking just that part. replayed_prefix With all that, here is the final version of : do_shrink def do_shrink(choices: ChoiceSeq) -> None:
    for smaller_choice in shrink_candidates(choices):
        try:
            result = property.generate(smaller_choice)
        except InvalidReplay:
            continue
        if not result.is_success:
            print(
                "Shrinking: found smaller arguments" 
                f"{result.arguments}"
            )
            do_shrink(smaller_choice.replayed_prefix())
            break
    else:
        choices.replay()
        print(
            "Shrinking: gave up at arguments"
            f"{property.generate(choices).arguments}"
        ) There are three things to note here: As explained, we react to by considering the shrink as failed and carry on with the next candidate. InvalidReplay If a shrink is successful, we carry on with the part of the choice sequence that's actually used by calling . replayed_prefix To show the value of our last and most successful shrink, we use the method to reproduce the last value for display purposes. replay And that is the essence of the internal shrinking approach! Let's see it in action next. But first: coffee and biscuit time. Putting it to the test We don't need to change usage of our API at all - generators and properties are implemented the same as before. This is an amazing achievement - the original QuickCheck API from over 20 years ago needs no changes despite one big refactor and one re-implementation. There are not many APIs I've designed that I expect to pass the test of time that well. Let's try our new implementation on our usual example . If you need a refresher, see the . sort_by_age first post in this series > test(prop_sort_by_age)

Success: 100 tests passed. Hardly interesting. Shrinking is what we want to see! > test(prop_wrong_sort_by_age)

Fail: at test 1 with arguments ([Person(name='pjahih', age=0), Person(name='iapxij', age=55), Person(name='mvrnyq', age=57), Person(name='qyzwko', age=94), Person(name='iuzotw', age=4)],).
Shrinking: found smaller arguments ([Person(name='pjahih', age=0), Person(name='iapxij', age=55), Person(name='mvrnyq', age=57), Person(name='qyzwko', age=94)],)

Shrinking: found smaller arguments ([Person(name='pjahih', age=0), Person(name='iapxij', age=55), Person(name='mvrnyq', age=57)],)
Shrinking: found smaller arguments ([Person(name='pjahih', age=0), Person(name='iapxij', age=55)],)

Shrinking: found smaller arguments ([Person(name='ojahih', age=0), Person(name='iapxij', age=55)],)

Shrinking: found smaller arguments ([Person(name='njahih', age=0), Person(name='iapxij', age=55)],)
...
Shrinking: found smaller arguments ([Person(name='abaaaa', age=0), Person(name='aaaaaa', age=2)],)

Shrinking: found smaller arguments ([Person(name='abaaaa', age=0), Person(name='aaaaaa', age=1)],)

Shrinking: gave up at arguments ([Person(name='abaaaa', age=0), Person(name='aaaaaa', age=1)],) Remember, doesn't sort a list of objects by age, but by name and age. This particular shrink had about 160 successful shrinks out of 20,000 tries. It's definitely putting in the effort, and the result is not too shabby: it has found a minimal counterexample. In my cursory testing, it comes up with a two-person list with ages 0 and 1 most of the time (and names being some variant of and ). Sometimes it only finds a three-person list. wrong_prop_sort_by_age Person aaaaaa baaaaa This is a good result! The occasional misses don't indicate a shortcoming, just that my implementation is simple-minded. By manipulating the choice sequence in smarter ways, it's possible to make this shrink to a single canonical example every time. If that doesn't work for some case, then we need to add smarter strategies - Hypothesis has an internal mini-language to describe how to manipulate the choice sequence for various cases - apparently, shrinking floating point values in particular needs special care. Conclusion It definitely seems like we have a winner on our hands. Compared to the previous approach we have the same outcome: we don't need to burden the user with writing shrink candidate functions in addition to generator functions. Since shrinking is now internal to random generation, the question of how to integrate random generation and shrinking doesn't even come up - the two are integrated at the core. Because of this tighter integration, internal shrinking has additional advantages: . Each shrink re-creates the object from scratch from the replayed choice sequence, running the exact same generator code as during random generation. As a result, this approach is much better suited for imperative-minded languages like Python, C, C# and Java. Works great for mutable objects . So far it's been easy to reproduce randomly generated values, by storing the pseudo-random seed, but that doesn't work for any of the shrunk values. But now that we have the choice sequence, we can easily reproduce the shrunk values too. Can reproduce shrunk values . In the last post, we noticed that if a generator is written using , the integrated approach loses a lot of information because it needs to randomly re-generate values for the inner generator. For example, if you write a generator for a list by randomly choosing a length, and then use to generate the values, every time it shrinks the length, the values of the list are randomly re-generated. With internal shrinking, this is not the case - the choice sequence can be changed so the exact same elements are re-generated. You can see this in the example above: the first few shrinks chop elements from the end of the list, while keeping the first elements the same. The end result is that internal shrinking is better at preserving information from the initial value that made the test fail. Bind without remorse bind bind All seems well. So, is this the final word in shrinking for property-based testing? Not entirely! I know of one more approach, which again has all the advantages of internal shrinking, with a simpler implementation. But for that, you'll have to wait for the next post... Until next time! Also published . here

Oracle

Random All the Way Down; Property-based Testing (Part 6)

Shrinking Choices, Shrinking Values - Property-based Testing (Part 5)

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Essentials of Vintage QuickCheck - Property-Based Testing (Part 2)

10 Must Have Chrome Extensions for a Web Developer

The Noonification: Can Mankind Survive as an Interplanetary Species? (10/20/2022)

132 Stories To Learn About Performance

3 Golang Pitfalls Every Developer Needs to Know

🤩 24 Coolest VSCode Extensions That Will Rock Your World 🛠

Essentials of Vintage QuickCheck - Property-Based Testing (Part 2)

10 Must Have Chrome Extensions for a Web Developer

The Noonification: Can Mankind Survive as an Interplanetary Species? (10/20/2022)

132 Stories To Learn About Performance

3 Golang Pitfalls Every Developer Needs to Know

🤩 24 Coolest VSCode Extensions That Will Rock Your World 🛠

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps