How Good Are We At Writing Tests?

If you love testing as much as I do and work on a lot of codebases, then you might have observations like mine.

We don’t struggle as much to write good/clean/readable production code as we struggle to write good/clean/readable test code. These days, I seem to be much more interested in the latter, than the former. This post is about some things I see a lot that cause problems. It’s predominantly about tests I see written in Java, but I’m sure it applies to other languages.

What makes a good test?

Just like production code, there are (unrelated) qualities that make good tests.

Readability - it’s of paramount importance that I can glance at a test, and know what it is, how it’s operating, and understand what business cases it covers.

Flexibility - it’s of paramount importance that if I need to add a new nuance to the existing tests, or change the API, or change the structure of the tests, that the code that supports the tests is malleable and I don’t have to re-implement scaffolding that is already in place too much.

Correctness - it’s of paramount importance that there is a direct, intentional, and accurate exercising of the production code that the test is targeting and that the behavioural conditions are accurately asserted.

Let’s look at an example (it’s contrived, they all are)

Imagine a simple Java class with a constructor as below.

public SaleEvent(Instant transactionTime, Money transactionValue) {
    this.transactionTime = transactionTime;
    this.transactionValue = transactionValue;
}

Let’s imagine that this class is used in a Reporting API where Sale Events are submitted to a service in real-time, and you can ask that service how much Money has been transacted, in a period of time.

Lets imagine a Service interface like this:

interface EventAggregatorService {

  void submitEvents(String saleChannel, List<SaleEvent> saleEvents);

  Money getTransactionTotal(
    String saleChannel, 
    LocalDate startInclusive, 
    LocalDate endInclusive
  );

}

I often see tests of this kind of style.

@Test
public void testEventInsideRangeIsFound() {
    eventAggregatorService.submitEvents(SALE_CHANNEL_1, ONE_SALE_EVENT_DAY1);

    assertThat(
      eventAggregatorService.getTransactionTotal(SALE_CHANNEL_1, DAY_1, DAY_2)
    ).isEqualTo(ONE_SALE_EVENT_TOTAL)
}

@Test
public void testMultipleEventsInsideRangeIsFound() {
    eventAggregatorService.submitEvents(SALE_CHANNEL_1, TWO_SALE_EVENTS_DAY1);

    assertThat(
      eventAggregatorService.getTransactionTotal(SALE_CHANNEL_1, DAY_1, DAY_2)
    ).isEqualTo(TWO_SALE_EVENT_TOTAL)
}

@Test
public void testEventOutsideRangeIsNotFound() {
    eventAggregatorService.submitEvents(SALE_CHANNEL_1, ONE_SALE_EVENT_DAY3);

    assertThat(
      eventAggregatorService.getTransactionTotal(SALE_CHANNEL_1, DAY_1, DAY_2)
    ).isEqualTo(Money.ZERO)
}

Let’s evaluate whether these are good tests according to the criteria (Readability/Flexibility/Correctness).

Are these tests readable?

In a lot of ways, yes these tests are readable, or at least more readable than they could be. They have method names that represent what they are testing, they (almost) follow the ‘given, when, then’ style that we as BDD enthusiasts love to see, and they have named parameters that are in some way logically readable and consistent with what is happening. These tests are not a high standard of readability though. This is because some of the construction of the SaleEvents is happening in a context outside of the test without convention. TWO_SALE_EVENTS_DAY1 for example, is a list that is constructed elsewhere, and we need to check it to look at it to know which dates are being used, we have a clue that it’s DAY_1 but we don’t know what day that is and whether it’s the same day as defined in ONE_SALE_EVENT_DAY1. These problems are a readability smell.

Are these tests Correct?

If we leave the production implementation up to the imagination, these tests could be pretty correct in many implementations. But there are some obvious gaps. This post isn’t meant to be focused on testing boundary conditions like a SaleEvent precisely on the stroke of midnight and checking its inclusively present in both sides of the date range searches, or suggest that an additional test, for each imaginable boundary condition, be added. Exhaustively adding tests to cover all boundary conditions, may not actually help to make the tests more readable overall, if they look as above. If you consider adding more and more member variables to define more cases, it starts to look quite unscalable.

Are these tests Flexible?

No. These tests are not flexible, at all. The problem is subtle, but it’s fairly easily exposed. The problem is the binding that exists between the setup of the input data and the literal values used in the asserts. If we do not have control of the construction of the input objects in the test context itself, then we equally don’t have any control of the outputs. A flexible test, in its ideal form, has the ability to redefine something about the input data, without affecting other tests, or at least in a way that is easily compatible with other tests. Static, or member fields without convention do not lend themselves well to this quality.

What’s better?

Let’s consider what it looks like to give control to the testEventInsideRangeIsFound() test, of its input data, and output data. This can vary and be a lot more, or less verbose, depending on java version.

@Test
public void testEventInsideRangeIsFound() {

    String SALE_CHANNEL_1 = "SALE_CHANNEL_1";
    LocalDateTime DAY_1 = LocalDateTime.of(2022, 4, 2, 0, 0);
    LocalDateTime DAY_2 = LocalDateTime.of(2022, 4, 3, 0, 0);
    BigDecimal ONE_SALE_EVENT_TOTAL = BigDecimal.valueOf(18.99);

    List<SaleEvent> ONE_SALE_EVENT_DAY1 = List.of(
      new SaleEvent(
        DAY_1, 
        new Money(Currency.getInstance("USD"), BigDecimal.valueOf(18.99))
    );

    eventAggregatorService.submitEvents(SALE_CHANNEL_1, ONE_SALE_EVENT_DAY1);

    assertThat(
      eventAggregatorService.getTransactionTotal(SALE_CHANNEL_1, DAY_1, DAY_2)
    ).isEqualTo(ONE_SALE_EVENT_TOTAL)
}

This is much better from the flexibility perspective, because everything that the test is dependent on, is visible and changeable from a single context, the test.

It’s not much better though….

If you consider what it takes to do this for all tests eg. testMultipleEventsInsideRangeIsFound()

@Test
public void testMultipleEventsInsideRangeIsFound() {

    String SALE_CHANNEL_1 = "SALE_CHANNEL_1";
    LocalDateTime DAY_1 = LocalDateTime.of(2022, 4, 2, 0, 0);
    LocalDateTime DAY_2 = LocalDateTime.of(2022, 4, 3, 0, 0);
    BigDecimal TWO_SALE_EVENT_TOTAL = BigDecimal.valueOf(25.98);

    List<SaleEvent> TWO_SALE_EVENTS_DAY1 = List.of(
      new SaleEvent(
        DAY_1, 
        new Money(Currency.getInstance("USD"), BigDecimal.valueOf(18.99)
      ),
      new SaleEvent(
        DAY_1, 
        new Money(Currency.getInstance("USD"), BigDecimal.valueOf(6.99)
      )
    );

    eventAggregatorService.submitEvents(SALE_CHANNEL_1, TWO_SALE_EVENTS_DAY1);

    assertThat(
      eventAggregatorService.getTransactionTotal(SALE_CHANNEL_1, DAY_1, DAY_2)
    ).isEqualTo(TWO_SALE_EVENT_TOTAL)
}

Fine, now we have full control of the objects in the test context. This is helpful and looks manageable today, but actually, as a pattern is quite unmanageable. What if it becomes important, for whatever reason, to write a new test that has 100 SaleEvents on Day one, and 2000 on day 2? what if it becomes important to test sale events, for instance evenly distributed across 1 hour? maybe there is a test case that requires them to arrive all at the same time for example, or maybe you need to exclude events according to a ruleset and it’s important to know which events fall inside that ruleset and which don’t? Those conditional parameters would also need to be readable in the test code, if we think through how this starts to look, it eventually again looks quite unscalable, and it’s too verbose.

What are our options?

We can start to think about writing helper methods that can deal with object construction and dealing with setting up input data for our tests. Our objective is to be as generic as possible and to give as much control to the caller as we can.

Consider some helper methods to solve this for the above example:

private List<SaleEvent> weeklySaleEvents(List<SaleEvent>... weeklySaleEvents) {
        return Arrays.stream(weeklySaleEvents)
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    private List<SaleEvent> dailyEvents(LocalDateTime start, Money value) {
        return IntStream.range(0,24)
                .mapToObj(
                    hourOfDay ->
                        new SaleEvent(
                            start.plusHours(hourOfDay).toInstant(ZoneOffset.UTC),
                            value
                        )
                )
                .collect(Collectors.toList());
    }

These helpers allow for usages that are defined in each test context, and are dynamic, for example

List<SaleEvent> saleEvents = weeklyEvents(
  dailyEvents(MONDAY_START, MONEY_VALUE),
  dailyEvents(TUESDAY_START, MONEY_VALUE),
  dailyEvents(WEDNESDAY_START, MONEY_VALUE),
  dailyEvents(THURSDAY_START, MONEY_VALUE),
  dailyEvents(FRIDAY_START, MONEY_VALUE),
  dailyEvents(SATURDAY_START, MONEY_VALUE),
  dailyEvents(SUNDAY_START, MONEY_VALUE)
);

But these helper methods are making assumptions, which make them inflexible. For example, there is a rule in the helper method that defines that there are 24 events per day, incrementing in time per hour and that there is only one money value possible for each event, and this cant be configured per event. This is a flexibility smell because as soon as a new requirement invalidates one of these assumptions a maintainer will need to redefine, or re-implement this scaffolding to introduce tests, or duplicate a helper method adding more scaffolding that needs to be understood and maintained.

We can try to solve this, by writing even more fluent builders/helpers

But of course, in the same way we run into issues in predicting the direction of the codebase for production code, we equally encounter this in test code. The best that we can really do is write, or generate, fluent builders for all of our test objects that allow us to build objects in all of the possible directions, fluently. That way we can use fluid syntax in each test context and that syntax can be adjusted/redefined for each new business scenario we need to test. Regularly though, this time investment is not practical, and in a lot of cases, the end result can not be as readable as we would ideally like, particularly where we have to manage collection types (nested Maps, Lists etc.) or where we have wildly different test cases that need to co-exist and we can’t guess which parts of the object graph need fluid construction/nesting.

Generated fluent builders, for example, are no substitute for a varargs parameter in the right place, or a well-named method in the right place that hides some meaningless part of the object construction. Building this kind of scaffolding takes a lot of time and skill though and we can’t generate builders automatically on all types of project, either. And especially on older Java versions this strategy can look quite verbose unless we do a lot of handcrafting. There isn’t always time to justify this.

In general, I feel that this is a problem that needs more attention and discussion because if we are to be as Agile as we need to be, to deliver increments as fast as possible, we need to be writing tests all of the time that are Readable/Flexible and Correct, by convention, and we need to be able to do that as quickly as possible.