Let the flame wars begin.
As with all good opinion pieces, I’ll be clear about the terms I’m using and what they mean.
Code coverage: the lines of code that are executed when the automated tests run, expressed as a percentage of the entire codebase. For example, 65% code coverage means the tests execute 65% of the code.
A good metric: for a metric to be “good” in this context, it must have some kind of relationship with higher-quality code. Quality here means code that is easier to understand, change and maintain.
Consider the following example code, freakishly simple. It takes a value and returns the corresponding entry from a map. The map is hard-coded for simplicity.
import java.util.HashMap;
import java.util.Map;

public class MyService {

    private final Map<String, String> myMap = new HashMap<>();

    public MyService() {
        myMap.put("1", "v1");
    }

    public String generateValue(String input) {
        String mapValue = parseValue(input);
        return myMap.get(mapValue);
    }

    private String parseValue(String input) {
        // drop the first character and use the rest as the map key
        return input.substring(1);
    }
}
A simple unit test using Spock might look something like this; it gives us the full 100% test coverage.
def "test my service does its thing properly"() {given: 'my service'MyService myService = new MyService()
when: 'I run my service with a value ending in 1'
String output = myService.generateValue('l1')
then: 'I expect something back'
output != null
}
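For what it’s worth, the 100% number itself usually comes from a coverage tool wired into the build. Here’s a minimal sketch, assuming a Gradle build with the standard JaCoCo plugin; the 80% line threshold and the counter choice are illustrative assumptions, not a recommendation from this article.

// build.gradle (sketch): measure coverage and fail the build below a threshold
plugins {
    id 'java'
    id 'jacoco'
}

// generate the coverage report after the tests have run
jacocoTestReport {
    dependsOn test
}

// the kind of "quality gate" discussed below: fail the build if line coverage drops under 80%
jacocoTestCoverageVerification {
    violationRules {
        rule {
            limit {
                counter = 'LINE'
                value = 'COVEREDRATIO'
                minimum = 0.80
            }
        }
    }
}

check.dependsOn jacocoTestCoverageVerification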
So what do we know? We have 100% code coverage, so our code must be well tested and safe to ship, right?
Uh, no. Let’s go and look at my code for a minute: what happens if I pass in null? What if someone messes with the internals of the class and changes the output? Would our 100% test coverage tell us anything about that? Hell no. Tests that would actually catch those problems look more like the sketch below.
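A minimal sketch, still in Spock and still against the same MyService. Note that the NullPointerException expectation simply documents what the current implementation does with null; it is not a claim about what it should do.

def "null input blows up rather than returning a value"() {
    given: 'my service'
    MyService myService = new MyService()

    when: 'I run my service with a null input'
    myService.generateValue(null)

    then: 'the current implementation throws, which the original test never reveals'
    thrown(NullPointerException)
}

def "the mapped value is asserted exactly, not just for existence"() {
    given: 'my service'
    MyService myService = new MyService()

    when: 'I run my service with a value ending in 1'
    String output = myService.generateValue('l1')

    then: 'I expect the exact mapped value back, so messing with the internals breaks this test'
    output == 'v1'
}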
So what’s going on here? Code coverage’s measurements are reliable when you’re tracking how much of your code is run by your tests, but it tells you absolutely nothing about the value of those tests. Visualising it on its own is useless, because it has no reliable, predictive relationship with the quality of the code or the tests. Quality gates that block releases over minor decreases in code coverage are an invitation for crazy shit: tests like this.
def "test my setter works"() {given: 'an instance of my domain object'MyDomainObject myDomainObject = new MyDomainObject()
when: 'I set some value on my domain object'
myDomainObject.setSomeValue('this is a value')
then: 'i expect that value to be set'
myDomainObject.getSomeValue() == 'this is a value'
}
Sure, you could test it, but come on. Do you really think your tests should be polluted with this kind of noise? You should be testing your logic, not every single setter in a domain object; that’s the very reason tools like Lombok exist (see the sketch below). This is the problem with being overzealous about code coverage: developers will find a way to make up the numbers. Create a system they don’t believe in and they’ll game it; it’s what they do. This is the definition of waste.
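For context, a domain object like the hypothetical MyDomainObject above is typically little more than a bag of fields. A minimal sketch, assuming Lombok’s @Data annotation is used to generate the accessors; the class shape is an assumption for illustration.

import lombok.Data;

// Getters, setters, equals/hashCode and toString are all generated by Lombok.
// There is no hand-written logic here worth writing a test for.
@Data
public class MyDomainObject {
    private String someValue;
}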
As part of a chorus of metrics, it can help to paint a picture of what is going on in the codebase. If it’s a flat 0%, it might give you some insight into the coding standards and testing practices of a team, but it will not tell you whether the tests that do exist are any good or whether the whole testing strategy is broken. That’s really what you care about: high-quality code that is thoroughly tested across multiple domains, whether that’s security, performance, resilience or anything else.
Let’s take this for a bit of a walk. Here’s a little thought experiment: you have two options. Option one is a codebase with 100% coverage, earned through tests like the setter test above that assert next to nothing. Option two is a codebase with 50% coverage, earned through meaningful tests that exercise the real business logic and its failure cases.
I bet your eyes are on the latter. That’s because this thought experiment helps to pull out the real value of a good set of tests in the codebase. They’re supposed to stop bad things from happening and make it easier to continue working on the code. The former does not do that, despite executing all the code. The latter does.
So, is code coverage a good metric? Nope, not on its own. The example above shows a simple but common case where code coverage reports full marks but the code looks like it was written by an eight-year-old. Code coverage gives you quantity when what you need is quality. Remember, ten good tests blow a hundred garbage tests out of the water, any day of the week.
For more technical rambling, go ahead and follow me on Twitter!