I have never worked on a project with 100% code coverage. I've worked on several projects with high coverage and a test suite that gave the team confidence. But even on those projects we'd sometimes get a nasty bug in production that could have been easily caught by a simple test.
This is a tricky subject: developers usually don't care much about it, think it's not worth the cost, or even that it's not that useful. I've gathered here several arguments in favor of full coverage.
Full coverage is definitely not a statement that the program is bug-free. It's simply a data point telling us that every line is exercised by tests. If some part of the program is not covered, we know that whenever it changes we'll need to perform some manual validation, and we won't be confident that our changes don't break anything.
On the other hand, we can also have low confidence in covered code. We could have tons of tests and still be afraid to push something to production. The truth is that our tests only cover a predefined set of combinations and use cases. This can be mitigated with generative testing, but ultimately there's always the chance that we missed something.
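To show what generative testing looks like in practice, here is a minimal property-based sketch using Python's Hypothesis library. The `slugify` function and the properties it checks are hypothetical examples, not code from this article:

```python
from hypothesis import given, strategies as st

def slugify(text: str) -> str:
    """Hypothetical function under test: lowercase and replace spaces with dashes."""
    return "-".join(text.lower().split())

# Instead of hand-picking a few inputs, let Hypothesis generate hundreds of them
# and check properties that must hold for *any* input.
@given(st.text())
def test_slugify_has_no_spaces_or_uppercase(text):
    result = slugify(text)
    assert " " not in result
    assert result == result.lower()
```

The point is that we state invariants rather than enumerate cases, which catches combinations we would never think to write by hand.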
What we gain is knowing that a new patch won't break scenarios we assume are already correct. Admittedly, this argument is more about having tests than about having full coverage. But with full coverage you know that the probability of damaging something that already works is smaller.
And do consider that sometimes code breaks without direct human interference. Imagine a complex merge that silently changed some logic. Having that code covered gives us better assurance that we're still ok.
When I'm tackling bug reports and I have a bug that raises an exception, I always try changing a line of the code, running the test suite, and checking whether a test complains. When one does, it's great, because already having a test that exercises that line tells me several things.
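A tiny sketch of that idea, with a hypothetical function and test: if I deliberately break one line and a test goes red, I know that line is genuinely exercised.

```python
def apply_discount(price: float, percent: float) -> float:
    # Hypothetical production line. Mutating "1 - percent / 100" into
    # "1 + percent / 100" should make the test below fail,
    # which proves the line is really covered.
    return price * (1 - percent / 100)

def test_apply_discount():
    assert apply_discount(200.0, 10) == 180.0
```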
If I don't already have a test near that problem, I may be in for a challenge to build the specific context that generated it. This is common in code that uses third-party services or components, or in complex code: it's very hard to create the context to test, so we neglect it.
But these are exactly the scenarios where we really need to create a test. We've all touched that code that uses PayPal to process payments: it's very hard to test, it uses callbacks, and so on. And when there's a bug? We're in trouble.
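One common way to put that kind of integration under test is to isolate the third-party client behind an interface we own and replace it with a fake in tests. This is only a sketch, assuming a hypothetical gateway wrapper rather than any real PayPal SDK:

```python
from unittest.mock import Mock

import pytest

class CheckoutService:
    """Hypothetical service that depends on a payment gateway we inject."""

    def __init__(self, gateway):
        self.gateway = gateway

    def charge(self, order_id: str, amount: float) -> str:
        response = self.gateway.create_payment(order_id=order_id, amount=amount)
        if response["status"] != "approved":
            raise RuntimeError(f"payment declined for {order_id}")
        return response["payment_id"]

def test_charge_raises_when_payment_is_declined():
    # The real gateway (a PayPal client, say) is replaced by a mock, so the
    # hard-to-reproduce "declined" scenario becomes trivial to set up.
    gateway = Mock()
    gateway.create_payment.return_value = {"status": "declined", "payment_id": None}

    with pytest.raises(RuntimeError):
        CheckoutService(gateway).charge("order-42", 19.99)
```

Once the dependency is injected, the painful contexts (declines, timeouts, malformed callbacks) become ordinary test cases.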
If something is hard to test, it will be hard to maintain.
And hard to maintain translates into poor productivity and low confidence. From my experience, projects with good coverage, but not 100%, tend to miss out on what follows.
Isn't 99% good enough? No. Do consider the broken windows theory: if 99% is good, won't 98% also be good? Especially when we've just added an integration that is complex to test? How do we know that the missing 1% is harmless? Going back to the previous example, the missing coverage will be exactly that hard-to-test payment integration, the code where bugs hurt the most.
From my experience, code that is not that important is actually quick to test. And testing it, even if just for the sake of it, allows us to reach the maximum threshold, and stay there.
But if we have everything covered, then every time we need to change code we also need to change tests. If you put it this way… yes, it's cumbersome.
But I'm going to put it another way: start by changing the tests. Refactor the tests for the new reality, see that red mark of failing tests, and only then go to the code and work towards the green. I do believe this is the best practice. I also recognize that sometimes the change is just one line… and that line may impact several tests. Still, I believe the pros outweigh the cons.
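A tiny sketch of that red-then-green flow, with a made-up pricing rule: suppose the business rule changes from a 10% to a 15% discount. We update the test's expectation first, watch it fail, and only then touch the code.

```python
# Step 1: update the test first. While the production code still applies 10%,
# this expectation is red.
def test_discount_is_fifteen_percent():
    assert final_price(100.0) == 85.0

# Step 2: only now change the implementation and work towards the green.
def final_price(amount: float) -> float:
    return amount * 0.85  # was 0.90 before the rule change
```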
With full coverage we'll easily detect code that is no longer used, because that code will show up as not covered. Let's not stare at some method, weighing the implications of refactoring it, when it isn't used at all.
There are IDEs that detect unreachable code and warn developers. But even with those tools I've seen unnecessary code sitting in the repository, whether because of a developer's distraction or an automatic merge.
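In a project that is otherwise at 100%, dead code stands out immediately in the coverage report. A small hypothetical illustration (function names are made up), assuming a Python project measured with coverage.py:

```python
def active_discount(price: float) -> float:
    return price * 0.9

def legacy_discount(price: float) -> float:
    # Nobody calls this anymore. After `coverage run -m pytest` and
    # `coverage report -m`, these lines are the only uncovered ones,
    # which is a strong hint that the function is dead code.
    return price - 5.0

def test_active_discount():
    assert active_discount(100.0) == 90.0
```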
Having more tests can mean a slower test suite and a longer feedback loop before we know that everything is ok. I do believe we should track the overall speed of the test suites over time and favor pure unit tests. Splitting the logic into pure functions and functions with side effects does help here.
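A small sketch of that split, with made-up names: the pure pricing logic can be covered by fast unit tests, while the thin side-effecting wrapper is the only part that needs slower integration tests.

```python
# Pure function: no I/O, trivially fast to unit test with many cases.
def total_with_tax(items: list[float], tax_rate: float) -> float:
    return round(sum(items) * (1 + tax_rate), 2)

# Function with side effects: kept thin, so only a handful of slower
# integration tests need to exercise it.
def persist_invoice_total(db, invoice_id: str, items: list[float], tax_rate: float) -> None:
    total = total_with_tax(items, tax_rate)
    db.save_invoice_total(invoice_id, total)

def test_total_with_tax():
    assert total_with_tax([10.0, 5.0], 0.23) == 18.45
```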
If we have these stats, we can get a picture of how much time is added to the test suite per developer, per year, and extrapolate it. We can see whether the way we're working now will make the test suite minutes or hours longer in a year or two. If the outcome is alarming, we can start improving the way we work right away.
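A back-of-the-envelope sketch of that extrapolation, with entirely made-up numbers:

```python
# Hypothetical measurements: how much wall-clock time the suite gained recently.
seconds_added_last_quarter = 90      # suite got 1.5 minutes slower in 3 months
developers_now = 6
developers_next_year = 10            # planned team growth

# Naive linear extrapolation: suite growth scales with team size.
seconds_per_dev_per_quarter = seconds_added_last_quarter / developers_now
added_next_year = seconds_per_dev_per_quarter * developers_next_year * 4

print(f"Projected extra suite time in one year: ~{added_next_year / 60:.1f} minutes")
# With these numbers: 90/6 = 15 s per dev per quarter -> 15*10*4 = 600 s, about 10 minutes.
```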
I've gathered several code examples of scenarios that might make us shy away from full coverage. (I'm sorry for the outside reference; it wasn't easy to format all the examples here on Medium.)
There are many advantages to having 100% code coverage, but it can mean more work and weigh on the development process. I do believe that a patch whose changes are fully covered by tests has higher quality, and we should aim for that. It may be hard, we may need to learn new ways of working, and we may need to question our beliefs.
But I’m sure that it will make us better software craftsmen.
Originally published at engineering-management.space on February 3, 2018.