Most companies use code coverage as a metric for how well their code has been tested. However, as many have pointed out, code coverage is a poor indicator of whether your code has actually been tested well.
Indeed, observe the following test:
int AddOne(int in)
{
    // Note: deliberately broken for illustration; a correct AddOne would return in + 1.
    return in - 1;
}

TEST(MyTest, TestsAbsolutelyNothing)
{
    // No EXPECT/ASSERT macros: this test exercises AddOne but can never fail.
    const auto val = AddOne(1);
}
This example is a bit extreme, but in a large legacy code base, situations like this can easily occur. This test achieves 100% code coverage of AddOne, yet because it contains no expectation, it tests nothing; it even passes against the broken implementation.
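For contrast, here is a minimal sketch of what an actual test of AddOne might look like (the test name is my own); run against the implementation above, it fails and exposes the bug:

TEST(MyTest, TestsAddOne)
{
    // Fails against the broken implementation: AddOne(1) returns 0, not 2.
    EXPECT_EQ(2, AddOne(1));
}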
There is no direct correlation between code coverage and the amount of code that has been tested.
This doesn’t mean that code coverage is a useless metric. Code with high coverage tends to be better tested than code with very low coverage. However, relying on high code coverage can lull us into a false sense of security.
The problem is that “code coverage” is a poor term; the issue lies in the word “coverage”. When you have in your head that something is “covered”, you immediately assume it has been taken care of, testing-wise. With code coverage, as we have seen, this is not the case.
Let’s look at a different example:
int Add(int in, int value)
{
    return in + value;
}

int Subtract(int in, int value)
{
    // Bug: ignores value and always subtracts 1.
    return in - 1;
}

TEST(MyTest, TestAddAndSubtract)
{
    EXPECT_EQ(2, Add(1, 1));
    // Passes only by coincidence: 1 - 1 == 0 for this particular input.
    EXPECT_EQ(0, Subtract(1, 1));
}
In this test we have added expectations, and we still have 100% code coverage. The metric suggests that the code is tested, yet there is an obvious error in Subtract: it ignores its value parameter. The test passes only because the chosen inputs happen to mask the bug.
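A single expectation with different inputs would catch it; here is a minimal sketch (the test name and values are my own choice):

TEST(MyTest, TestSubtractWithOtherInputs)
{
    // Fails against the broken implementation: Subtract(5, 3) returns 4, not 2.
    EXPECT_EQ(2, Subtract(5, 3));
}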
Instead, we should invert the metric and use another term. For lack of a better name, I will use the term “legacy code”. I have borrowed this term from Michael Feathers’ excellent book “Working Effectively with Legacy Code”, wherein he describes legacy code as code without tests. “Legacy code %” is then equal to “100% - code coverage %”. And this metric accurately tells us what percentage of the code has not been tested at all.
So, where previously your CI tool would report 80% code coverage, it should now report 20% legacy code.
Legacy code% = 100% - code coverage%
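As a trivial illustration, the conversion is a one-liner; this sketch and its function name are my own:

#include <cassert>

// Converts a code-coverage percentage into its inverse, the "legacy code" percentage.
double LegacyCodePercent(double codeCoveragePercent)
{
    return 100.0 - codeCoveragePercent;
}

int main()
{
    assert(LegacyCodePercent(80.0) == 20.0); // 80% coverage -> 20% legacy code
    return 0;
}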
Now, having 0% legacy code still does not mean that the code has been tested well. It could still be tested poorly, as in the second example. So, the metric is still fairly flawed. In the end, tests can never prove the absence of errors.
However, I believe that getting rid of the word “coverage” will improve people's understanding of what this metric actually means.