Comprehensive Coverage: The AI Solution To Unit Testing


by Mathew Lodge, February 13th, 2024


How do you increase unit test coverage of your source code? Automate with AI!


A unit test checks a small, individual part of the source code, such as a single function or method, in isolation from the rest of the system to ensure that it functions as expected. Its purpose is to verify the correctness of the code, identify and fix bugs early in the development cycle, and improve the overall quality of the software.
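As a minimal illustration, here is a unit test written with Python's standard-library unittest and unittest.mock modules. The article's examples concern Java, where JUnit plays the same role. PriceService and its rates client are hypothetical. The point is that the collaborator is replaced by a mock, so only the unit's own logic is exercised.

```python
import unittest
from unittest import mock


class PriceService:
    """Hypothetical unit under test: converts USD prices via a rates client."""

    def __init__(self, rates_client):
        # In production this might call a remote exchange-rate API.
        self.rates_client = rates_client

    def price_in(self, amount_usd, currency):
        rate = self.rates_client.rate("USD", currency)
        return round(amount_usd * rate, 2)


class PriceServiceTest(unittest.TestCase):
    def test_converts_using_rate_from_client(self):
        # Replace the collaborator with a mock so the test runs in isolation.
        client = mock.Mock()
        client.rate.return_value = 0.5
        service = PriceService(client)

        self.assertEqual(service.price_in(10.0, "EUR"), 5.0)
        client.rate.assert_called_once_with("USD", "EUR")
```

Saved to a file, it runs with `python -m unittest <file>`. Because the rate comes from the mock, the test never touches a network or database.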


Unit test coverage is a measure of how much of an application’s source code is 'touched' (executed) when the unit tests run, usually expressed as a percentage of lines or branches. While 100% coverage is often unattainable (because not all code is suitable for testing), coverage of 70% to 80% is generally considered the mark of a robust test suite.
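Line coverage is the share of executable lines that ran at least once during the tests. The sketch below shows what "touched" means mechanically by recording executed lines with Python's standard sys.settrace hook. It is minimal: classify is a hypothetical unit, and production tools (coverage.py for Python, JaCoCo for Java) handle far more cases, including branch coverage.

```python
import dis
import sys


# Hypothetical unit under test: three executable lines, two branches.
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


def executable_lines(func):
    """Line numbers in func's body that carry bytecode (the def line excluded)."""
    code = func.__code__
    return {
        line
        for _, line in dis.findlinestarts(code)
        if line is not None and line != code.co_firstlineno
    }


def executed_lines(func, inputs):
    """Call func once per input and record which of its lines actually ran."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # A 'line' event fires just before each new source line executes.
        if frame.f_code is code and event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(None)
    return hit


def line_coverage(func, inputs):
    """Fraction of func's executable lines executed by the given inputs."""
    total = executable_lines(func)
    return len(executed_lines(func, inputs) & total) / len(total)
```

Calling `line_coverage(classify, [5])` gives about 0.67, because two of the three body lines ran. Adding `-1` as a second input exercises the other branch and brings it to 1.0.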


In cases where test coverage is insufficient, bugs can slip through the cracks, only to be discovered by QA teams or, worse, by end users. Observability tools can help troubleshoot and rectify these issues, but they serve as post-mortem analysis rather than as a preventive measure, and they don’t replace adequate test coverage.


In regulated industries, especially with safety-critical systems, regulators require evidence that developers have taken steps to mitigate risks in their software. Passing tests can serve as this evidence, showcasing that the software behaves as expected.


As a result, management often mandates test coverage levels to maintain compliance with recommended safety standards and to ensure that incorrectly implemented code does not cause serious harm or loss of life. But even in less regulated industries, management often mandates coverage levels to keep increasingly complex software reliable.


Developers often perceive these targets as arbitrary, and the targets don’t always align with the complexities or specific requirements of the application being developed. Writing tests manually is a daunting task, especially in a complex codebase. It is not only time-consuming (developers say they spend as much as a third of their time writing unit tests) but also susceptible to human error, particularly when engineers are tasked with writing thousands of tests to meet a coverage threshold.


As a result, coverage mandates often lead to simplistic tests that are certain to pass but do not catch bugs.


So mandating high code coverage doesn't inherently translate into high-quality tests. Quality, in this context, is determined by the tests' ability to uncover bugs and to detect behavioral changes in the code.


Assertions are the backbone of unit testing. They are the checkpoints that validate the expected outcome against the actual result of the test. Without assertions, a test might execute all the code but won’t be able to identify any discrepancies or errors. In other words, the test won’t be able to tell you whether the code is doing what it’s supposed to do.


So, while high code coverage is important, it’s equally important to have meaningful assertions in your tests to ensure that your code behaves as expected.


Crafting an assertion for every single change in a program's state is impractical. It can lead to overly complex tests, especially when dealing with objects containing hundreds or thousands of fields. The goal is to create a minimal set of clear, effective assertions to avoid missing critical behavioral changes.
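One way to keep assertions minimal yet meaningful is sketched below in Python with a hypothetical Order type. The test asserts on the fields the operation is supposed to change, then uses a single equality check to confirm nothing else moved. The dataclasses.replace trick assumes the object supports value equality, as dataclasses do by default.

```python
import unittest
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Order:
    """Hypothetical domain object; real ones may carry hundreds of fields."""
    order_id: str
    customer: str
    items: tuple
    status: str = "NEW"
    tracking_number: Optional[str] = None


def ship(order, tracking_number):
    # Hypothetical unit under test: returns a shipped copy of the order.
    return replace(order, status="SHIPPED", tracking_number=tracking_number)


class ShipTest(unittest.TestCase):
    def test_ship_sets_status_and_tracking_only(self):
        before = Order("A-1", "acme", ("widget",))
        after = ship(before, "TRK-42")

        # Assert on the behavior ship() is responsible for...
        self.assertEqual(after.status, "SHIPPED")
        self.assertEqual(after.tracking_number, "TRK-42")
        # ...plus one compact check that every other field is unchanged,
        # instead of one assertion per field.
        unchanged = replace(after, status=before.status,
                            tracking_number=before.tracking_number)
        self.assertEqual(unchanged, before)
```

If ship() ever started altering, say, the customer field, the final equality check would fail without anyone having written a customer-specific assertion.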


A test suite with fewer tests but high-quality assertions is preferable to a multitude of low-quality tests that achieve only superficial coverage.


Assertions prove their worth when the code changes in the future. If an engineer alters the code and its behavior changes, tests with assertions will fail, signaling the deviation.
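A small Python simulation shows this regression-catching role, using hypothetical function names. The same assertion runs against an original implementation and against a later edit that changes its behavior.

```python
import unittest


def format_name_v1(first, last):
    # Original behavior: "Last, First".
    return f"{last}, {first}"


def format_name_v2(first, last):
    # Hypothetical later edit that silently changes the output order.
    return f"{first} {last}"


def make_test(format_name):
    """Build the same test case against a given implementation."""
    class FormatNameTest(unittest.TestCase):
        def test_last_name_first(self):
            self.assertEqual(format_name("Ada", "Lovelace"), "Lovelace, Ada")
    return FormatNameTest


def passes(test_case):
    """Run one TestCase class quietly and report whether all its tests passed."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(test_case).run(result)
    return result.wasSuccessful()
```

`passes(make_test(format_name_v1))` is True and `passes(make_test(format_name_v2))` is False. In a real project there would be a single format_name whose body changes between commits; the two versions here simply simulate that history.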


To alleviate the burden of manual test writing, some developers turn to automation tools. Automated test generation tools can be especially beneficial for legacy code that never had tests written for it. Raising coverage for such codebases by hand can be a monumental effort that distracts developers from their primary tasks.


For Java code, Diffblue Cover generates unit tests automatically, not only accelerating the development process but also ensuring high-quality tests that cover all the source code’s necessary branches and boundary values. It also automatically writes all the assertions needed for the tests.
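To make "branches and boundary values" concrete, here is a hand-written Python sketch of such a test set for a hypothetical function with two decisions. It is not Diffblue Cover output, since that tool works on Java code. It only shows which inputs such tests need: one per branch, placed on each side of every threshold.

```python
import unittest


def shipping_fee_cents(order_total_cents):
    """Hypothetical unit: free shipping from 5000 cents, otherwise 499."""
    if order_total_cents < 0:
        raise ValueError("order total cannot be negative")
    if order_total_cents >= 5000:
        return 0
    return 499


class ShippingFeeTest(unittest.TestCase):
    # One test per branch, with inputs on both sides of each boundary.
    def test_negative_total_is_rejected(self):
        with self.assertRaises(ValueError):
            shipping_fee_cents(-1)

    def test_zero_total_pays_fee(self):
        self.assertEqual(shipping_fee_cents(0), 499)

    def test_just_below_threshold_pays_fee(self):
        self.assertEqual(shipping_fee_cents(4999), 499)

    def test_at_threshold_ships_free(self):
        self.assertEqual(shipping_fee_cents(5000), 0)
```

Placing inputs exactly at the edges is what makes the set sensitive to off-by-one edits. If `>= 5000` became `> 5000`, the at-threshold test would fail. If `< 0` became `<= 0`, the zero-total test would fail.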


Diffblue generates coverage reports with graphics, such as a pie chart, showing how much of the application the unit tests cover. These reports help developers see where coverage is low and identify complex code that is difficult to test. Developers can then refactor the untestable parts of the code, which in turn can increase coverage upon retesting.
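The arithmetic behind such a report is simple, and it shows how one poorly covered file can drag down the total. Below is a minimal Python sketch. It assumes per-file counts of covered and executable lines are already available. The file names and numbers are invented for illustration, and this is not Diffblue's report format.

```python
def coverage_report(per_file, threshold=70.0):
    """Summarize line coverage per file and overall.

    per_file maps a source path to (covered_lines, executable_lines).
    Returns (rows, overall_pct), where each row is (path, pct, below_threshold).
    A file with no executable lines is treated as fully covered.
    """
    rows = []
    for path, (covered, total) in sorted(per_file.items()):
        pct = 100.0 * covered / total if total else 100.0
        rows.append((path, round(pct, 1), pct < threshold))
    covered_sum = sum(covered for covered, _ in per_file.values())
    total_sum = sum(total for _, total in per_file.values())
    overall = 100.0 * covered_sum / total_sum if total_sum else 100.0
    return rows, round(overall, 1)


if __name__ == "__main__":
    # Hypothetical per-file counts, e.g. as exported by a coverage tool.
    rows, overall = coverage_report({
        "src/billing/Invoice.java": (180, 200),
        "src/legacy/ReportBuilder.java": (30, 150),
        "src/crm/Customer.java": (90, 100),
    })
    for path, pct, low in rows:
        print(f"{path:32} {pct:5.1f}%{'  <- below threshold' if low else ''}")
    print(f"overall: {overall}%")
```

With these numbers, two files sit at 90%, but ReportBuilder.java at 20% pulls the project down to 66.7%, below a 70% threshold. That one file is where refactoring or new tests would pay off most.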


Diffblue uses reinforcement learning, a branch of artificial intelligence in which computer programs use trial and error at lightning speed to find the best solution to a problem.


Reinforcement learning is far more precise than transformer-based generative AI systems, which can only suggest solutions based on probabilities learned from their training data. Picking the best suggestion requires expertise, so these systems don’t necessarily save developers time; they just shift developers' focus from one problem (writing tests) to another (choosing and checking tests).


Tools like Copilot can suggest initial tests that align with the existing code, which can give developers a starting point. However, every suggestion must be meticulously verified for correctness. So while such tools change the nature of the task, they don’t necessarily save time.


Writing a single unit test can take anywhere from 10 to 20 minutes, and the work is often seen as monotonous and tedious. That tedium leads to mistakes: when people are bored, their engagement drops and they become more prone to error, allowing inaccuracies to slip through.


Reinforcement learning systems like Diffblue, on the other hand, don’t require any human intervention and generate quality tests every time. By using tools like Diffblue Cover, developers can achieve maximum test coverage with quality tests and relevant assertions.


When Goldman Sachs turned to Diffblue Cover to write tests for its legacy code, it doubled coverage for one repository in less than 24 hours, saving an estimated year of developer time. Diffblue Cover also identified critical edge cases, enhancing application stability and security.


In another case, a global technology manufacturer significantly reduced application failures and sped up modernization efforts using Diffblue Cover. Recognizing the need for more effective unit testing, the company introduced a new coverage gate but struggled to meet it, even though more than 100 developers were already dedicating significant time to unit testing. Diffblue Cover rapidly increased coverage to 70%, lifting the burden of test writing from those developers while reducing downtime and accelerating modernization.


So, where to start? Prioritizing tests for critical code segments helps allocate resources more effectively. Collaboration between development and testing teams, working from a unified approach to testing, can also help achieve comprehensive test coverage.


Beyond that, consider automation tools for the programming language your system is written in.