In this post, we discuss a possible problem in the code that can be identified from the history of the source code. We call this “smell” cumulative code. We use the term to describe code that changes mostly by additions and rarely by modifications of what is already there. Every change is essentially new code added to the same module, even when refactoring would have been the better option. New features are introduced by adding a new piece of code that changes or extends the behavior of the existing code.
Note that, unlike most other code smells, cumulative code is not identified directly from the code itself but from the history of the source code.
Cumulative code can be detected by looking at the history of the code in the version control system. If the history of a file follows this pattern, then the code can be considered cumulative (or at least there is a smell of cumulative code):
Line, developer, commit date
101 dev1 2011/3/1
102 dev1 2011/3/1
...
133 dev1 2011/3/1
134 dev2 2013/6/7
135 dev2 2013/6/7
...
146 dev2 2013/6/7
147 devN 2017/12/16
148 devN 2017/12/16
...
149 devN 2017/12/16
In this case, new commits are just additions, with no (or relatively few) deletions or modifications. Usually, as new features come in, the code evolves. Evolution is not just adding new code to the existing one. Design and architectural decisions change as we get a better understanding of how our system works, and these decisions may change the structure of our code. The previous example suggests that for more than six years probably no such decision was made, and the code just kept growing. When a system grows in a healthy way, the existing code is modified so that it can accommodate the new code. A famous quote from Ken Thompson says: “One of my most productive days was throwing away 1000 lines of code.”
Note that the pattern above is only an indication, a slight smell, that something may be wrong with our code. Obviously, there is code that follows such a pattern and is still flexible and clean. For example, the code of an interface to which new methods are added over time can follow this pattern without being cumulative.
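The additions-versus-deletions pattern described above can be roughly measured rather than eyeballed. The following is a minimal sketch, assuming the project uses git: it sums the added and deleted line counts that `git log --numstat` reports for one file. The names `Churn`, `parse_numstat`, and `file_churn` are made up for this illustration, and what counts as a "low" deletion ratio is a judgment call, not an established threshold.

```python
import subprocess
from typing import NamedTuple


class Churn(NamedTuple):
    """Total lines added and deleted over the history of a file."""
    added: int
    deleted: int

    @property
    def deletion_ratio(self) -> float:
        """Share of changed lines that were deletions (0.0 means pure additions)."""
        total = self.added + self.deleted
        return self.deleted / total if total else 0.0


def parse_numstat(text: str) -> Churn:
    """Sum the added/deleted columns of `git log --numstat` output."""
    added = deleted = 0
    for line in text.splitlines():
        parts = line.split("\t")
        # numstat rows look like "<added>\t<deleted>\t<path>";
        # binary files report "-" instead of numbers and are skipped here.
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        added += int(parts[0])
        deleted += int(parts[1])
    return Churn(added, deleted)


def file_churn(path: str) -> Churn:
    """Collect churn for one file; must be run inside a git working tree."""
    out = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_numstat(out)
```

A file whose ratio stays close to zero over several years is a candidate for a closer look, nothing more. As noted above, an interface that only gains methods would score the same way without being cumulative.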
To judge how bad cumulative code is, we have to think about what it actually indicates. Cumulative code is an indication of a violation of the Open/Closed principle. The Open/Closed principle states that “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”. Cumulative code is clearly not closed for modification: new functionality is added inside the existing code, right next to the existing functionality, which makes our software less flexible. Usually this takes the form of a bunch of if statements that cover new cases or modify the old ones. Such additions are often ugly hacks and shortcuts that developers take to finish the task quickly. Cumulative code tends to make methods, functions, and classes large, and in most cases code ends up duplicated in several places, which degrades the cleanness of our code. Thus, the code becomes less maintainable as time goes by, and the implementation of each new feature takes longer.
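To make the "bunch of if statements" concrete, here is a small sketch with an invented example: a discount calculation, in cents, that grows by one branch per customer type. It is followed by one possible alternative in which new rules are registered from the outside. All names (`discount_cumulative`, `discount_rule`, `discount`, the customer types) are hypothetical, and the registry is only one of several ways to keep such code open for extension.

```python
from typing import Callable, Dict


# Cumulative style: every new customer type appends another branch here.
def discount_cumulative(customer_type: str, amount_cents: int) -> int:
    if customer_type == "regular":
        return 0
    if customer_type == "member":    # added by a later commit
        return amount_cents * 5 // 100
    if customer_type == "partner":   # added by an even later commit
        return amount_cents * 10 // 100
    raise ValueError(f"unknown customer type: {customer_type}")


# Alternative: a registry that is extended without editing existing rules.
DiscountRule = Callable[[int], int]
_RULES: Dict[str, DiscountRule] = {}


def discount_rule(customer_type: str) -> Callable[[DiscountRule], DiscountRule]:
    """Decorator that registers a rule for one customer type."""
    def register(rule: DiscountRule) -> DiscountRule:
        _RULES[customer_type] = rule
        return rule
    return register


@discount_rule("regular")
def _regular(amount_cents: int) -> int:
    return 0


@discount_rule("member")
def _member(amount_cents: int) -> int:
    return amount_cents * 5 // 100


@discount_rule("partner")
def _partner(amount_cents: int) -> int:
    return amount_cents * 10 // 100


def discount(customer_type: str, amount_cents: int) -> int:
    """Look up the registered rule; this function never changes for new types."""
    try:
        rule = _RULES[customer_type]
    except KeyError:
        raise ValueError(f"unknown customer type: {customer_type}") from None
    return rule(amount_cents)
```

With the registry, a new customer type shows up in the history as a new, self-contained function, while `discount` itself stays untouched. In the first version the same feature shows up as yet another block of lines appended to one ever-growing function.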
If we take a step back, we can see that these problems are just the effects of a more general problem. Cumulative code means that there is a communication problem within the team that maintains the code. As we saw in the previous section, every developer added her feature at a different point in time. This is a strong indication that team members did not discuss how the code could be restructured to accommodate the new changes, what patterns could be useful, whether the new changes alter the nature of the method/function/class/component/module and should live separately (in order to conform to the Single Responsibility Principle), or, even worse, whether the new functionality already exists somewhere else in the system.
Due to the lack of communication, the developer who implemented a new feature did not consider refactoring the code because she was afraid of breaking things that already worked. This is also known as the fear of change! This is not an unusual case, especially if the codebase is rather old and the test suite is not that reliable (more on this topic in this post) or does not exist at all. In that situation, developers get no feedback on whether their newly introduced changes break existing functionality.
Cumulative code can also be the result of a team that is constantly working in “firefighting mode.” In this case, the developers look for the shortest path to a solution, and they end up using hacks and shortcuts to add the functionality required to finish their task. Of course, the cost of this solution has to be paid when these hacks need to be debugged or extended in the future. It is OK for a team to work in firefighting mode for a limited amount of time (due to an emergency), but when this mode becomes the default, the codebase is approaching its end of life. Firefighter-developers can be seen as heroes in the short term because they solve urgent problems, but if the technical debt they produce is never paid, their practices put the codebase in danger.
The communication problem can lead to code with huge methods and classes that are very difficult to comprehend, massive duplication, and all the other problems that obscure code can have. No two team members share the same understanding of how the system works and how it should be extended. Thus, it is not uncommon for individuals to reimplement existing functionality, with slight or no differences, instead of reusing what is already there. The common understanding of the system is closely tied to the architecture of the system, as Martin Fowler states in “Who needs an Architect?”
Communication can be crucial to the success of a team (not only a team of developers but any team). In a previous post, we discussed the importance of communication within a team and how it can affect the team’s performance, and thus its success. A team should invest heavily in its communication structures.
Communication is not always physical, verbal, or even synchronous. The code is also a medium through which developers communicate their solutions to each other, as we discussed in this post. When the team members are not co-located, communication through the code is more challenging and requires some extra effort. Clean code can improve the communication!
Apart from clean code, ideally, there should be tests that validate the behavior of the code, in order to eliminate (or at least reduce) the fear of change. Tests are the best way to communicate (more on testing in this post). A developer who wrote a piece of code a year ago might have forgotten some corner cases, but the tests have not. A documentation page can become outdated, but tests that run on every change cannot go stale unnoticed. Tests do not lie!
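As a sketch of how tests can record corner cases that a developer may have forgotten, the example below pins down the current behavior of a small, invented legacy function before anyone refactors it. Both `shipping_fee` and the fee values are hypothetical; the point is only that each corner case gets a named test that will fail loudly if a refactoring changes it.

```python
import unittest


def shipping_fee(weight_kg: float, express: bool = False) -> int:
    """Hypothetical legacy function (fee in cents) that grew by additions."""
    fee = 500
    if weight_kg > 10:
        fee += 300
    if express:
        fee *= 2
    if express and weight_kg > 10:  # corner case appended years later
        fee -= 100
    return fee


class ShippingFeeCharacterization(unittest.TestCase):
    """Pins the behavior as it is today, so refactoring gets feedback."""

    def test_light_standard(self):
        self.assertEqual(shipping_fee(2), 500)

    def test_heavy_standard(self):
        self.assertEqual(shipping_fee(12), 800)

    def test_light_express(self):
        self.assertEqual(shipping_fee(2, express=True), 1000)

    def test_heavy_express_keeps_rebate(self):
        self.assertEqual(shipping_fee(12, express=True), 1500)

    def test_exactly_ten_kg_is_not_heavy(self):
        self.assertEqual(shipping_fee(10), 500)
```

Saved in a test module, these can be run with `python -m unittest`. Once they pass, the chain of if statements can be restructured with much less fear, because any change in the pinned behavior is reported immediately.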
Version control systems keep track of everything that has ever happened in our codebase. Just by looking at the commits, we can extract very useful information about the way the team works (no, LoC is NOT a metric of a developer’s productivity!). We can also see how our system has evolved over time as new features came in or as the capacity requirements changed.
A legacy system has a lot of such history which is always exciting!