We’ve all been there.
Taking a look at a repo that we can’t quite make heads or tails of. Or looking at a function, reading through it, getting to the end, and then realising we’d need to read it five or six more times before we can grasp what it does.
There are a handful of different ways to measure this area of code quality and to judge whether code is easily understandable, but today let’s focus specifically on two measures of complexity – Cyclomatic Complexity & its younger sibling, Cognitive Complexity.
The idea of Cyclomatic Complexity was first proposed in 1976 by Thomas McCabe as a way to figure out what software would be difficult to test or maintain.
Cyclomatic Complexity can give a good proxy for how hard it is to understand code (more on this in a bit) and it can also help you determine the minimum number of tests you need to have complete test coverage over the section of code you’re looking at.
To measure Cyclomatic Complexity, you look at the control flow graph for a section of code and count the number of independent paths through that source code.
A flow graph maps what is happening in the source code. It is made up of Nodes (the processing tasks) and Edges (the paths that connect those processing tasks), linking together the possible flows between them.
Here are a couple of sample Flow Graphs for some simple functions:
def is_even(number):
    if number % 2 == 0:
        number_type = "Even"
    else:
        number_type = "Odd"
    print(number_type)
def combining_lists():
    list_1 = [1, "a"]
    list_2 = [1, 2, 3]
    list_3 = ["c", "d"]
    list_of_lists = [list_1, list_2, list_3]
    joined_lists = []
    for i in range(len(list_of_lists)):
        joined_lists.append(list_of_lists[i])
    print(joined_lists)
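If it helps to picture the graph, here is a rough sketch of the is_even flow graph written out as a Python adjacency list (the node labels are just illustrative names for each processing step, counting a single entry and exit node):

# A rough, illustrative sketch of the control flow graph for is_even.
# Each key is a node (a processing task); each list entry is an edge
# to a node that can run next.
is_even_flow_graph = {
    "entry (def is_even)": ["number % 2 == 0 ?"],
    "number % 2 == 0 ?": ['number_type = "Even"', 'number_type = "Odd"'],
    'number_type = "Even"': ["print(number_type)"],
    'number_type = "Odd"': ["print(number_type)"],
    "print(number_type)": ["exit"],
    "exit": [],
}

nodes = len(is_even_flow_graph)                                        # 6 nodes
edges = sum(len(targets) for targets in is_even_flow_graph.values())   # 6 edges
print(nodes, edges)  # 6 6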
Once you have a Flow Graph you can quickly calculate the complexity for any piece of code by:
V(G) = E - N + 2
Where E is the number of Edges, N is the number of Nodes, and V(G) is the total complexity. You can also calculate it with:
V(G) = P + 1
Where P is the number of Predicate Nodes (aka Nodes that contain conditions). In is_even, for example, the if statement is the only predicate node, so V(G) = 1 + 1 = 2.
Let’s take a look at how complex the functions we mapped out earlier actually are.
In Example 1 we have 6 nodes & 6 edges, so our total Cyclomatic Complexity is 2 (V(G) = 6 - 6 + 2)
In Example 2 we again have 6 nodes & 6 edges, so we have a total complexity of 2. Both the if statement in Example 1 & the for loop in Example 2 drive up this complexity.
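In practice you rarely need to count nodes and edges by hand. As a quick sketch (assuming the third-party radon package, which isn’t part of this article’s examples, is installed), a static analysis tool can report the same numbers:

# Verifying the scores above with radon (pip install radon); a sketch,
# not required for anything else in this article.
from radon.complexity import cc_visit

source = '''
def is_even(number):
    if number % 2 == 0:
        number_type = "Even"
    else:
        number_type = "Odd"
    print(number_type)
'''

for block in cc_visit(source):
    # Each block carries the function name and its cyclomatic complexity.
    print(block.name, block.complexity)  # should print: is_even 2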
In general, with any of the software complexity metrics, a smaller number is better, and there isn’t an upper limit to what a piece of software’s cyclomatic complexity can be. But what’s a “good” score for cyclomatic complexity? And how high of a complexity score is too high? Thankfully, NIST gives us some guidance:
< 10 - a simple program with little risk
11 – 20 - a more complex program with moderate risk
21 – 50 - a high-complexity program with high risk
> 50 - an untestable program with very high risk
The NIST ratings talk a lot about testability – more than they talk about complexity or understandability.
That’s because the cyclomatic complexity of a program is also the number of test cases required to cover every independent path through it. So complex programs can become nearly impossible to fully test.
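To make that concrete, is_even from earlier has a cyclomatic complexity of 2, which corresponds to two independent paths and therefore two test cases for full path coverage. Here is a minimal pytest sketch (the numbers_example module name is hypothetical; it just stands in for wherever is_even lives):

# Two independent paths through is_even means two test cases for full coverage.
from numbers_example import is_even  # hypothetical module containing is_even


def test_even_path(capsys):
    is_even(4)
    assert capsys.readouterr().out.strip() == "Even"


def test_odd_path(capsys):
    is_even(3)
    assert capsys.readouterr().out.strip() == "Odd"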
Cyclomatic Complexity can be useful to determine the testability and maintainability of code, but it’s not always the best tool to actually determine how understandable code is for a human developer. There are three primary reasons for this:
1. When calculating Cyclomatic Complexity, every method has a minimum complexity of 1. We can easily wind up with two classes with equally high complexity scores where one of them is a large but easy-to-maintain class made up of a number of simple methods and the other is an extremely complex, but shorter, class. Above the class level, the complexity score is closely tied to the number of lines of code, so longer programs get penalized just for being longer.
2. Because it was developed in 1976, there are modern structures (e.g. lambdas and things like try/catch) that Cyclomatic Complexity can’t accurately account for.
3. A method’s Cyclomatic Complexity score isn’t tied to how hard it is to understand or maintain, so some methods wind up under-scored and others over-scored.
This means that two pieces of code with identical Cyclomatic Complexities aren’t necessarily equally easy to understand or maintain. For instance:
# Example 1
def sumOfPrimes(max):            # + 1
    total = 0
    for i in range(2, max + 1):  # + 1
        for j in range(2, i):    # + 1
            if i % j == 0:       # + 1
                break
        else:
            total = total + i
    return total                 # Total Cyclomatic Complexity = 4
# Example 2
def getWords(number):    # + 1
    if number == 1:      # + 1
        return "one"
    elif number == 2:    # + 1
        return "a couple"
    elif number == 3:    # + 1
        return "a few"
    else:
        return "lots"    # Total Cyclomatic Complexity = 4
Both of these cases have Cyclomatic Complexity scores of 4, but the second example is clearly cleaner. We need a metric that can distinguish cases like this and give us a clear sense of whether code is easy or hard to understand.
Enter Cognitive Complexity.
Cognitive Complexity was designed to address exactly these shortcomings. It works by:
1. Reducing, or at least not penalizing, the score for features that allow you to simplify/shorthand code.
2. Increasing the complexity score for breaks in the typical linear flow of the code – e.g. loop structures, conditionals, or ternary operators.
3. Increasing the complexity further for nested breaks in the linear flow.
Intuitively, all of these rules make sense. Conditionals and loops naturally add a level of complexity – and nesting these structures adds even more.
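Here is a quick sketch of how that nesting rule plays out in practice (the functions are made up for illustration, annotated the same way as the examples below):

# The same kind of flow break costs more the deeper it is nested.
def first_positive_flat(x):
    if x > 0:              # + 1 (flow break at the top level)
        return x
    return 0               # Total Cognitive Complexity = 1


def first_positive_nested(rows):
    for row in rows:       # + 1 (flow break, no nesting yet)
        for value in row:  # + 2 (flow break + 1 level of nesting)
            if value > 0:  # + 3 (flow break + 2 levels of nesting)
                return value
    return 0               # Total Cognitive Complexity = 6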
But let’s see how those last examples we were looking at fare in terms of Cognitive Complexity:
# Example 1
def sumOfPrimes(max):
    total = 0
    for i in range(2, max + 1):  # + 1
        for j in range(2, i):    # + 2
            if i % j == 0:       # + 3
                break
        else:
            total = total + i
    return total                 # Total Cognitive Complexity = 6
# Example 2
def getWords(number):
    if number == 1:      # + 1
        return "one"
    elif number == 2:    # + 1
        return "a couple"
    elif number == 3:    # + 1
        return "a few"
    else:                # + 1
        return "lots"    # Total Cognitive Complexity = 4
Cognitive complexity doesn’t treat these cases equally anymore – instead, it says the second example is noticeably less complex, which makes sense intuitively.
It seems pretty clear that cognitive complexity is a strong measure of how complex and readable code is, and that it should therefore have a clear, inverse relationship with development velocity.
So how can we approach improving the cognitive complexity of code to simplify it?
Let’s take a look at our good old friend, the Gilded Rose kata:
class GildedRose:
    def update_quality(self):
        for item in self.items:  # + 1
            if (
                item.name != "Aged Brie"
                and item.name != "Backstage passes to a TAFKAL80ETC concert"
            ):  # + 2
                if item.quality > 0:  # + 3
                    if item.name != "Sulfuras, Hand of Ragnaros":  # + 4
                        item.quality = item.quality - 1
            else:
                if item.quality < 50:  # + 3
                    item.quality = item.quality + 1
                    if item.name == "Backstage passes to a TAFKAL80ETC concert":  # + 4
                        if item.sell_in < 11:  # + 5
                            if item.quality < 50:  # + 6
                                item.quality = item.quality + 1
                        if item.sell_in < 6:  # + 5
                            if item.quality < 50:  # + 6
                                item.quality = item.quality + 1
            if item.name != "Sulfuras, Hand of Ragnaros":  # + 2
                item.sell_in = item.sell_in - 1
            if item.sell_in < 0:  # + 2
                if item.name != "Aged Brie":  # + 3
                    if item.name != "Backstage passes to a TAFKAL80ETC concert":  # + 4
                        if item.quality > 0:  # + 5
                            if item.name != "Sulfuras, Hand of Ragnaros":  # + 6
                                item.quality = item.quality - 1
                    else:  # + 4
                        item.quality = item.quality - item.quality
                else:
                    if item.quality < 50:  # + 4
                        item.quality = item.quality + 1
# Total Cognitive Complexity = 69
The key driver of all of this complexity is 1) the sheer number of conditionals in the Gilded Rose & 2) the amount of nesting that exists in the function.
To improve the complexity we’ll want to tackle both of these issues.
Let’s just take a look at the last section of the Gilded Rose, starting with the if item.sell_in < 0: block. We can take 3 initial steps to make big improvements here:
1. The for block is split into three top-level if conditions - let's look at the last one first. Its nested condition checks if item.name != "Aged Brie". In general, it's easier to understand a condition if the test is positive rather than negative. We will therefore start by inverting the condition, yielding this:
if item.sell_in < 0:
    if item.name == "Aged Brie":
        if item.quality < 50:
            item.quality = item.quality + 1
    else:
        if item.name != "Backstage passes to a TAFKAL80ETC concert":
            if item.quality > 0:
                if item.name != "Sulfuras, Hand of Ragnaros":
                    item.quality = item.quality - 1
        else:
            item.quality = item.quality - item.quality
2. Let's go ahead and invert the "Backstage passes" condition here too, for the same reason. This gets us to:
if item.sell_in < 0:
    if item.name == "Aged Brie":
        if item.quality < 50:
            item.quality = item.quality + 1
    else:
        if item.name == "Backstage passes to a TAFKAL80ETC concert":
            item.quality = item.quality - item.quality
        else:
            if item.quality > 0:
                if item.name != "Sulfuras, Hand of Ragnaros":
                    item.quality = item.quality - 1
3. Then, an elif is easier to read than a nested if inside an else condition, so let's change this to an elif.
if item.sell_in < 0:
    if item.name == "Aged Brie":
        if item.quality < 50:
            item.quality = item.quality + 1
    elif item.name == "Backstage passes to a TAFKAL80ETC concert":
        item.quality = item.quality - item.quality
    else:
        if item.quality > 0:
            if item.name != "Sulfuras, Hand of Ragnaros":
                item.quality = item.quality - 1
Overall this takes:
if item.sell_in < 0:  # + 2
    if item.name != "Aged Brie":  # + 3
        if item.name != "Backstage passes to a TAFKAL80ETC concert":  # + 4
            if item.quality > 0:  # + 5
                if item.name != "Sulfuras, Hand of Ragnaros":  # + 6
                    item.quality = item.quality - 1
        else:  # + 4
            item.quality = item.quality - item.quality
    else:
        if item.quality < 50:  # + 4
            item.quality = item.quality + 1
To:
if item.sell_in < 0:  # + 2
    if item.name == "Aged Brie":  # + 3
        if item.quality < 50:  # + 4
            item.quality = item.quality + 1
    elif item.name == "Backstage passes to a TAFKAL80ETC concert":  # + 3
        item.quality = item.quality - item.quality
    else:
        if item.quality > 0:  # + 4
            if item.name != "Sulfuras, Hand of Ragnaros":  # + 5
                item.quality = item.quality - 1
We’ve managed to shave off 7 points of complexity fairly quickly. The next steps start to get more complicated (Nick goes through them in great detail), but following them through gets us to this:
def update_quality(self):
    for item in self.items:  # + 1
        if item.name == "Sulfuras, Hand of Ragnaros":  # + 2
            continue
        if item.name == "Aged Brie":  # + 2
            if item.quality < 50:  # + 3
                item.quality = item.quality + 1
            if item.sell_in < 1 and item.quality < 50:  # + 3
                item.quality = item.quality + 1
        elif item.name == "Backstage passes to a TAFKAL80ETC concert":  # + 2
            if item.quality < 50:  # + 4
                item.quality = item.quality + 1
            if item.sell_in < 11 and item.quality < 50:  # + 4
                item.quality = item.quality + 1
            if item.sell_in < 6 and item.quality < 50:  # + 4
                item.quality = item.quality + 1
            if item.sell_in < 1:  # + 4
                item.quality = item.quality - item.quality
        else:
            if item.quality > 0:  # + 3
                item.quality = item.quality - 1
            if item.sell_in < 1 and item.quality > 0:  # + 3
                item.quality = item.quality - 1
        item.sell_in = item.sell_in - 1
# Total Cognitive Complexity = 35
Through this refactoring, we’ve managed to cut down the cognitive complexity from 69 to 35. This is still far higher than we would want it to be (ideal targets are below 9), and you can continue to refactor it to keep improving the score. By repeatedly reducing nesting and overly complex conditionals we’re able to dramatically improve the readability.
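One possible direction from here, sketched below with invented helper names rather than the kata's canonical solution: because cognitive complexity is scored per method, pushing each item type's rules into its own small helper keeps any single method from nesting deeply.

# A sketch of a further refactoring: update_quality becomes a simple
# dispatcher, and each (hypothetical) helper holds one item type's rules.
def update_quality(self):
    for item in self.items:                                             # + 1
        if item.name == "Sulfuras, Hand of Ragnaros":                   # + 2
            continue
        if item.name == "Aged Brie":                                    # + 2
            self._update_aged_brie(item)
        elif item.name == "Backstage passes to a TAFKAL80ETC concert":  # + 1
            self._update_backstage_pass(item)
        else:                                                           # + 1
            self._update_normal_item(item)
        item.sell_in = item.sell_in - 1


def _update_aged_brie(self, item):
    if item.quality < 50:                       # + 1
        item.quality = item.quality + 1
    if item.sell_in < 1 and item.quality < 50:  # + 1
        item.quality = item.quality + 1

The complexity doesn't vanish, but it gets spread across several small methods that each score in the single digits and can be read and tested on their own.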
Cognitive complexity provides us with a good tool for figuring out how complex different sections of our codebase are and, as a result, which sections might be slowing us down the most.
If we want to improve our code’s complexity we should focus on reducing nesting, limiting the number of conditionals (especially nested conditionals) we’re using, and using shorthands that allow us to simplify and shorten our code further.
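As a small example of that last point, the repeated quality-cap check from the Gilded Rose can collapse into a single branch-free expression (a sketch, not the kata's required form):

# Before: a conditional that adds a point of complexity everywhere it appears
if item.quality < 50:
    item.quality = item.quality + 1

# After: the same "never exceed 50" rule with no flow break at all
item.quality = min(item.quality + 1, 50)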
At the end of the day though, complexity is only one aspect of code quality, and we need to pay attention to all of them to optimize our code & our development processes.
This is the first part of a seven-part series on code quality, technical debt, and development velocity.