Code Review Is Inherently Flawed. Here’s How to Fix It

These days we should think critically about how we do things in programming. We need to apply the engineering approach to our processes. We, the software engineers, are confident in discussions about abstract classes and pure functions. On the other hand, we run away when there is a need to discuss "managerial" things.

My favourite widely spread programming process that has an enormous amount of flaws is code review. This story will look at it from different perspectives and propose improvements.

The first significant fact I read about software inspections was in the "Facts and Fallacies of Software Engineering" book by Robert Glass. It claims the following:

Rigorous inspections can remove up to 90 percent of errors from a software product before the first test case is run.

I couldn't determine if these words were solely about the code review. I treat them as referred to different kinds of inspections, which I will describe later.

Uncle Bob Martin helped me to discover the roots of our modern reviews. Michael Fagan formulated the idea of inspections in 1976 in his article "Design and code inspections to reduce errors in program development".

I discovered three following types of inspections there:

Design inspection,
Code inspection before unit testing,
Code inspection after unit testing.

Fagan's work doesn't propose a new approach to code but documents the already existing phenomena and argues for it. However, the article is the earliest written sign of inspections I've found.

Code inspections look like our modern code reviews. Why do we miss other types of inspections today?

Why There Are Only Code Reviews Today: Some Assumptions

The popularity of code inspections and the almost inexistence of other types of inspections today comes from our tools. GitHub, BitBucket, or GitLab all have built-in code reviewing instruments, and they naturally fit in Git flow, GitHub flow, and other approaches.

What tool do you use for design activities? It's not about the UI/UX. It is about the structure of the code you will build. You might have heard of CASE or UML tools. I haven't seen them seriously used in any company I worked for, and I have worked for seven.

On HackerNoon, the "UML" search query results in just two relevant results. So there is no place to introduce the design inspection when there is no tangible solution design process. In the team I lead, we used Miro for interface design. The process could have been more satisfying: as with other diagramming tools, you soon start drawing instead of solving your problems. We can be independent of our tools. Here is a little quote to support this from the "Investments Unlimited" book:

...if we just do what the tools can do — then we will always be limited to the capabilities of our tools.

What Flaws Do the Existing Code Reviews Have?

Let's look at the process in its classic form from different perspectives. Each of them shows significant problems.

BPMN Perspective

BPMN is a business process model and notation. It describes processes with actions, events, logical gateways, messages and other means. I even recommend using it if you want to develop an algorithm, as it is more descriptive than a flowchart. Let's depict the code review process for a single reviewer with this notation and analyze it. I've used a text-based tool to generate the upcoming diagram. There is a little glitch on it with many returning letters.

It all starts with a PR creation, and nothing is notable here. The next step is to notify a reviewer, which is a simplification of all different means available to say: "Hey, my PR is waiting for you!👋" What's important here is the freeing from the current activities. Here the PR waits for an unknown duration. Then the reviewer dives into the task and conducts the review. There is a chance that a PR will get merged right away. However, the opposite could happen: a few comments requiring fixes would appear.

The code's author might be already in the next task, and there is one more wait time of an unknown length. Also, coming back requires context restoration, the interpretation of the comments and their fix.

The next step is the notification of the reviewer.

Haven't we already been there? Yes, you are right. We've just finished our first iteration of the potentially infinite loop. Yes, infinite. There is always a chance that new comments will appear. Or that one of the waiting periods would last forever.

Do we want the potentially infinite loop as a part of our daily operations? I'm not sure, as having faster delivery is usually preferable.

Solution Vision Perspective

Sometimes, we have senior developers or architects in our teams as reviewers. They wish to have a consistent codebase, stick to some methods and patterns and avoid others. They have a vision. Developers also have their idea. Usually, they are not aware of the intent of their seniors. It requires consistent broadcasting or an auxiliary environment, which rarely happens. Let's have a look at the following picture.

You can see how the two visions of code review participants differ. After the first iteration, they begin alignment, but there is still a way to go. The reviewer adjusts one's vision, and the code author moves according to the interpretation of the proposals.

We can use the "imagine you've asked for a house and then surprise! It's not the one you expected" metaphor. Or we can look at its core. Imagine you've asked a person to achieve something. Then you come back and see the results, but you become surprised by the set of decisions the achiever has made. Don't be surprised, as you didn't require a particular decision-making framework.

Interpersonal Perspective

The image says for itself. Would you ask your fellow engineer to fix the design issue after spending many days on the task? Imagine that your sprint ends, and the ticket was the cause of some tension on a standup. You'll help your colleague merge it as fast as possible. On the other hand, there could be a senior engineer reviewing the junior's code. He could ask him to go a long way to fix the decision made in the beginning. However, is this the right time to give such advice? So many decisions stand on the wrong choice already.

Lean manufacturing has yet to influence programming. However, we already have a tool from the "Lean Software Development" book by Mary Poppendieck and Tom Poppendieck. This tool is the seven software development wastes:

Partially done work,
Extra features,
Relearning,
Handoffs,
Delays,
Task switching,
Defects.

Classic code review wins 6 out of 7!🎉

Partially done work. A task in the review is abandoned most of the time. I usually treat this stage of development as a queueing rather than one where the action is. Creating a review also has an interesting psychological effect: "The task is ready, and it's not my job anymore!" Code review is the "no one's responsibility land", which is why we often see it escalate to the upper levels.
Relearning. You can see the relearning on the BPMN diagram above. Restoring context is relearning.
Handoffs. If we are honest, this is not a waste here. It's the core mechanics.
Delays. Code review design contains two types of delays we've discussed earlier. What's even scarier is their loop.
Task switching. People pause their tasks to give some attention to a review.
Defects. Review is good for finding cosmetic problems but not design flaws, which would harm the most. The mentioned above lack of motivation to ask fellows for significant changes leads to substantial defects in the project.

We've covered the problems of code reviews from many sides. What can we do to fix this process?

Redesigning Code Review

In this section, we'll cover the same topics as the previous one but from the fixing viewpoint.

Fixing the Process Structure

Here we have a code author waiting for a review to complete and a pair programming session after a reviewer's overview.

In the proposed process, we can see several significant changes compared to the previous one:

No more potentially infinite review loop;
Single waiting time of the unknown duration, we'll handle it also;
No need to restore context or interpret feedback for the code author. Comments now serve as reminders for the pair.

What are the core prerequisites for this process to happen? Now a team needs an additional role. It implies that someone does the auxiliary activities, e.g., handles the technical debt, or fixes bugs of a lower priority. When code review appears on the radar, this person immediately drops the current task and jumps into the arrived PR. If you have ever wondered why you need some slack in the structure of your work, this is an example. If everyone is occupied with the top-priority features, you'll have them delivered later.

Fixing the Vision Divergence

What do we discuss on code reviews? The decisions made during the development. We made some of them an hour ago and others a week ago. Our choices stand on top of each other and are not independent.

How enjoyable do you find an opportunity to question one of the first decisions on which the next several hundred stand? You may be unhappy about the opportunity.

The most appropriate time to discuss the design decisions is when they appear. We need one additional type of review: design review. Sometimes when the problem is challenging, it's good to spend some time on the plan and discuss it with your knowledgeable colleagues.

There is a famous military adage:

No battle plan survives contact with the enemy.

If we translate it into the language at systems, we'll get something like: "System will certainly need to adjust its behaviour when the first feedback arrives". In the case of programming, the feedback would be the problems we face when trying to implement the design into our system. Some of the decisions need to be revised. We would need to change them and again diverge in our visions with a reviewer.

Adam Thornhill, in his overwhelming "Software Design X-Rays" book, proposed a method:

That's why I recommend that you do your initial code walkthrough much earlier. Instead of waiting for the completion of a feature, make it a practice to present and discuss each implementation at one-third completion. Focus less on details and more on the overall structure, dependencies, and how well the design aligns with the problem domain. Of course, one-third completion is subjective, but it should be a point where the basic structure is in place, the problem is well understood, and the initial test suite exists. At this early stage, a rework of the design is still a viable alternative and catching potential problems here has a large payoff.

I've coined a term for my team to have a convenient language: skeleton review. I hope it helps to reflect the idea behind the activity.

Lean Perspective on the Proposed Process

The proposed process eliminates or significantly addresses the discovered wastes.

Partially done work. Switching to another task during the anticipation time for a code author is not allowed, so there is no user-significant partially done job. Partially done activities of a less priority are the trade-off.
Relearning. No relearning occurs as little time passes between coding completion and pair programming.
Delays. We've significantly shortened the code reviewer's delay and eliminated the author's.
Task switching. No longer allowed for an author and managed for a reviewer.
Defects. Now fixing design defects becomes cheaper, and the most important of them, design defects, become fixed.

Additional Thoughts

We've reviewed the code review approach for a single author and a single reviewer. When more reviewers appear, problems multiply. So, unfortunately, not all bugs become shallow if you add people lately to look at all the decisions you've made. You'd instead make your code merge later.

Two of the most challenging problems I faced when trying to introduce the proposed process were the following:

Developers treat the review stage as a job done.
Managers get horrified when proposing to introduce redundancy into daily work. It's great they don't manage the firefighters' teams.

The cure to the first problem is repetition and speed.

The second problem is more complicated. Here I want to quote the "Actionable Agile Metrics for Predictability" book by Daniel Vacanti:

There are a lot of strategies that FedEx employs, but the one that is probably most important is that at any given time FedEx has empty planes in the air. Yes, I said empty planes. That way, if a location gets overwhelmed, or if packages get left behind because a regularly scheduled plane was full then an empty plane is redirected (just-in-time it should be said) to the problem spot. At any given time FedEx has "spares in the air"!

If you are a manager, consider this the next time when planning for utilization maximization.

Are we satisfied with the update? Yes, it looks better than what we have now.

Can we do better? Yes, we can.

The target is to eliminate code review at a time when we can guarantee that all the necessary quality is already in the code. To achieve this, we need to build an auxiliary decision-making framework to help developers apply what's treated as a good practice and avoid the bad one. I have never heard of such a framework, but in-IDE linters are a step towards it.

Thank you for reading! Write a comment if you want to discuss the described ideas.

Code Review Is Inherently Flawed. Here’s How to Fix It

Too Long; Didn't Read

Companies Mentioned

Why There Are Only Code Reviews Today: Some Assumptions