I can’t tell you the number of times the title of this post has crossed my mind as I dug through a piece of code that I hadn’t touched in years.
At the time I wrote it, I probably thought my code was beautiful. An elegant masterpiece. It should have been printed, framed, and hung on a wall of The Programming Hall of Fame. As clever as I may have thought I was a few years ago, I am rarely able to read my old code without wasting serious time debugging.
This problem plagued me regularly. I tried different techniques to make my code easier to understand.
First I tried adding comments to my code. Pretty easy to do, but not all that helpful.
When comments weren’t cutting it, I tried to write self-documenting code instead: small, well-named classes and methods that described their limited functionality. This made the code more readable but I would still have questions.
I also tried documenting my code in a separate file. This had its benefits but didn’t solve the problem entirely either.
Eventually I figured out what I needed to do: I needed to use all three of the above techniques to write truly beautiful and understandable code.
Comments in your code should document the why, not the how. When I first started programming, I would often write very unhelpful comments like this:
public class Class1
{
    public List<string> DoWork(List<string> a)
    {
        List<string> numbers = new List<string>();

        // Loop over data
        for (int i = 1; i < a.Count; i++)
        {
            int s = a[i].IndexOf(" ");
            string num = a[i].Substring(0, s);

            // Save data
            numbers.Add(num);
        }

        return numbers;
    }
    ...
}
“Loop over data”? “Save data”? These comments aren’t beneficial to understanding the code. I can easily tell that I have a loop and that I’m adding my data to a collection, so why should I waste valuable screen real estate on unhelpful comments?
Instead of saying what or how, comments should explain why. A programmer will see the for loop and know that it’s looping over some type of collection of Addresses. However, a programmer will not know why we are starting our counter with int i = 1. This is where adding a comment can improve the understanding of the code:
// i = 1 because the view will never display the first address
for (int i = 1; i < Addresses.Count; i++) {...
Now, we know some of the business logic driving our app. We know we don’t process the first address because it never gets outputted to our view. This comment answers the why behind skipping the first address, adding clarity to the code.
Additionally, we remove the // Save data comment completely since it adds no insightful value.
Comments alone won’t make code easy to reinterpret, however. Let’s take a look at our method again with our improved comments:
public class Class1
{
    public List<string> DoWork(List<string> a)
    {
        List<string> numbers = new List<string>();

        // i = 1 because the view will never display the first address
        for (int i = 1; i < a.Count; i++)
        {
            int s = a[i].IndexOf(" ");
            string num = a[i].Substring(0, s);
            numbers.Add(num);
        }

        return numbers;
    }
    ...
}
What exactly is Class1? What kind of work is DoWork() doing? What about the use of int s? The names of the objects in our code don’t aid in understanding what this code is doing.
This is where the idea of self-documenting code comes in: instead of creating objects with arbitrary, non-informative names (“I swear I’ll refactor this later”), we build descriptive objects. If I have a class, its name should give me a good idea about what its properties and methods could be. A method’s name should be descriptive enough to tell me what I should expect as an output without having to dig into the details of what that method is doing. Variable names should be illuminating enough to make what-and-how comments obsolete.
In our example, let’s make our code self-documenting. First, this class is intended to help us clean address data. Let’s call it AddressStandardizer. With that simple renaming we know that all of the properties and methods of this class should pertain to dealing with dirty address data and making it cleaner.
What about the method name List<string> DoWork(List<string> a)? Well, I can tell you that this method is trying to parse out the number portion of a street address. So let’s change the method name and signature to something more informative, like List<string> ParseHouseNumbers(List<string> addresses). Now we can make an educated guess that this method accepts some address strings as input and returns a list of parsed house numbers.
If we clean up some variable names, our code becomes much easier to read, like this:
public class AddressStandardizer
{
    public List<string> ParseHouseNumbers(List<string> addresses)
    {
        List<string> houseNumbers = new List<string>();

        // i = 1 because the view will never display the first address
        for (int i = 1; i < addresses.Count; i++)
        {
            int firstSpaceIndex = addresses[i].IndexOf(" ");
            string houseNumber = addresses[i].Substring(0, firstSpaceIndex);
            houseNumbers.Add(houseNumber);
        }

        return houseNumbers;
    }
    ...
}
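To see how the renamed pieces read from the calling side, here is a minimal, self-contained sketch that uses the ParseHouseNumbers name proposed above. The Program class and the three sample addresses are hypothetical and only there for illustration; they are not part of the original project.

```csharp
using System;
using System.Collections.Generic;

public class AddressStandardizer
{
    public List<string> ParseHouseNumbers(List<string> addresses)
    {
        List<string> houseNumbers = new List<string>();

        // i = 1 because the view will never display the first address
        for (int i = 1; i < addresses.Count; i++)
        {
            int firstSpaceIndex = addresses[i].IndexOf(" ");
            string houseNumber = addresses[i].Substring(0, firstSpaceIndex);
            houseNumbers.Add(houseNumber);
        }

        return houseNumbers;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data, made up for this example.
        var addresses = new List<string>
        {
            "1 Hidden Lane",
            "742 Evergreen Terrace",
            "221B Baker Street"
        };

        var standardizer = new AddressStandardizer();
        List<string> houseNumbers = standardizer.ParseHouseNumbers(addresses);

        // The first address is skipped, so this prints: 742, 221B
        Console.WriteLine(string.Join(", ", houseNumbers));
    }
}
```

Even without opening the class, the call site now tells you what goes in and what comes out. Note that, like the method above, this sketch assumes every address contains at least one space; an address without one would make Substring throw.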
Our code is finally starting to shape up. We have comments explaining why we chose to do something and we refactored our code to have object names that are informative.
The code at this point is ok but not perfect. If we don’t look at this code for a few years, we probably have enough information now to look at the code and figure out what it’s doing with relative ease.
The big piece of information that we are still missing, however, is why this code was written in the first place.
Oftentimes, I get a question from a manager or analyst about why we decided to build the project in the first place. Or I’ll get a request for information about how the logic in the program works. Without a proper documentation file, the best thing I can do is send the business user a copy of my code. Most of the time that isn’t very helpful.
What would be helpful though is an explanation of what our program is doing at a high-level. This is the purpose of formal documentation.
The documentation for this section of code might look something like this:
…After retrieving our customer information from our vendor, the program processes the data and cleans it up to load into our reporting warehouse. Cleaning up the data means parsing the addresses into multiple columns including house number, street name, street suffix, city, state, and postal code…
Now, when a business user needs to know what your program is doing, you can easily send the above documentation their way. The documentation also acts as a nice refresher for you, the programmer, when it comes time to revisit the code, and for any future coworkers who are new to the project.
All three of these techniques are necessary to eliminate code headaches down the road. Learn from my experience: skipping any of the three may save a little time in the short term, but it will hurt at some point in the future. Once you get in the habit of writing all three kinds of documentation, it will become second nature and make your life (and the life of your future self!) much easier.