Why you should document your self-documenting code

First and foremost, we should agree on a definition. I understand self-documenting code as code that cannot possibly be made any more transparent by adding textual artifacts that are not code. Is this OK?

Now I want you to see this.

Three lines.
The first line is 5 syllables.
The second line is 7 syllables.
The third line is 5 syllables.
Punctuation and capitalization unspecified.
It does not have to rhyme.

These are the rules for haiku. The poets who master this form can put the whole spirit of a moment in just three lines. It takes a lot of talent, practice and inspiration.

And these are the rules for a Shakespearean sonnet.

Fourteen lines.
The meter is iambic pentameter, so 10 syllables per line.
Rhyme scheme is: a-b-a-b c-d-c-d e-f-e-f g-g.
All the rhymes should be single.

Fewer rules, but they are stricter. It is a demanding form. Some even hold that no one since Shakespeare himself has truly mastered it, although many have tried.

Now let’s take a look at programming languages. For instance, the C++ standard in force at the time of writing (ISO/IEC 14882:2014) consists of 1358 pages. It has 30 chapters and 6 appendices. Its index alone is 27 pages long. Do you really expect this to be an easier art form for expressing your thoughts than a sonnet or a haiku?

I’ve seen a lot of code that its authors claimed to be self-documenting or self-explanatory. I wrote a lot of code I considered self-documenting myself. It was all a delusion. None of that code passes the only test that matters: being read by people who are not you.

Writing self-documenting code is an act of extreme poetry. Writing prose is hard enough as it is, and writing within the narrow boundaries of a computer language is enormously harder. Sure, you can write several lines of well-thought-out, readable code that is hard to misunderstand, but to produce this magnificence in mass quantities you need talent worth at least 2000 millishakespeares.

But you might think this is a loser’s excuse and that you, unlike the rest of the world, do have all the talent and patience. Don’t make my mistakes, don’t fool yourself. You are not that talented, and your code is not that clear. Ask someone who is completely unfamiliar with it for a review. You’ll be surprised.

[Image: Shakespeare by William Blake (public domain, via Wikimedia Commons). You have to be twice as talented as this guy to write truly self-documenting code.]

If you think I’m going to praise code documentation at this point, I’m not. Documentation sucks. The very concept of describing code with something that is not code is fundamentally flawed and I’ll show you why.

In autumn 2013 I was working on a project that involved reimplementing a piece of decades-old assembly code in C. I had to measure the effort needed and compare it to the gains we would get from this transition. The code was well covered with comments, and it also came with full documentation, tomes of it. It was the most documented code I had ever seen in my life. And it took roughly two weeks before I felt the first suicidal urges.

I came to my department manager and said: “I just can’t go on anymore. The documentation says the piece does one thing, but the comments say it does another. And the code actually does something completely different. I just don’t know what to do.” “Oh, this happens all the time with really old projects,” she replied, “I know what to do. Your job is to reimplement the behavior, right? Just read the code then! See how simple it is?”

From the viewpoint of this wise woman, I would have been better off reading 60K SLOC of assembly than relying on documents and comments written precisely so that I would not have to read any code in the first place. At first I thought she was mad. But as it turned out, it was I who was stupid.

Documentation rots. That’s a fact. When we introduce changes to the code, we have all the means to check that it remains correct. We have compilers, static analyzers, dynamic analyzers, unit tests, functional tests, performance tests, coding standards and practices, and automated metrics for them. All we have for documentation is human-conducted verification, which is the most expensive and the least reliable form of verification there is.
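To make the asymmetry concrete, here is a minimal sketch in Python. The function shipping_cost, its prices and its thresholds are all invented for illustration. Assume the free-shipping limit was once 5 kg and was later lowered to 2 kg: the test was updated along with the code, but the docstring was not, and no tool complained.

```python
def shipping_cost(weight_kg: float) -> float:
    """Return the shipping cost in dollars.

    Orders under 5 kg ship for free.
    """
    # The docstring above has rotted (hypothetical scenario): the
    # threshold in the code is 2 kg, and nobody updated the text.
    if weight_kg < 2:
        return 0.0
    return 1.5 * weight_kg


def test_shipping_cost() -> None:
    # A machine re-runs these checks on every change to the code...
    assert shipping_cost(1) == 0.0
    assert shipping_cost(3) == 4.5
    # ...but nothing re-checks the sentence "under 5 kg ship for free".


test_shipping_cost()
```

The test passes, the code is correct, and the documentation is wrong. Only a human reader would ever notice.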

It’s not that I despise humanity, it’s just how this works. If you want to verify something for correctness, you should let machines do that for you. And the correctness of documentation is impossible to verify automatically, because the very purpose of documentation is to serve as a bridge between a machine and a human. Formally verifiable documentation is possible, but it would still be code, and we have already had this talk about extreme poetry.
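Python’s standard doctest module is one illustration of what machine-checkable documentation tends to look like. The sketch below uses a made-up function, clamp. Notice which part of the docstring the machine can verify: only the lines that are themselves code. The English sentence on top is still taken on trust.

```python
import doctest


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed range [low, high].

    >>> clamp(15, 0, 10)
    10
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(7, 0, 10)
    7
    """
    return max(low, min(value, high))


# Extract the ">>>" examples from the docstring and execute them.
# In a real project you would more likely run "python -m doctest file.py".
examples = doctest.DocTestParser().get_doctest(
    clamp.__doc__, {"clamp": clamp}, "clamp", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(examples)  # TestResults(failed=..., attempted=...)
```

If someone changes clamp so that it no longer matches the examples, results.failed becomes nonzero. If someone changes it so that it no longer matches the prose, nothing happens.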

[Image: Matsuo Bashō by Hokusai (public domain, via Wikimedia Commons).]

So, with unrealistic self-documenting code on the one hand and conceptually flawed documentation on the other, what should we choose? A Ukrainian proverb says: between two evils, always choose both. There’s no dichotomy. You should make your code as self-documenting as you can, and then document it.

Because it’s still better to have rotten documentation than to have none at all. Even when it is incorrect and misleading in the details, it might give you very helpful insight into the author’s intentions. It may give you information you would never get from the code, like the core idea of the algorithm, or its limitations and unexpected artifacts, or the scenarios it is expected to be used in. I would say that even if a piece of documentation is 70% rot, 20% trivialities and only 10% helpful guidance, it is still worth reading.
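Here is a small sketch of what that helpful 10% can look like. The function moving_average and everything its docstring says about sensors and recordings are invented for this example. The point is that the body can tell you what is computed, but not why it is computed this way, where it breaks, or what it was meant for.

```python
def moving_average(samples: list[float], window: int) -> list[float]:
    """Smooth noisy sensor readings with a sliding-window mean.

    Core idea: keep a running sum and update it by adding the newest
    sample and subtracting the one leaving the window, so each output
    costs O(1) instead of O(window).

    Limitations: the running sum accumulates floating-point error on
    very long inputs, and the first window - 1 samples yield no output.

    Intended use: offline analysis of short recordings, not live
    streams. (All of this is a hypothetical scenario.)
    """
    if window <= 0 or window > len(samples):
        return []
    running = sum(samples[:window])
    result = [running / window]
    for i in range(window, len(samples)):
        running += samples[i] - samples[i - window]
        result.append(running / window)
    return result


smoothed = moving_average([1.0, 2.0, 3.0, 4.0], 2)
```

Any of those three paragraphs may rot as the code evolves. Until they do, they save the reader from reverse-engineering the author’s intent out of a loop and two index expressions.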

Of course, since you don’t know which is which, you have to read all of it. And trust none.

Still better than self-delusion.