Why You Should Always Avoid Encoding Type Into Names

The Rule: When naming a variable, don't encode its type into its name, even in a dynamically typed language.

Static and Dynamic Examples Required

In the last article I talked about the general rules of good naming. There, I briefly touched on the idea of not encoding type into your variable names. However, when picking those general rules I only considered things that were agnostic to language type.

Languages can be statically or dynamically typed. A variable's type is decided either at its definition (static) or at its assignment (dynamic).
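As a rough illustration of the dynamic half of that distinction, here is a minimal Python sketch. The variable name value is made up for this example. The name has no type of its own; it simply takes on the type of whatever was last assigned to it. In Java, the equivalent reassignment from an integer to a string would be rejected by the compiler.

```python
# In a dynamically typed language the type belongs to the value, not the name.
value = 42
type_after_first_assignment = type(value).__name__

# Rebinding the same name to a value of a different type is perfectly legal.
value = "forty-two"
type_after_second_assignment = type(value).__name__

print(type_after_first_assignment, type_after_second_assignment)
```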

Not encoding type into your names is actually a good rule for both Statically and Dynamically typed languages, but the reasons differ between them. For this reason I wanted to make a separate post highlighting this rule and its reasons.

In Statically Typed Languages

In statically typed languages like Java, we must declare the type of a variable when it's defined. So if I wanted to build a list of integers I would write:

final List<Integer> integerList = new ArrayList<>();

Note the inherent duplication of information here. I already inform my reader that I am creating a list of integers with List<Integer>, so I don't need to do it again by naming the variable integerList. This is an easy name to come up with, but it adds no new information and does not tell a reader what the integers actually represent. I can improve things with a name like:

final List<Integer> primesList = new ArrayList<>();

Now readers can easily discern that this is a list of prime numbers, but I have still included duplicate information about it being a list. This opens the name to a potential risk, where another developer realizes all primes are unique, and changes the type in the future. Now we end up with a definition that is misleading, like:

final Set<Integer> primesList = new TreeSet<>();

A reader of this variable will be confused, because a Set and a List are used quite differently. While the definition site of this variable provides the correct information, if that is a long way from its use-site, the reader will be left to assume the lie.

import java.util.HashSet;
import java.util.Set;

public class Example {
    private final Set<Integer> primesList =
        new HashSet<>(); // This is the definition site

    // ...
    // A very long class full of code
    // ...

    private void doSomething(int prime) {
        // ...
        this.primesList.add(prime); // Use-site: here the name is misleading
        // ...
    }
}

If there is a bug in here, we have made debugging this code much more complicated. To a reader of the use-site, this object appears to support sorting, access by index, and a bunch of other list functionality that a Set does not have.
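The same trap can be sketched in a few lines of Python, the language used later in this article. This is only an illustrative analogue of the Java class above, and the names primes_list and first_prime are hypothetical. A reader who trusts the name will reach for list operations that a set does not support.

```python
# The name promises a list, but the value is a set.
primes_list = {2, 3, 5, 7}

def first_prime(primes):
    """Return the first element, the way a reader trusting the name might."""
    try:
        return primes[0]  # Works for a list, fails for a set
    except TypeError as error:
        return f"TypeError: {error}"

result = first_prime(primes_list)
print(result)
```

With a real list the same call succeeds, which is exactly why the misleading name is so easy to believe at the use-site.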

Obviously, it is frustrating that the second author didn't update the name to match the change in type, but it's not unexpected. This kind of change happens often in the real world.

Parallel data is often the root of problems in software engineering, because maintaining synchronization is challenging. Having a single source of truth is often a priority in good system design, and it is also true in good programming. (This is also true for writing code comments, which I will cover in a later article.) In statically typed languages we can future-proof our names by not encoding the type twice.

I would be better off calling the above variable primes, using the plurality of the name to indicate that it is a collection. This would look like:

final Set<Integer> primes = new TreeSet<>();

In Dynamically Typed Languages

In dynamically typed languages like Python, we don't have to declare the type of a variable when defining it. Instead, the type is set at assignment. Imagine I am passing a collection of Duck objects into a function like:

def quack_all_ducks(ducksList):
    for duck in ducksList:
        duck.quack()

Here I am quite intentionally touching on an important feature of dynamically typed languages, called duck typing (if it walks and talks like a duck then, for all intents and purposes, it is a duck). The idea is that in the function quack_all_ducks I don't actually care if the input is a list, or even if it contains ducks. All I really care about, within this function, is that I can iterate through the items given and that, for each item, I can call a method quack(). Encoding the words List and duck here limits the power of this function. I am giving up one of the main advantages of using a dynamically typed language.
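To see how little the function actually requires, here is a small sketch. The Duck and Robot classes are hypothetical stand-ins invented for this example, and quack() returns a string instead of making noise so the result can be inspected. The function is the one from above with one change: it collects the results in a list.

```python
class Duck:
    def quack(self):
        return "Quack!"

class Robot:
    # Not a duck at all, but it has a quack() method
    def quack(self):
        return "Beep-quack!"

def quack_all_ducks(ducksList):
    # Same loop as above, but collecting the results so we can inspect them
    return [duck.quack() for duck in ducksList]

# A tuple and a generator are both accepted, despite the name "ducksList"
from_tuple = quack_all_ducks((Duck(), Robot()))
from_generator = quack_all_ducks(Duck() for _ in range(2))
print(from_tuple)
print(from_generator)
```

Neither argument is a list, and one of the items is not a duck, yet the function runs without complaint. The name is the only thing that claims otherwise.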

This function will call quack() on anything it is passed, even a set of geese. The code would execute properly, but to the reader it would be unexpected (and more difficult to debug: why are all my geese quacking?). For example:

def on_loud_noise(pond_area):
    # Flatten both collections into one list (wrapping each call in [] would nest them)
    birds = list(pond_area.get_all_ducks()) + list(pond_area.get_all_geese())
    quack_all_ducks(birds)
    #...

The fact that this makes the geese quack too isn't clear without digging further into code. The crux of clean code is putting slightly more work on the writer to simplify the life of the reader.

I would be much better off naming the parameter quackables. This expresses that it is an iterable (a collection) through plurality, and that the contents can each be quacked. This would look like:

def quack_all_quackables(quackables):
    for quackable in quackables:
        quackable.quack()

Now the quack_all_quackables function can be used for anything that might quack, such as a collection of geese or swans. In dynamically typed languages, encoding type into names counteracts the power of duck typing.

Our call site would now look like:

def on_loud_noise(pond_area):
    # Flatten both collections into one list (wrapping each call in [] would nest them)
    birds = list(pond_area.get_all_ducks()) + list(pond_area.get_all_geese())
    quack_all_quackables(birds)
    #...

Since geese quack, it is now much clearer to a reader that they will be quacked as well.
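For anyone who wants to run the final version end to end, here is a self-contained sketch. The Duck, Goose and PondArea classes are assumptions made up for this example, since the real pond_area object is never shown in this article. The functions also return the sounds rather than just making them, so the outcome can be checked.

```python
class Duck:
    def quack(self):
        return "Quack!"

class Goose:
    def quack(self):
        return "Honk-quack!"

class PondArea:
    """Hypothetical container standing in for the article's pond_area."""

    def __init__(self, ducks, geese):
        self._ducks = list(ducks)
        self._geese = list(geese)

    def get_all_ducks(self):
        return self._ducks

    def get_all_geese(self):
        return self._geese

def quack_all_quackables(quackables):
    # Returns the sounds so that the example can be checked
    return [quackable.quack() for quackable in quackables]

def on_loud_noise(pond_area):
    # Combine both collections into a single flat list of birds
    birds = list(pond_area.get_all_ducks()) + list(pond_area.get_all_geese())
    return quack_all_quackables(birds)

sounds = on_loud_noise(PondArea([Duck(), Duck()], [Goose()]))
print(sounds)
```

Nothing in on_loud_noise or quack_all_quackables mentions a concrete type, so swapping the list for a set, or the geese for swans, would not make any name in this code a lie.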