Use the 80/20 Rule with Moderation by @felipepenha

May 28th 2020 703 reads

The **80/20 rule**, a.k.a. the **Pareto principle**, has been perpetuated along the lines of **"80% of the effects come from 20% of the causes."** Different cases where the rule emerges have been studied, in the last century, by great personalities such as Vilfredo Pareto (land ownership in Italy), George Kingsley Zipf (word frequency in languages), and Joseph M. Juran (quality management in industry). Working as a Data Scientist, I have seen the 80/20 rule invoked in business meetings often enough, usually followed by a round of applause. I have also read numerous LinkedIn posts in the same vein. Most of the time, it is just a reckless stretch of the rule. But what is the danger here, if any? After all, profits matter more than mathematical and statistical rigor.

Well, the danger is making bad decisions based on these stretches, possibly incurring financial loss.

The 80/20 folklore can be addressed if we all agree to pay attention to:

- the concept behind the rule; and
- examples where the rule is true, *i.e.* supported by data.

These are the basic items that I intend to cover in this short article.

Photo by Alex Chambers on Unsplash

Nature has an elegant tendency to repeat itself, mathematically speaking. And by Nature, I mean basically everything surrounding us, be it spontaneously occurring (*e.g.* meteors) or man-made (*e.g.* machines), be it concrete (*e.g.* paper clips) or abstract (*e.g.* language). Thus, it should not come as a surprise that versions of the 80/20 rule surface in contexts as disparate as Astronomy and Business Administration.

We, humans, are very good at spotting Nature's patterns. In Statistics, these patterns are called **distributions**. See below how a frequency distribution comes about:

This is a histogram, where one has to arbitrarily define bins of data (number intervals). The height of the bar in each bin is the event count, or frequency.
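To make the binning step concrete, here is a minimal sketch in plain Python. The function name `histogram`, the bin width of 5, and the list of heights are all made up for illustration; they are not the data behind the plot in this article.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`.

    Returns a dict mapping the left edge of each bin to its event count.
    """
    counts = Counter()
    for v in values:
        # Every value is assigned to the bin whose left edge is the
        # largest multiple of bin_width not exceeding it.
        left_edge = math.floor(v / bin_width) * bin_width
        counts[left_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample: heights in cm, invented for this example.
heights = [158, 162, 165, 167, 168, 170, 171, 171, 173, 176, 179, 185]

# Print a crude text histogram, one row per non-empty bin.
for left_edge, count in histogram(heights, bin_width=5).items():
    print(f"{left_edge}-{left_edge + 5}: {'#' * count}")
```

Changing `bin_width` changes the look of the histogram, which is what "arbitrarily define bins" means in practice.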

The above "bell-shaped" histogram is a typical example of data that approximately follow a **normal distribution** (or Gaussian distribution). There are various statistical tests available to check how close to a normal distribution your data really are, but let us not go down that rabbit hole. Just accept that all data are imperfect and we, scientists, are always trying to fit them into idealized shapes such as the normal distribution.

The bell shape of the normal distribution has a very clear meaning: extreme values are rare and there is always a well-defined mean value sitting at the peak between these extremes. The height of humans and the grades of students are classical examples of quantities that are expected to follow such a pattern.

In the case of a normal distribution, a 50/50 rule is applicable: half of the values lie below the mean and half lie above it. In other words, the distribution is symmetric.

Some people will make the mistake of believing that anything in Nature is "normal". Do not fall into such a trap! While it is true that some popular distributions tend to a normal distribution for large numbers (Central Limit Theorem), there are many other distributions out there that do not. An example of these stubborn ones is the **power law distribution**, which is **not centered** around a certain value, is **ever decreasing**, and **spans several orders of magnitude**. That is exactly the type of distribution that leads to the rather unbalanced 80/20 rule.
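The "spans orders of magnitude" property is easy to see with simulated data. The sketch below is only an illustration: the mean of 170, the standard deviation of 10, the Pareto shape of 1.16, the seed, and the helper name `orders_of_magnitude` are all assumptions of mine, not values taken from any dataset in this article.

```python
import math
import random


def orders_of_magnitude(values):
    """Number of powers of ten between the smallest and largest value."""
    return math.log10(max(values) / min(values))


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Bell-shaped sample: values cluster around a well-defined mean.
normal_sample = [rng.gauss(170, 10) for _ in range(n)]

# Heavy-tailed sample drawn from a Pareto (power law) distribution.
# The shape parameter 1.16 is an arbitrary choice for illustration.
power_law_sample = [rng.paretovariate(1.16) for _ in range(n)]

print(f"normal sample spans    {orders_of_magnitude(normal_sample):.1f} orders of magnitude")
print(f"power law sample spans {orders_of_magnitude(power_law_sample):.1f} orders of magnitude")
```

The normal sample stays within a fraction of one order of magnitude, while the power law sample stretches over several, because very large values keep showing up in its tail.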

Photo by Raphael Schaller on Unsplash

In the Linguistics world, you will hear about a certain **Zipf law**, which is basically a power law in disguise. See for yourself how a handful of the most frequent words in the English language account for most of the occurrences in written text:

To produce the above plot, I used a collection of texts in the English language known as the Brown Corpus. Notice that I have arbitrarily highlighted a few words in red, to have a better sense of how complexity gradually increases from the head (frequent words) to the tail (rare words).

In fact, the ~20% most frequent words account for ~92% of all word occurrences in the corpus. A version of the Pareto Principle, or a 92/20 rule! The proof:

As with everything in life, there is a spectrum of possibilities...
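The arithmetic behind a statement like this is short: rank the distinct words by frequency, take the top 20%, and divide their occurrences by the total. The sketch below uses a toy sentence of my own instead of the Brown Corpus, and the helper name `top_share` is hypothetical, so the resulting number is not comparable to the 92% above.

```python
from collections import Counter


def top_share(counts, top_fraction):
    """Share of all occurrences covered by the most frequent
    `top_fraction` of distinct items."""
    ranked = sorted(counts.values(), reverse=True)
    # Keep at least one item so tiny vocabularies still work.
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Toy text, invented for this example (not the Brown Corpus).
text = "the cat and the dog and the bird saw the fish in the pond by the old mill"
counts = Counter(text.split())

share = top_share(counts, 0.2)
print(f"top 20% of distinct words cover {share:.0%} of all occurrences")
```

With a real corpus, the same function would be fed the word counts of the whole collection of texts.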

This is even better than 80/20! But don't get too excited: acquiring 20% of the vocabulary does not mean you know how to connect the words to make sentences. Also, as you can see in the plot, the top frequent words are articles, prepositions, and conjunctions.

Luckily, J. K. William *et al.* demonstrated that phrases follow a power law more closely than individual words do. See how the orange curve looks more like a straight diagonal line in the log-log scale:

See below additional examples, borrowed from M. E. J. Newman 2006:

These examples (a-l) cover a variety of power laws, which are linear in the log-log scale:

**log(y) = b - a log(x)**

with **a** ranging from **1.8** to **3.5**. The popular 80/20 rule holds only when we have **a = 1.2**. Our rather extreme example with the Brown Corpus followed a power law with **a = 1.4** and ended up being a 92/20 rule. Therefore, in the examples above (a-l) you should expect anything between 92/20 and 100/20. Notice that these two numbers separated by a "/" do not need to add up to 100. Do you see now how unusual a perfect 80/20 rule is?

Photo by Hunters Race on Unsplash

At least a handful of executives I have met would start exploring an Excel sheet as follows: order the rows by one of the numerical columns that is meaningful to the business problem at hand, take the top 20% of rows, and assume that these amount to 80% of revenue. The myth that this could possibly work can be traced back to this book: **"The 80/20 Principle: The Secret to Achieving More with Less" by Richard Koch**. If you recognize yourself in that group, I urge you to read the present text in full and first understand where the 80/20 rule comes from.
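Instead of assuming the split, it takes only a few lines to measure it. The sketch below asks the question the other way around: what fraction of rows, largest first, is actually needed to reach 80% of the total? The function name `fraction_needed` and both revenue lists are hypothetical, invented purely to contrast an evenly spread column with a concentrated one.

```python
def fraction_needed(values, target_share=0.8):
    """Smallest fraction of rows (largest first) whose sum reaches
    `target_share` of the column total."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    running = 0.0
    for i, value in enumerate(ranked, start=1):
        running += value
        if running >= target_share * total:
            return i / len(ranked)
    return 1.0


# Hypothetical revenue per customer, invented for this example.
evenly_spread = [100, 105, 95, 110, 90, 102, 98, 101, 99, 100]
concentrated = [5000, 1500, 300, 120, 40, 20, 10, 5, 3, 2]

for name, column in [("evenly spread", evenly_spread), ("concentrated", concentrated)]:
    print(f"{name}: {fraction_needed(column):.0%} of rows reach 80% of revenue")
```

Sorting the rows is the same in both cases; only the data decide whether the top rows carry most of the revenue.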

When you make a bold 80/20 statement in a business meeting, are you sure your data are distributed at least vaguely as a power law? Or, in case you do not have the data at hand, do you have a strong intuition that they would be distributed as a power law? Meaning, the distribution is **not centered** around a certain value, it is **ever decreasing**, and it **spans several orders of magnitude**. And now that we have learned to identify power laws, we know that the distribution should be **linear in the log-log scale**. Why are these conditions relevant? You want some guarantee that the returns on the first 20% of the effort are much steeper than on the remaining 80%, and thus worth the initial investment. Look again at the cumulative frequency graph of the Zipf law for the English language above to see how the curve is remarkably steep until it reaches a plateau. That is the behavior you are looking for!
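A rough way to check the log-log linearity is to fit a straight line to log(y) versus log(x) and read off the exponent **a** from log(y) = b - a log(x). The sketch below uses ordinary least squares, written out by hand; the function name `loglog_fit` and the synthetic data are my own assumptions. Treat it as a quick diagnostic rather than a rigorous statistical test, since real data are noisy and a line can be fitted to almost anything.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares fit of log10(y) = b - a * log10(x).

    Returns the pair (a, b).
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of a decreasing power law is negative, so a = -slope.
    return -slope, intercept


# Synthetic data generated from an exact power law y = 1000 * x**-1.4,
# chosen only so that the expected answer is known in advance.
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [1000 * x ** -1.4 for x in xs]

a, b = loglog_fit(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
```

On this noise-free input the fit recovers the exponent used to generate the data. On a real spreadsheet column, plotting the points in log-log scale next to the fitted line is still worth doing before trusting the number.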

A few guidelines:

- always hold on to examples supported by data (ignore the anecdotes) to be able to make parallels like a pro;
- do not assume you can extrapolate the 80/20 rule linearly, because the usefulness of the rule is in reflecting the non-linear character of practical problems;
- 80/20 is a special case and you should expect heavily imbalanced rules such as 90/1 to be rather common;
- be careful when formulating the rule in terms of "time"; instead, use the rule to redirect where your teams spend their time; and
- keep in mind that the last mile (the 20% left after the first 80%) may be the most relevant, or even essential, to your business.

Most importantly, be skeptical of any strong 80/20 statement that is presented like a rabbit pulled out of a hat.

Richard Koch's book is full of anecdotal evidence and it displays a rather optimistic view of the world, where applying the 80/20 rule could only possibly bring benefits to your business and personal life. As a Data Scientist, I see it differently: the 80/20 mania is so deeply internalized by executives that its perpetuation without data to support it undermines the whole data-driven culture that has been built in the last few years.