paint-brush
Feature Flag Fails can Cost Millionsby@newsletters
145 reads

Feature Flag Fails can Cost Millions

by newsletters September 3rd, 2021
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Most people keep their flagging disasters under wraps. Luckily, for those reading, we are going to break down some scenarios of feature flag disasters and explore the insights that we can learn from them.

People Mentioned

Mention Thumbnail
featured image - Feature Flag Fails can Cost Millions
newsletters  HackerNoon profile picture


Most people keep their flagging disasters under wraps.


Luckily, for those reading, we are going to break down some scenarios of feature flag disasters and explore the insights that we can learn from them.


Reused Flag Names


Imagine a situation when someone names a flag like this:


brand_new_flag


Then think about the havoc this name will cause for both back-end and front-end teams. Each of these teams could use the same flag and step on each other’s toes by touching the same switches.


To avoid that, come up with a naming convention where each toggle has a verbose name like this:


this_is_a_lenghty_flag_name_created_for_this_newsletter


Also, think about including its purpose, e.g. by including a prefix of “Release,” “Experimental,” and others.


Holding on to Outdated Toggles


Another sad situation is when short-term switches contribute to increased technical debt which is not to be confused with long-lived toggles.


The longevity of a toggle refers to how long it is expected to be in use. For example, release toggles are needed in a codebase for a limited period of time. In contrast, circuit breakers and permission toggles are long-lived. They provide control for years after the release of a particular feature.


With that said, fleeting feature flags should be disposed of as soon as they have served their purpose.


LinkedIn and some other giants have a proven record of sticking with end-of-life toggles. Linkedin users once had an issue navigating the platform, because the site ran a release with all flags switched on.


Bonus: You can avoid this sorry state of affairs by implementing a naming convention. Try prefixing your short-lived feature flags with temp-.


Logging to Prevent Forgetting


The logging practice is a mandate for feature flagging. Once you log all the changes to a feature flag, you can track who made modifications and at what time. So if anything goes sideways, you’ll always have someone accountable for the issue. It also means that you can fix the bug without wasting time on finger-pointing.


With that said, here’s a cautionary tale with a whole host of don’ts:


Have you heard about Knight Capital losing more than $400 mln due to a trading glitch? The glitch set off turmoil to the company by selling all the stocks it accidentally bought Wednesday morning.


In short, they left the “old” version and reused a toggle (yiykes). When the flag was flipped, the company reverted to the obsolete code still running on the eighth server and reinstalled it on the other seven versions.


As it turned out, this caused a domino effect, since all eight servers then had the defective code triggered by the reused RLP toggle and executing in full throttle.


The Moral of the Story


The Peter Parker principle says that with great power there must also come great responsibility. Although he likely knew nothing about feature flags, the DevOps teams can still relate to that.


Switches introduce complexity. We can keep tabs on this complexity by using time-tested feature flagging practices and appropriate tools to manage them.


Subscribe to HackerNoon’s thematic newsletters via our subscribe form in the footer.