Most Software Bugs Are Not From Lack of Knowledge, by @kamlasater

by Kam, October 2nd, 2022

Too Long; Didn't Read

The prevailing culture in software toward errors seems to be "if you just", where errors are blamed on a lack of knowledge. As society becomes more dependent on the software systems we are building, the severity and impact of problems only grow. My goal is to make it easier for others to share their failures by talking about failures I have seen or participated in. This will create a feedback loop that will let us learn faster. Cyclic and I would love to hear your stories of failure. Join us in sharing your stories.
featured image - Most Software Bugs Are Not From Lack of Knowledge

When I was younger, I went on several rock climbing and mountaineering expeditions. I was exposed to some instructors and practitioners who took staying safe in the mountains very seriously. One of them carried a book from The American Alpine Club on climbing accidents. The accidents were never the result of a single bad choice. They were caused by a series of decisions, taken over time, that combined to create the conditions for the accident.


What stuck in my memory, sitting beside rock faces that could very well be the scene for one of the stories, was that in certain cases we might make one of the same choices. Given a different time of day, different weather, or different climbing partners, the same choice could carry a different level of risk. Reading and discussing those stories made me a more aware adventurer and a safer climber.


The public analysis and discussion of the chain of decisions and events that led to accidents helps beginners build experience, helps experts grow wisdom, and fosters a culture of safety. Other industries and subcultures, too, have their own ways to learn from accidents. For example, the US military has after-action reviews, medicine has accident review boards, and aviation has NTSB accident reviews.


I have read root cause analysis reports from outages at hyperscale cloud hosting providers. Often they read like the plaster footprints of a long-dead animal, the legal and marketing departments having sanitized and scrubbed any element of life or learning from them. They are the cargo culting of accident investigation and reporting: all of the motions and affectation with none of the substance.


The prevailing culture in software toward errors seems to be "if you just", where errors are blamed on a lack of knowledge. If the operator had "just" known the impact of the config change, if the developer had "just" understood the interaction of their code change. It is a culture of expert knowledge. It is based on the unexamined belief that bugs only exist from lack of knowledge or lack of care. Or, taken to their offensive extremes: stupidity and laziness.


In the software industry, bugs or outages most often do not result in injury or death. However, as society becomes more dependent on the software systems we are building, the severity and impact of problems only grow. As our software systems continue to grow in complexity, and as the reach of the problems we write software to solve keeps expanding, we as a society must grapple with changing this culture.


As a step toward changing this culture, I am talking about failures I have seen or participated in. My goal is to make it easier for others to share where they have failed. This will create a feedback loop that will let us learn faster.


Join me in sharing stories of failure. Here’s a talk about failure, and a couple of related posts from me and the Cyclic team:

  1. How to Fail at Serverless: Serverless is Stateless (blog post)
  2. Its Always Sunny in us-east-1: The gang does business continuity (blog post)
  3. AWS S3: Why sometimes you should press the $100k button (blog post)


We would love to hear your stories.


Write something up.


Post it publicly.


Let us know.


We got your back :)


Also published here. Featured image above generated by HackerNoon Stable Diffusion.