If you haven’t read Antifragile by Nassim Nicholas Taleb, check out my recap of the book. I highly, highly recommend the work. Taleb never mentions product development, but I couldn’t help but think of all the connections between successful software and antifragility.
One that immediately stands out is that many agile processes reduce fragility. In agile, you learn from and adapt to small errors to make a better product on a more reliable timeline. In waterfall, you plan everything early on and have limited ways to adapt or learn.
Taleb often says big or fast is fragile. Waterfall is big planning up front. That size makes it easy for lots of tiny, non-linear problems to arise and multiply the cost of a project. Agile hits those same problems, but it encourages making adjustments as you go.
The author relates a story about how travel does not have optionality. Travel times have limited upside and almost unlimited downside. If you are driving your car down the road, you might hit more green lights than you should, or you might speed and not get pulled over. No matter the positives, you arrive a few minutes earlier at best. It can never take negative time to get somewhere.
But there are dozens of events that can slow you down: traffic, getting pulled over by a police officer, accidents, bad weather, construction, etc. In one particular snowstorm Raleigh had a few years back, the usually 20-minute drive home took 3 hours. I was lucky. Others had it worse.
Like with travel times, product development lacks optionality. Many events will cause delays:
Yes, sometimes that feature you thought would take two weeks instead takes four days—had it happen to me this past week. But it’s more common to see something you think will take thirty minutes end up taking two days. A feature will never take negative time to complete, but can easily take ten times the estimated amount.
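To make the asymmetry concrete, here is a minimal Python sketch. The function name `overrun_profile` and the list of multipliers are made up for illustration; they are not measured data. Each multiplier is a hypothetical ratio of actual time to estimated time.

```python
def overrun_profile(estimate_days, multipliers):
    """Summarize how far actuals can land on either side of an estimate.

    `multipliers` are hypothetical ratios of actual time to estimated time.
    """
    if any(m < 0 for m in multipliers):
        raise ValueError("a task cannot take negative time")
    actuals = [estimate_days * m for m in multipliers]
    return {
        # The most you can save is bounded by the estimate itself.
        "best_saving": estimate_days - min(actuals),
        # The overrun has no such bound.
        "worst_overrun": max(actuals) - estimate_days,
        "average_actual": sum(actuals) / len(actuals),
    }


# Illustrative ratios only: one early finish, one on time, three overruns.
profile = overrun_profile(10, [0.4, 1.0, 1.5, 3.0, 10.0])
print(profile)
```

With these invented numbers, the best case saves 6 days on a 10-day estimate, the worst case adds 90, and the average lands far above the estimate. The exact figures don’t matter; the lopsided shape does.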
With waterfall, these events continue to extend the timeline, often non-linearly. With agile, you reprioritize and drop less valuable items so that you have the best product you could have made within your timeline and budget.
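One rough way to picture that reprioritization is a greedy pass over the backlog. The sketch below is an assumption on my part, not a prescribed agile technique: `BacklogItem`, `plan_within_budget`, the value scores, and the day counts are all hypothetical. It ranks items by value per day and keeps whatever still fits in the remaining time.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: int   # hypothetical business-value score
    days: float  # current estimate, must be greater than zero


def plan_within_budget(backlog, budget_days):
    """Keep the highest value-per-day items that fit; drop the rest."""
    ranked = sorted(backlog, key=lambda item: item.value / item.days, reverse=True)
    kept, dropped = [], []
    remaining = budget_days
    for item in ranked:
        if item.days <= remaining:
            kept.append(item)
            remaining -= item.days
        else:
            dropped.append(item)
    return kept, dropped


backlog = [
    BacklogItem("login", value=8, days=2),
    BacklogItem("export", value=2, days=4),
    BacklogItem("search", value=5, days=5),
    BacklogItem("dashboard", value=9, days=3),
]
kept, dropped = plan_within_budget(backlog, budget_days=6)
print([item.name for item in kept])     # items that fit the budget
print([item.name for item in dropped])  # items cut from this timeline
```

A greedy pass like this is not guaranteed to find the best possible combination, but it is cheap to rerun every time an estimate changes, which is the point: the plan adapts instead of the deadline slipping.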
Sidenote: “fast” development also produces fragile products. If you are rushing to try to meet a tight deadline, you will have more bugs. If you don’t do quality assurance, the bugs will multiply non-linearly. If you hire junior developers to work beyond their current abilities, you will have a product that is difficult to continue building upon. If you don’t write automated tests, the code is more likely to break in unexpected ways with future changes. If you don’t take the time to refactor, changing the system gets more and more expensive over time until your team starts demanding a full rewrite. (Side side note: never do a rewrite)
All these have tradeoffs, and keeping a team fast now and fast later is about balance. For instance, let’s look at automated testing. Writing automated tests pushes back the initial launch date by 30% on average, but saves you when maintaining a product. There are cases where it makes sense to not write tests. We cannot predict the risk of a critical failure due to not testing, but we can tell that skipping tests makes the product more fragile. It then becomes straightforward how to decrease that fragility: add tests.
Another antifragile aspect of agile is its retrospectives. You take time to talk about what didn’t work or what slowed you down, and you change it. Over time, your team improves.
On the team I’m on now, we discovered our meeting structure was getting in the way of work being completed. We iterated on it a few times and found a scheme that works for us. Now our meetings are bunched together to give makers more long stretches to get into flow. We also come prepared with agendas so we spend less time in meetings. These small changes have made a big impact on our design and development work.
Retros allow you to turn a negative event during a sprint into a chance to learn from it and become better. The more bad events, the more lessons learned, the more improvements your team makes. Done right, the problems start to dissipate and your team hits its stride.
Kanban is a system where your work moves through various phases, and the amount of work in progress for each phase is limited. Let’s say you have “In Design”, “In Development”, “In Quality Assurance” (QA), and “Done”; and you are capped at 3 tasks in each phase. If you then have 3 items in QA, you cannot move a task to it from Development until you’ve moved an item from QA to Done.
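The work-in-progress rule above can be sketched in a few lines of Python. This is a minimal illustration, not a real Kanban tool: the names `KanbanBoard` and `WipLimitExceeded` are hypothetical, and I’m assuming the “Done” column is exempt from the cap so finished work always has somewhere to go.

```python
class WipLimitExceeded(Exception):
    """Raised when a move would push a phase over its work-in-progress cap."""


class KanbanBoard:
    def __init__(self, phases, wip_limit, uncapped=("Done",)):
        self.columns = {phase: [] for phase in phases}
        self.wip_limit = wip_limit
        # Assumption: finished work is never blocked by the cap.
        self.uncapped = set(uncapped)

    def _has_room(self, phase):
        return phase in self.uncapped or len(self.columns[phase]) < self.wip_limit

    def add(self, task, phase):
        if not self._has_room(phase):
            raise WipLimitExceeded(f"{phase} is full")
        self.columns[phase].append(task)

    def move(self, task, source, target):
        if task not in self.columns[source]:
            raise ValueError(f"{task} is not in {source}")
        if not self._has_room(target):
            raise WipLimitExceeded(f"cannot move {task}: {target} is full")
        self.columns[source].remove(task)
        self.columns[target].append(task)


board = KanbanBoard(["In Design", "In Development", "In QA", "Done"], wip_limit=3)
for task in ("A", "B", "C"):
    board.add(task, "In QA")
board.add("D", "In Development")

try:
    board.move("D", "In Development", "In QA")  # QA already holds 3 tasks
except WipLimitExceeded as blocked:
    print(blocked)

board.move("A", "In QA", "Done")            # free up a QA slot first
board.move("D", "In Development", "In QA")  # now the move succeeds
```

The blocked move is the feature, not a bug: it forces the team to look at the bottleneck in QA before piling more work onto it.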
Kanban was developed as part of the Toyota way. If a worker encounters a problem along the assembly line, the car manufacturer allows them to stop the entire line. That way, when one part of the process breaks or bottlenecks, it gets the attention it needs to get fixed. Practices like this helped Toyota dominate the industry. Small errors lead to big improvements in process.
Antifragile systems are often redundant. Take the human body for instance. The body has two of almost everything and you can lose large portions of many organs and still function.
Traditional teams don’t have that redundancy. The back end developer only writes Java. The database guy handles the SQL queries and stored procedures. The front end designer only does HTML and CSS, and the front end developer writes the JavaScript to plug it together. Of course, sometimes you have no database work to do. Other times you have an overwhelming amount of front end and your back end guy gets way ahead… only to find out later that what he wrote doesn’t work for what the front end needs and he has to completely rewrite it.
You can imagine the ideal where everyone on the team is great at everything. Whenever work comes in, it gets tackled by whoever is free. The skills needed to create an app are diverse, so it’s impossible to master all of them. But many are transferable.
For example, if your front end developer can handle design, your back end person is full stack, and your database administrator can code—you have a cross functional team. Fewer things will block the progress of your product and you limit the impact of negative events.
Pair programming is a core tenet of eXtreme Programming (a brand of agile). Pairing helps reduce bugs, reduces the likelihood and impact of blockers, and keeps the developers focused. But one of the largest benefits is knowledge redundancy. Complex software often has edges to it that developers consider arcane.
In one particular project I was on, it was the code which took an image, split it into a handful of distinct colors (with some fudging for colors that were “close enough”), and broke that into layers so each color could be changed independently of one another. In another project, it was the authorization stored procedure which would join half a dozen tables to a query to prevent a user from seeing more than they were supposed to.
In both examples, solo developers wrote the modules. When the author of the database authorization module wanted to leave, management threw money at him to stay. They feared no one else could maintain it.
Now, I’d wager both pieces of code would have been simpler with two sets of minds working on it originally. Those modules wouldn’t have gained a reputation for being arcane in the first place.
But let’s assume the complexity was unavoidable. In that case, you now have two people who know how to work with it. Should one leave: no problem. You’ll have another trained up on it soon enough. In other words, it limits the downsides.
The waterfall way of project planning is to set a timeline—usually in Gantt chart form—at the start of a project to outline exactly how everything will go. We’ve already discussed why planning so far ahead is so prone to error: the non-linearity of how problems affect the timeline. But there’s another common anti-pattern: having the business define how long development should take. The reason is simple: skin in the game.
Skin in the game means the negative outcomes hurt you in some way. If the deadline is too aggressive, the development team is the one who ends up pulling the late hours to try to finish it. Agile solves this by having developers commit to the work being done as part of sprint planning. By all means hold the team accountable if they commit to something and don’t complete it. If you don’t, they don’t have skin in the game, either.
If you’re on the business side and the timeline you receive is too long, you have a few options: reduce the scope of things that need to be completed, or add resources at the beginning of the project. The longer you wait to do either of these things, the less impact it has on the deadline.
Fred Brooks wrote the famous book, The Mythical Man-Month, to describe the phenomenon where adding people to a late project makes it later. This is interventionism: the urge to intervene in a complex system, which often makes things worse.
In this case, we expect more people doing the work means more work getting done. A lot of labor is like this. Software is not. The new hire needs time to learn the code. Until the developer does, expect slower work, more bugs, and accidentally reinventing things.
This is the tip of the iceberg when it comes to interventionism and software development. We all know last-minute changes extend timelines. In some cases, even simplifying a feature can, if some work is already in progress or the change alters assumptions for other pieces.
Asking the team to pull longer hours to get more work done also results in more problems. As you become more exhausted or more burnt out, your productivity plummets. Maybe a team can handle a few weeks of overtime, but death marches are futile.
Another common intervention is “switching things up” by making changes to processes or team structure. Even when the team leads the change itself, through a retrospective on something that’s not working, it is often hit or miss. Sometimes the new change helps, sometimes the team needs to go back to the drawing board. The team I’m on isn’t afraid to revert a change if it’s not helping as intended.
But the worst kind of switching things up is when it’s mandated from outside the team working on the project. Always with great intentions, but most changes don’t move the needle. Many move it backwards. Long ago management told a large team I was a member of to meet as one big team for a standup. These standups were marathon sessions of 45+ minutes with a couple dozen people in them. The idea was everyone would then be on the same page. The reality was apparent in the zoned out looks of the people who were waiting for their turn to talk. The team said more, but people communicated less.
Here are my recommendations for building an antifragile product development team and process:
To me, the fundamentals to becoming antifragile are these tenets: prioritize adapting to change, choose the potential high-upside work, avoid intervening where you don’t need to, and structure around skin in the game. If you’ve got experiences or thoughts on the topic, or you’re looking to build an antifragile development team to bring your product to life, reach out and let’s talk.