Part 2 of the series “Theory of Constraints in software startups”
This is the second post in the series. If you haven’t read the first one yet, I recommend checking it out here first; otherwise, some of the ideas might not make sense.
With an understanding of our system, its goal, and its measurements, we know what kind of work we shouldn’t be doing: work that doesn’t contribute to the goal. The next step is to figure out how to do the work we should be doing more efficiently and reliably.
I prefer to open with techniques I tried that didn’t work, starting with the most common one.
American Dream Syndrome
If everyone works as hard as they can all the time, we get more work done.
This is the most widespread conviction I notice among managers. Here is a story from a friend of mine; let’s call him Tim.
Tim is an engineering team lead. After a merger, the structure of the company he works at changed. His new superior has a peculiar approach to management. The main document they use in their one-on-ones is a spreadsheet that looks something like this (I don’t have the actual document; I reproduced it from his description):
Employees are in rows, workdays in columns. Each cell shows how much work an employee has on a specific date: the amount of work assigned to them relative to the number of hours in their workday.
Tim’s manager wants my friend to make sure the numbers in that spreadsheet never go below 120%. Every employee should have 20% more work assigned to them than they can handle… every day. You know, just in case they finish early.
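Each cell of such a spreadsheet is just assigned hours divided by available hours. Here is a minimal sketch, assuming an 8-hour workday and made-up assignments (the employees, the numbers, and the function names are all hypothetical), that computes each cell and flags the ones below the manager’s 120% threshold:

```python
# Hypothetical data: hours of work assigned to each employee on workdays 1-3.
WORKDAY_HOURS = 8  # assumption: everyone has an 8-hour workday

assigned_hours = {
    "Alice": [10, 8, 12],
    "Bob": [6, 10, 10],
}


def load_percent(hours: float, workday_hours: float = WORKDAY_HOURS) -> float:
    """One spreadsheet cell: assigned work relative to available hours, in %."""
    return hours / workday_hours * 100


def cells_below_target(table: dict, target: float = 120.0) -> list:
    """List (employee, workday, load %) for every cell under the target load."""
    flagged = []
    for name, days in table.items():
        for day, hours in enumerate(days, start=1):
            load = load_percent(hours)
            if load < target:
                flagged.append((name, day, load))
    return flagged


print(cells_below_target(assigned_hours))
# [('Alice', 2, 100.0), ('Bob', 1, 75.0)]
```

By this rule, even a day filled with exactly one workday of work (100%) counts as a problem.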
I think psychology has yet to discover a disorder I call “American Dream Syndrome”. Managers fall victim to it all the time. The good news is that it’s easy to diagnose; the bad news is that it’s hard to cure. Here are the symptoms:
- Your manager is often sour around you;
- They are more curious than usual about what your team is working on and when it will be done;
- They send you lengthy emails or Slack messages about how you should push your team to give it all they’ve got;
- They often use loaded concepts like “duty”, “success” and “competition”. For example: “We can’t just sit around doing nothing, we’ll get beaten by competitors!!”
- In other words, your manager is being a pain in the ass.
I won’t go into depth on how to treat the syndrome in this article, but I will try to show why working at 100% of your capacity is bad for business.
Let’s run a simulation. Say we are managing a team at a startup. The team’s goal is to satisfy frequently changing business requirements. We have one product owner (PO) who writes specs, four engineers who implement them, and one QA person who verifies that features work as expected.
- The PO can write 3 specs per day;
- Engineering can work on 4 tasks in parallel (1 task per person), and it takes about 2 days to implement a feature;
- QA can verify 2 features per day;
- After at least 5 tasks are finished, QA spends one day testing a Release Candidate (RC). Once the RC is cleared, its tickets move to “Done”.
The simulation runs under perfect conditions: no rejections from QA, zero variation in people’s performance, no disruptions. Tasks are processed first-in, first-out (FIFO).
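These rules are concrete enough to code up. Below is a minimal day-by-day sketch in Python under one possible reading of them: specs written on day d are available to engineers on day d + 1, features finished on day d reach QA on day d + 1, and QA spends a whole day on a release candidate (instead of verifying features) as soon as at least 5 verified tickets are waiting. Those scheduling details, and the names `simulate` and `Ticket`, are assumptions rather than the author’s implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Ticket:
    id: int
    created: int                    # day the PO starts writing the spec
    released: Optional[int] = None  # day the ticket's release candidate is cleared

    def lead_time(self) -> int:
        # Counts both the first and the last day, so created == released is 1 day.
        return self.released - self.created + 1


def simulate(days: int, specs_per_day: int = 3, engineers: int = 4,
             dev_days: int = 2, qa_per_day: int = 2, rc_batch: int = 5):
    """Day-by-day FIFO simulation; returns (tickets, releases, wip_per_day)."""
    tickets: List[Ticket] = []
    dev_queue: Deque[Ticket] = deque()      # specs waiting for a free engineer
    in_dev: List[Tuple[int, Ticket]] = []   # (finish_day, ticket) being implemented
    qa_queue: Deque[Ticket] = deque()       # implemented, waiting for QA
    verified: List[Ticket] = []             # verified, waiting for an RC
    releases: List[List[Ticket]] = []
    wip: List[int] = []                     # tickets in progress at the end of each day

    for day in range(1, days + 1):
        # Free engineers pull the oldest specs (written on earlier days).
        while len(in_dev) < engineers and dev_queue:
            in_dev.append((day + dev_days - 1, dev_queue.popleft()))

        # QA spends the whole day either on an RC or on verifying features.
        if len(verified) >= rc_batch:
            for ticket in verified:
                ticket.released = day
            releases.append(verified)
            verified = []
        else:
            for _ in range(min(qa_per_day, len(qa_queue))):
                verified.append(qa_queue.popleft())

        # The PO writes today's specs; engineers can pick them up tomorrow.
        for _ in range(specs_per_day):
            ticket = Ticket(id=len(tickets) + 1, created=day)
            tickets.append(ticket)
            dev_queue.append(ticket)

        # Features finished today reach QA tomorrow.
        qa_queue.extend(t for finish, t in in_dev if finish == day)
        in_dev = [(finish, t) for finish, t in in_dev if finish != day]

        wip.append(sum(1 for t in tickets if t.released is None))

    return tickets, releases, wip


if __name__ == "__main__":
    tickets, releases, wip = simulate(days=20)
    for batch in releases:
        avg = sum(t.lead_time() for t in batch) / len(batch)
        print(f"day {batch[0].released}: released {len(batch)} tickets, "
              f"average lead time {avg} days")
    print("WIP on day 10:", wip[9], "| WIP on day 20:", wip[19])
```

With the defaults, releases land on days 7, 11, 15 and 19, the average lead time per release climbs by 2 days each time (6.5, 8.5, 10.5, 12.5), and 24 tickets are in progress on day 10 versus 36 on day 20. That is close to the slide deck’s figures but not identical, so treat the sketch as an approximation of the author’s setup, not a reproduction of it.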
We’ll measure the team’s performance using metrics from the previous article:
- Lead time per task: the number of days from the moment the PO starts working on a ticket until it is released.
Since the team’s goal is to respond to business requirements fast, we can use lead time as a measure of throughput: the faster the team processes work, the higher the throughput.
- Inventory, aka work in progress over time;
- The team’s operating expense is constant, so we won’t measure it.
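Both measured metrics fall out of two dates per ticket: the day the PO starts it and the day it is released. A minimal sketch with hypothetical tickets and function names; counting both the start day and the release day is an assumption that matches how the per-ticket day counts add up further down in this post:

```python
from typing import List, Optional, Tuple

# A ticket as (day the PO started it, day it was released or None if still open).
Ticket = Tuple[int, Optional[int]]


def lead_time(started: int, released: int) -> int:
    """Days from the PO starting the ticket to its release, counting both days."""
    return released - started + 1


def wip_on(day: int, tickets: List[Ticket]) -> int:
    """Inventory at the end of `day`: tickets started but not yet released."""
    return sum(1 for started, released in tickets
               if started <= day and (released is None or released > day))


# Hypothetical tickets: two released, two still in progress.
tickets = [(1, 6), (2, 7), (2, None), (5, None)]
print([lead_time(s, r) for s, r in tickets if r is not None])  # [6, 6]
print([wip_on(d, tickets) for d in range(1, 8)])  # [1, 3, 3, 3, 4, 3, 2]
```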
Here’s a slide deck with this simulation:
- After an initial ramp-up of 8 days, the team stabilizes and releases every 4 days;
- Work in progress grows during the whole period: 24 tickets are in progress on day 10 and 37 on day 20;
- Lead times grow with every release. Tickets in the first release took 6.5 days on average; tickets in the last release took 13.5. Almost double the time!
- If we ran the simulation non-stop, lead times and inventory would grow without bound.
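Why is there no upper bound? A back-of-the-envelope sketch compares how many tickets enter the system per day with how many its slowest stage can let out. It assumes QA’s cycle is three days of verifying 2 features followed by one RC day (6 tickets every 4 days, which fits the 4-day release cadence); the function name is hypothetical.

```python
from fractions import Fraction

# Tickets per day each stage can handle, from the simulation's parameters.
arrivals = Fraction(3)       # the PO writes 3 specs per day
dev = Fraction(4, 2)         # 4 engineers, 2 days per feature
# Assumption: QA verifies 2 features/day for 3 days, then tests an RC for 1 day.
qa = Fraction(2 * 3, 4)


def wip_growth_per_day(arrivals: Fraction, *stage_capacities: Fraction) -> Fraction:
    """Average change in work in progress: inflow minus the slowest stage's outflow."""
    return arrivals - min(stage_capacities)


growth = wip_growth_per_day(arrivals, dev, qa)
print("WIP grows by", growth, "tickets per day")  # WIP grows by 3/2 tickets per day
```

Any positive result means the queues never drain: every day adds more work in front of each new ticket, so both inventory and lead times keep climbing.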
Everyone is 100% busy and, in the long run, infinitely ineffective.
Keep in mind: this team is “perfect”. Everyone works at 100% of their capability. Still, all our metrics get worse with every release. What is going on?
You cannot afford to be busy
First, let’s tackle the problem of disappearing days. Why do lead times keep growing if the release schedule is constant?
Take a look at the fastest (#4) and slowest (#21) tasks.
The fastest (#4) spent:
- PO in progress: 1 day
- DEV waiting: 0 days
- DEV in progress: 2 days
- QA waiting: 0 days
- QA in progress: 1 day
- QA waiting RC: 1 day
- QA RC: 1 day
- In total: 6 days
The slowest (#21) spent:
- PO in progress: 1 day
- DEV waiting: 4 days
- DEV in progress: 2 days
- QA waiting: 3 days
- QA in progress: 1 day
- QA waiting RC: 1 day
- QA RC: 1 day
- In total: 13 days
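Splitting each ticket’s days into waiting and working makes the difference explicit. A small sketch using the stage durations from the two breakdowns above; treating the three stages with “waiting” in their names as queue time is an assumption read off the stage names, and the function name is hypothetical:

```python
# Stage durations in days, taken from the breakdowns of tickets #4 and #21.
ticket_4 = {"PO in progress": 1, "DEV waiting": 0, "DEV in progress": 2,
            "QA waiting": 0, "QA in progress": 1, "QA waiting RC": 1, "QA RC": 1}
ticket_21 = {"PO in progress": 1, "DEV waiting": 4, "DEV in progress": 2,
             "QA waiting": 3, "QA in progress": 1, "QA waiting RC": 1, "QA RC": 1}


def split_days(stages: dict) -> tuple:
    """Return (total days, days waiting, days being worked on)."""
    total = sum(stages.values())
    # Assumption: every stage with "waiting" in its name is time spent in a queue.
    waiting = sum(days for stage, days in stages.items() if "waiting" in stage)
    return total, waiting, total - waiting


print(split_days(ticket_4))   # (6, 1, 5)
print(split_days(ticket_21))  # (13, 8, 5)
```

Both tickets were worked on for the same 5 days; the whole 7-day gap between them is time spent in queues.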
Here’s our answer. The slowest ticket spent 7 more days waiting for Engineering and QA to become available. We have overloaded the system by putting in more work than it can process.
Do we live inside of a simulation?
We have simulated work within one department because it’s easy to show on one Trello board. Of course, the same principle applies to the company as a whole.
On the scale of an organization, departments are the system’s “work centers”, and customers’ orders, projects, and requests are the “tasks” moving through the system.
I’ve seen the issue of overload in every company I’ve worked at. Sometimes I was the one causing it. In the past, as a team lead, I made sure everyone had work to do at all times. Here is what this type of management often leads to:
- We’re optimizing for quantity, so quality goes down;
- The team becomes unreliable and predictions become impossible. Any job can take anywhere from days to months;
- Team leads and senior management jump in to lobby, put pressure and micro-manage. Priorities change daily;
- Team members switch from task to task based on the changing priorities. Multitasking leads to more delays;
- By the time delayed tasks do get processed, there’s a high chance the work done earlier is outdated. For example, customers’ requirements might have changed, or developers need to rebase their code on top of recent changes. That’s more time lost;
- From an accounting point of view, work stuck in the system is like an investment whose pay-off date keeps moving further away. Extrapolate this trend into the future to find when your startup will have all its funds tied up in inventory. We can mathematically prove such a company will go out of business; how cool is that?
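The extrapolation itself is simple arithmetic. A minimal sketch, assuming inventory grows linearly and every unreleased ticket ties up a fixed amount of money; the function name and all the numbers in the example are hypothetical:

```python
def days_until_funds_tied_up(funds: float, cost_per_ticket: float,
                             wip_now: float, wip_growth_per_day: float):
    """Days from today until the value of unreleased work reaches `funds`.

    Assumes inventory grows linearly and each unreleased ticket ties up a
    fixed `cost_per_ticket`. Returns None if inventory is not growing.
    """
    if wip_now * cost_per_ticket >= funds:
        return 0
    if wip_growth_per_day <= 0:
        return None
    day = 0
    while (wip_now + wip_growth_per_day * day) * cost_per_ticket < funds:
        day += 1
    return day


# Hypothetical startup: $500,000 in the bank, $2,000 sunk into each unreleased
# ticket, 36 tickets in progress today, inventory growing by 1.5 tickets per day.
print(days_until_funds_tied_up(500_000, 2_000, 36, 1.5))  # 143
```

Real money doesn’t sit still like this, of course, but as long as inventory keeps growing, the answer is always a finite number of days.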
In the real world, things are always more dramatic than in simulations.
If we make everyone work as hard as they can, finishing projects becomes harder and harder. Management often misjudges this as a resource allocation problem and pushes for hiring more people or spending more money. If they get the resources they think they need, it might help… temporarily. Once inventory reaches a high enough level, waiting times rise again, projects get stuck in the system, and the company is back at square one.
Hiring people or spending more resources won’t solve this issue.
Somebody solved it before
This is one of the mantras I used as an engineer: when we encounter a problem, there’s a high chance someone has already solved it. If only there were a StackOverflow for managers…
In manufacturing, Taiichi Ohno and his peers developed the Toyota Production System (TPS) in the decades following World War II. One of the practices used in TPS is Kanban. It addresses the same issues we’ve faced in the simulation: growing inventory and lead times, overproduction, and low quality.
In the next article, we’ll take Kanban for a spin in our simulated environment and see what it is and isn’t good for. We’ll also get a glimpse of the ideas that go beyond Kanban and help us manage the flow of work in infinitely complex systems.
I’d like to thank the people who shared their experience and useful insights with me. Their input is the foundation of this series. In no particular order, they are: Stefan Willuda, Ricardo J. Méndez, Ed Hill, Adiya Mohr, Conny Petrovic, Goran Ојkić.
Special thanks to Cristina Amate for the illustrations as well as her support and early feedback on the talk and articles.
I’m looking for opportunities to talk about Theory of Constraints in startups. If you’d like to recommend a conference or invite me to speak, please reach out: https://flpvsk.com