We think of software development as a pursuit grounded in logic. From this perspective it can be shocking that software projects have high failure rates: 31% in 2014. Some projects fail to the tune of billions. How is this level of chaos possible in an apparently logical discipline?
Making software requires a delicate combination of planning and structure on the one hand and experimentation and creativity on the other. We can handle projects more safely if we better recognise this balance.
Recognising the creative side of software will also help us understand why estimation is so hard, why there can be large differences in developer performance and why metrics like lines of code produced can be misleading as measures of productivity. We'll also get a new perspective on why Agile projects fail less often and why politics can be so prevalent on software projects.
Famous examples of computing problems are a big part of our picture of software and these problems are strictly defined. Travelling salespeople in textbooks are subject to more constraints than in real life (they rarely get lifts or share jobs with colleagues or bunk off for a drink).
(Image from The Unsolved Travelling Salesmen Problem.)
We tend to heavily associate software engineering with examples about breaking secret codes. Not so much with less exciting examples like assessing insurance claims.
But software developers are not all academics and do not all work on well-defined problems. Professional software development is about codifying rules to produce a system that meets a set of business objectives. It aims to produce artifacts of logic that will serve purposes in a messy world.
The messiness of the world has an important hand in the process of making real-world software. Projects can go wrong by making incorrect assumptions about the world or being confused about project direction.
This gives us some perspective on the high failure rates of software projects. But only some. How does it come about that software projects can be so risky and fail so often? How do we go about writing real-world software?
Software projects can vary immensely. It’s hard to hold a clear picture in one’s head of the process of making software. To an extent one simply has to learn from experience. But having a clearer picture of software development will help us better appreciate the sources of risk in software development and avoid misunderstandings.
Given how abstract software development is, it can be tempting to fall back on analogies to more concrete disciplines. Especially tempting is an analogy to construction or physical engineering. This analogy has historically influenced software project management methodology (especially the waterfall model). But we should be cautious about this:
"I'm very suspicious of using metaphors of other professions to reason about software development. In particular, I believe the engineering metaphor has done our profession damage - in that it has encouraged the notion of separating design from construction." Martin Fowler
As Fowler points out, it is dangerous to think we know in advance how to approach a problem without building anything first.
It is also dangerous to think we know in advance what the tools are, and this is another disadvantage of the construction metaphor. Contemporary commercial development relies heavily on existing tools and libraries, which have to be chosen. Programmers don’t just write code to solve problems - they also write a lot of code to stitch together the partial solutions of others. These choices are important as it may be difficult to make a tool work for a particular purpose.
Construction also involves certain problems of tool selection. For example, one has to choose types of brick or material for load-bearing and durability. The difference with software is that the tools themselves are conceptual, so the boundaries of what they do are harder to see. It is not unusual to discover down the line that a chosen tool doesn’t support the particular data format or communications protocol that we wanted it to.
Unchallenged early assumptions about tools can lead to overruns and contribute to failures. But the big risk is unchallenged assumptions in the business objectives. Here again the construction metaphor doesn’t reflect the risk. There is scope for mistaken assumptions to slip through about whether a building design provides enough space for the number of people who will use it and whether there are enough elevators. But visual plans help to bring out assumptions. Since software is abstract by nature it is difficult to make it concrete enough to flush out assumptions without entirely building the solution (by which point all the cost has been incurred).
Refining problem specifications to flush out assumptions can be more art than science. Refining specifications and choosing tools to address them are both central to what software developers do. Risk is inherent to both. Let’s try to articulate a metaphor that expresses this better than the construction metaphor.
Software development is both logical and creative (and therein lies its chaos). There is a class of problems which are both creative and logical - the class of ‘puzzles’ or ‘riddles’. Creative problem-solving can serve as a good metaphor for commercial software development. We can see this by considering a particular story of puzzle-solving from the old Norse saga of Ragnar Lodbrok.
As the story goes, Ragnar’s men report to him that they have seen a very beautiful woman, Kraka. Ragnar’s interest is aroused and he sends for her, but he decides to test her wits. He commands her to arrive neither dressed nor undressed, neither hungry nor full, and neither alone nor in company. Kraka is up to the challenge and arrives draped in a net and her long hair, biting an onion, and with only her dog as a companion. Ragnar is impressed and Kraka and Ragnar get married.
(Image from Ancient History Encyclopedia.)
Ragnar’s request is not unlike the specification of a software project in a key respect. Ragnar does not entirely know what will satisfy his request until he sees it. He knows certain constraints that will need to be met but he will have no idea that the solution Kraka finds is a possible solution until he sees it.
To see the connection more clearly, let us imagine that Ragnar does not immediately marry Kraka. Instead he next specifies that Kraka should first prove her worth in an even more heroic exercise. He asks her to produce a system which will process the documentation approvals and rejections for which he currently employs a whole team.
At the point of specification, Ragnar has little conception of how that task might be achieved without the need for a team. Much as with his previous riddle, Ragnar does not know whether it is even possible to process the documentation approvals and rejections without a team. He currently trusts his team to make judgements. There may not be strict rules about the documents - perhaps documents are approved only if they are clear, entertaining and/or well-presented. He needs to be shown the automated solution before he can know whether it does what he currently calls ‘documentation approval and rejection.’
In both challenges Kraka has to take Ragnar into a kind of unknown territory. Ragnar is not sure of what things he would call ‘neither dressed nor undressed.’ Similarly, Ragnar is not sure what he will call ‘documentation approval and rejection’ without a team performing the approvals/rejections. Kraka has to come up with something that Ragnar will call a solution.
The basic uncertainty that goes along with software projects is that we do not fully understand what the solution to our problems will be until we have a completed solution. If all goes well then one sees the solution take shape as one goes through the project.
This is the key difference between software projects and manufacturing projects. With a manufacturing project one can form a clear picture early on of the product to be produced. This is because the product to be produced is a physical thing. The solutions to software problems are conceptual.
If one is drawing up requirements for a physical product or structure, those requirements are easier to understand from the beginning than is the case with a software project. For example, if one wants to build a bridge then it is fairly clear from the beginning what it means to say that it needs to take a load of X many cars. Software requirements, by virtue of being abstract, are more likely to be hard to understand and can sometimes look like they are perfectly clear but later turn out to be unclear.
Let's go back to Kraka and Ragnar and imagine that Kraka arrives accompanied by a rabbit instead of a dog. Ragnar might say that he will not accept this as a solution – he thinks that arriving with a rabbit does not count as being accompanied at all. If this seems implausible or unfair then perhaps imagine that Kraka arrives with a goldfish instead. It is plausible that we can find circumstances where Kraka and Ragnar will take different views of what counts as being ‘accompanied.’ But it is Ragnar’s view that counts since he set the problem.
A parallel of the dispute between Kraka and Ragnar takes place on software projects - developers can sometimes take a different view from end-users of what counts as an acceptable fit for the problem. Kraka could have sought to forestall such an issue by probing the problem further from the beginning – she might have done an equivalent of requirements analysis.
We can imagine Kraka putting a business analyst hat on and asking questions like ‘Do you mean accompanied by a human being? What about a pet?’ But even after much requirements analysis these types of ambiguities can remain. Even if it does become clear that Ragnar would accept a pet, it may not be made clear (not even to him until he sees it) that he would not accept a goldfish.
Solving logical puzzles is a creative task. We have to try out ideas and evaluate them. With the size of puzzles we solve in modern software, this involves many aspects and job roles, which gives rise to a lot of complexity. This is at the heart of why making software is so difficult, so it's worth considering a more concrete example.
Let's consider a simple account creation screen like we see in many systems.
This sort of page can look deceptively simple. It can involve lots of interactions with other systems - see the red text on the right. It's so easy to get wrong. Imagine somebody talks to a department manager and comes up with this design. The developers then spend ages building it. Then users on the ground say it’s over-complicated. They don’t even want all the details the page is throwing at them.
These days we try to be Agile and start with the simplest thing that can be put in front of real users. We establish good feedback loops to iterate. Achieving that crosses lots of role boundaries. Who can best say what’s the simplest thing to put in front of users first? Who will find out about the other systems involved? Who can get the feedback going with the users? Who can communicate the user needs? How will competing needs be prioritised? Making software requires a range of roles and involves tasks that might not fall clearly under a particular role.
That means you need lots of people involved. People with different skills and points of view. They all need to be co-ordinated towards a common goal but the path towards that goal is not fully clear - it requires experimentation.
This is what makes making software so difficult for modern business. To bring lots of people together we need structure and a clear common direction. But we can't be too clear about the direction as we have to experiment in order to solve the problem.
Thinking of software problems as puzzles has other benefits too. If we think of software problems as open-ended puzzles then it makes sense that estimation is hard. How could Kraka know in advance how long it would take to solve Ragnar's riddle? If she'd solved similar riddles before then she might have a good idea. But she would still struggle to be sure as she couldn't know in advance what solution would work for Ragnar.
It also makes sense that there can be big differences in performance from one developer to another. The skills are highly niche and require drawing on a lot of knowledge and employing creativity. Large performance differences between individuals have been observed in other fields with these elements too.
There's a common perception that developers who write more code or deliver more features are more productive. But the puzzle-solving perspective challenges the assumption this rests on about what counts as output. The true aim is to solve business problems. That means productivity should be measured by business problems solved. Everything else risks being a misleading proxy.
You wouldn't want developers writing lots of code to solve a problem that could be solved more simply with fewer lines. Nor would you want them rushing out a solution that meets a specification but doesn't add value to the business. Even features delivered is a risky metric, as you also need to know that you're delivering features that solve the key problems for your business. Better metrics might capture how features lead to sales, interactions on the site, orders processed and so on. Getting good metrics for product value may not be easy, but it's important that product value is the real objective and the true basis of productivity.
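As a small illustration of why line counts mislead, the sketch below solves one hypothetical task (totalling the value of approved orders) in two ways. The task, the function names and the sample data are all assumptions invented for this example, not taken from any real project.

```python
# Hypothetical sample data, invented for illustration only.
SAMPLE_ORDERS = [
    {"amount": 120, "approved": True},
    {"amount": 80, "approved": False},
    {"amount": 50, "approved": True},
]


def total_approved_verbose(orders):
    # Long-hand version: more lines of code, same business result.
    approved = []
    for order in orders:
        if order["approved"]:
            approved.append(order)
    total = 0
    for order in approved:
        amount = order["amount"]
        total = total + amount
    return total


def total_approved_concise(orders):
    # Short version: a single expression, same business result.
    return sum(order["amount"] for order in orders if order["approved"])


if __name__ == "__main__":
    # Both versions answer the same business question identically.
    print(total_approved_verbose(SAMPLE_ORDERS) == total_approved_concise(SAMPLE_ORDERS))
```

Both functions solve the same problem, yet a lines-of-code measure would score the first several times higher than the second. The business value delivered is identical.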
The puzzle perspective can also shed light on why Agile projects tend to be more successful than waterfall ones.
(Image from visual-paradigm.)
Agile projects benefit from working iteratively, making more use of prototypes and putting early versions of software in front of users for improvement. From the problem-solving perspective, this benefit from feedback makes perfect sense. A need to resolve ambiguity in the problems being set is something we should expect. Managing ambiguity is at the core of commercial software development.
This also makes sense of why politics can be so prevalent on software projects. Politics stems from disagreements about how to proceed. It is natural that disagreements will arise when it is unclear what would count as a solution or how best to get to one.
I personally find the logical problem-solving metaphor a useful fallback when I'm tempted to look for a definite answer to whether a particular design will 'work' or a chosen tool will fit a particular need. Sometimes we can't be sure of a solution in advance of providing it. We might very much want that upfront certainty but we just can't achieve it.
I also find the metaphor useful when there’s a lot going on in a project and I need to better see how the project is operating and where it is trying to get to. If we see software projects as exercises in collective problem solving then we refocus our thinking away from charts and plans and instead towards the people and motivations driving the project.
This article is based on my 'Why Making Software is so Difficult', ACM SIGSOFT Software Engineering Notes 39. Title image: Grete Stern, Dream 15, 1949.