Estimation — how can we estimate with confidence in software delivery?

Introduction

‘When will it be done?’ is one of the most common and most difficult questions to answer in software development. Estimation in software is traditionally difficult and inaccurate most of the time, and project teams spend a significant amount of time on it.

There are a number of contributing factors. Software development is a design activity, and thus hard to plan and estimate. Software is built by people, and the outcome depends on which individuals are involved. Individuals are hard to predict and quantify, and humans in general are inherently bad at predictions [1]. The teams that build the software operate in an ever-changing business and technology landscape. The cost of estimation is typically high, which results in teams either postponing estimation until it is too late, or not adapting the estimate when new information becomes available.

Estimation is not a process that can be automated, and no magic formula can be created for it. It requires skills and knowledge, and an understanding of the estimation process is critical.

This paper sets out to introduce aspects that should be considered when putting together an estimate, with the aim of reducing the cost of estimation and increasing its accuracy. A note of caution though: these principles and techniques should not be seen as exhaustive, nor as the only way to estimate. Context, knowledge and skills are important.

I. Guessing, Forecasting, Estimating

Before we move forward, we need to clarify some of the language used in estimation. Troy Magennis, in his book “Forecasting Using Data”, highlights a subtle difference between forecasting and estimating. Forecasting is estimating in advance, and estimating is carefully forming an opinion, calculating approximately. Guessing is taking a stab in the dark, having little or no information about a topic.

Forecast: To predict or calculate (weather, events, etc), in advance. [2]
Estimate: To form an approximate idea of (distance, size, cost, etc); calculate roughly; gauge. [2]
Guess: To form an uncertain estimate or conclusion (about something), based on insufficient information. [2]

To highlight the subtlety between estimating and forecasting, consider the difference between asking someone to estimate the time of day hours after they last consulted a working watch, and asking them to estimate the likelihood that a flight scheduled for the next day will leave on time.

For the current time, even though the information we have is out of date, we can still use it to arrive at an answer. Given that the question is about the present and there is an actual correct answer, this is an estimate rather than a forecast. The question of the plane leaving on time is about the future and there isn’t yet a correct answer, so this is a forecast. To forecast, we can use historical data of previous departure times combined with other pieces of information, such as the weather forecast or other significant events for the day.

Forecasting is estimating in advance. All forecasts are estimates, but not all estimates are forecasts. Forecasting is carefully answering the question about the future, to a transparent degree of certainty, with as little effort as possible [3].

II. Avoiding estimation

If estimation is so difficult, why estimate in the first place? If it can be avoided, then it should be avoided. Kent Beck, the creator of Extreme Programming (XP), said: “Alternative to estimates: do the most important thing until either it ships or it is no longer the most important thing”. Other alternatives include the one advocated by Gojko Adzic.
He proposes the use of budget as a design constraint for the delivery team, similar to other non-functional constraints such as scalability or performance; the delivery team is asked to come up with a solution that fits the budget constraint [4].

However, there are situations where these or other non-estimation techniques cannot be applied, and the need to know ‘How long?’ has to be satisfied. The rest of this paper focuses on the situations where estimation cannot be avoided.

III. One question, in two flavors

For completeness, the ‘How long will it take?’ question usually comes in two flavors. Having a scope and a start date in mind, our clients ask us ‘How long will it take to complete (the scope)?’ Or, having a start and an end date in mind, our clients ask us ‘How many items can we build (in this time-frame)?’

The forecasting techniques presented in this material are applicable to both flavors.

IV. Deterministic vs probabilistic

Dan Vacanti, in his book ‘When will it be done?’ [5], advocates that a mental shift needs to happen in the software community in the way we produce forecasts: a move away from a deterministic approach to a probabilistic one.

A deterministic forecast presumes that there is only one possible outcome to a problem, therefore such forecasts provide only one answer, where 100% certainty is assumed (e.g. ‘we will complete on 1st of December 2017’).

A probabilistic forecast, on the other hand, accepts that there are multiple possible outcomes, and such forecasts produce multiple answers, each accompanied by a confidence level (e.g. ‘we have a 75% confidence level to complete by 1st of March, and 85% confidence that we will complete by 17th of March’).

Weather forecasting

Dan makes a great analogy between weather and software forecasting. Forecasting in software development is not too dissimilar to forecasting the path of a hurricane.
Like storms, software development can be influenced by many factors outside our control, and it is full of uncertainties. A probabilistic approach to software forecasting is therefore more suitable.

Figure: Hurricane forecasting based on a probabilistic approach. Note the message on the forecast (top section of the image): “The cone contains the probable path of the storm center but does not show the size of the storm”.

V. Forecasting using models

The Wright brothers became more successful than others in building airplanes because they built models and overcame uncertainties by putting those models through wind tunnels. Similarly, to forecast in software we can build models for our situations, and use those models to simulate the uncertainties we face. It’s like putting our plans through a wind tunnel [7].

Before we look in more detail at the elements of a forecasting model, it is worth considering what George Box, one of the great statistical minds of the 20th century, said: “all models are wrong, some models are useful” [6].

We should not take models as gospel; however, even very approximate models can help us think about a problem, and they tell us more than gut feel alone. A model should be designed around things we don’t understand, rather than things we do.

“A successful model tells you things you didn’t tell it to tell you” (Jerry P. Brashear, Washington, D.C., consultant).

VI. A software forecasting model

The building blocks of a forecasting model are:
1. A start date
2. A delivery team
3. Work that we want to do
4. A working method (necessary to complete the work)
5. Data (or lack of) about size of the work and speed of delivery

1. The start date

One of the most common errors in forecasting is the use of a wrong start date. Before we use a start date we need to make sure that the conditions to start the work effectively are met, such as the team being in place, ready, and having all the skills and tools required to do the work. If these conditions are not met, then it would be wise to publish the forecast as a duration only. Attention should be paid, though, to whether the likely duration could overlap with typical holiday periods.

2. The team

When we look at the team selected for the work, we should consider whether the team has the right skills to do the work, and whether there is a good spread between those who can teach and create, those who can do and maintain, and novices and learners [3]. If the skills are not present, or if the team is not complete, then we should take this into account in the ramp-up phase of our forecast (see more under the ‘Delivery Pace S-curve’ section).

The team has to have a good spread of knowledge and skills. Ask yourself what happens if certain team members go off sick. What about the pace of delivery of one team compared with another? Not all teams are the same, and if we know that one team is more effective than another, then we should take this into consideration while building our model.

3. The work

We are estimating because we want to know when we will complete some work. The work can vary, from building something brand new, to enhancing existing solutions, to supporting existing ones (or a combination of the above).

The nature of software is that at times we don’t even know what the work will entail. Even when we do know what we want, we might not know what the final solution will look like, or how we might build it.

Work is made up of discovered work, discoverable work and undiscovered work, that is, work that we might not even know exists. This last category is also called ‘unknown unknowns’.
Figure: Work is not just the scope; it is also the time spent discovering scope.

Once we discover the solution, things get a bit easier, but we face other challenges when we start building it, such as the complex interactions between people, the various external constraints imposed on us (time and/or money, resourcing, processes), the unexpected events that we need to deal with, and the dependencies that we have on others and that others have on us. We make new discoveries that might invalidate some of the early findings, and the original solution might turn out not to be fit for purpose.

Typically, work is referred to as scope, and at times these terms are used interchangeably. The subtlety is that scope might imply knowing what we want. As described earlier, at times this is not the case, and work includes the discovery of scope as well. When we build our model we need to take this into account. For the purpose of this paper, we will use the term work to encompass the scope plus the discoveries needed to arrive at the scope.

4. The working method

For the work that we want to perform, we need to select a working method, a way of working that helps us to complete the work. This method varies based on the nature of the work, and we might find ourselves constantly adapting the method to suit the type of work we’re dealing with at a given time.

To help navigate this landscape, we can consider slicing the work into different phases (Discovery, Alpha, Beta, Live) and choosing a methodology that is best suited for each of these phases, remembering that in certain situations more than one method would be applicable. These methods can range from well-structured methods (such as Kanban or Scrum) to less structured ones (such as time-boxed spikes or experiments).

For instance, for Discovery we can use experiment-based techniques to surface emerging solutions. These could include activities such as user research and technical spikes. For Alpha, the use of prototyping and further experimentation can be considered. For Beta, methods such as Kanban or Scrum could be used to implement the findings from Discovery and Alpha.

Slicing considerations

There are two types of slicing, one supporting the other.

One type is concerned with slicing of work into features. Scope tends to start off in a fuzzy state, and we need to work to transform it into sliceable pieces.

Figure: Scope evolution, from Fuzzy to Sliceable

These features should be created as manageable chunks, taking into consideration the following: can each slice be allocated to a team? Can development of those slices happen in parallel? We refer to this type of slicing as the decomposition strategy of the scope.

These features tend to group functionality in Features/Epics, which get further broken down into User stories and Tasks. Features/Epics, User stories and Tasks are work items. These work items have different granularities, with Features/Epics being coarser-grained (larger) than User stories. These work items need to feed into our model.

The other type is concerned with slicing the work per phases. This is a sequencing concern driven by release scheduling (answering the question ‘what to release in what sequence?’). Sliced scope gets scheduled for build and release.

Figure: Each release can have its own phases: Discovery, Build, Private Beta, Public Beta

Careful consideration needs to be given when slicing per phases. Consider whether multiple releases will be run in parallel, each with its own phases. Which skills and team members will cover phases such as Discovery, and which will cover Build? Is one release run by one team, or do multiple teams work on the same release? Visualizing release cycles and the phases per cycle helps build the picture for your model.

Measurable process

When choosing a working methodology, we should favor processes that are measurable. A measurable process gives us essential data for our model. What are we measuring?
We take measurements on work items, measuring how long they take to complete (time) and what the rate of completion is (speed). For instance, in a Kanban method we can easily measure three simple metrics: work in progress, cycle time (which answers the ‘how long do they take to complete?’ question) and throughput (which answers the ‘what is the rate of completion?’ question). We can apply these measurements to all work items.

Figure: Example of a measurement output. These outputs will be fed into our model.

How to measure these metrics is outside the scope of this paper; it is well described in Dan Vacanti’s book Actionable Agile Metrics for Predictability. These metrics, especially the delivery pace (throughput), are important elements of our model.

5. The data

The final element of the model is the data, a numerical representation of concepts introduced previously: the size of the work, expressed as an aggregation of work items (for example, the total size of a backlog), and the delivery pace, the rate at which teams deliver the work items (for example, in a Kanban system this is the throughput).

Some important aspects of the data need to be clarified. Questions such as “How can we obtain the data? Do we always have data? Is the data fixed? Can we trust the data?” need an answer before proceeding.

We can obtain the data by working continuously on understanding the size of the work (finding out the number of work items required to complete the work) and on measuring our rate of delivery (measuring our delivery process).

We should emphasize the continuousness of data gathering. As our understanding of what and how we build expands, the size of the work increases or decreases. Similarly, our rate of delivery can change over time.

The way to deal with this situation is to express the data points as assumptions (e.g. ‘we assume that we need to build between 20–35 user stories’ and ‘we assume that our rate of delivery is between 10–12 stories per week’). These assumptions should be clearly stated and well communicated.
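As a rough illustration of working with assumptions stated as ranges, the sketch below takes the example figures (20 to 35 user stories, 10 to 12 stories per week) and derives a best-case and worst-case duration. The function name and the idea of simply combining the extremes are assumptions made for this illustration; they give outer bounds only and say nothing about how likely each outcome is.

```python
def duration_range_weeks(min_items, max_items, min_rate, max_rate):
    """Return (best_case, worst_case) duration in weeks for range assumptions.

    Hypothetical helper: it only combines the extremes of the two ranges.
    """
    if min_rate <= 0 or max_rate < min_rate or max_items < min_items:
        raise ValueError("ranges must be positive and ordered low to high")
    best_case = min_items / max_rate   # smallest scope at the fastest pace
    worst_case = max_items / min_rate  # largest scope at the slowest pace
    return best_case, worst_case


# Assumptions quoted in the text: 20-35 stories, 10-12 stories per week.
best, worst = duration_range_weeks(20, 35, 10, 12)
print(f"between {best:.1f} and {worst:.1f} weeks")
```

Bounds like these are useful for sanity-checking a forecast, but they do not carry confidence levels.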
At times we have very limited data, or no data at all. For instance, it is quite common at the beginning of new deliveries that we have no data about the rate of delivery. We deal with these situations much as we do when we have data: we capture our assumptions on delivery rate (these might be mere guesses, or the views of experts based on similar experiences); then we start the delivery, start measuring our process, and as soon as we have enough data points we replace our initial assumptions with the measured assumptions. Both Dan Vacanti and Troy Magennis recommend that we only need between 7–11 data-points for this transition.

What about risks? Our assumptions could carry significant risks, on both the size of the work and the delivery rate. If those risks materialize, our forecasts could be significantly impacted.

To deal with this situation, we should work on turning these risks into additional data-points. For instance, we could express our risk as “we have a 50% likelihood that our scope will increase between 12–20 stories if the performance results breach our page load NFRs”. If risks are turned into data, and those are part of our model, then the negative impact on the forecast accuracy is reduced.

Delivery Pace S-curve

A special mention needs to be made of the delivery rate’s characteristics. The delivery rate throughout the life-cycle of a feature, release or project can vary, at times significantly. For instance, it is quite common for the work delivered by a project, plotted cumulatively over time, to take the shape of an S-curve, reflecting a slower rate at the beginning and end of the project.
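To make the effect of a varying delivery rate concrete, here is a minimal sketch. The weekly throughput numbers, the backlog size and the steady rate are all hypothetical, chosen only to show a slow start, a steady middle and a slow finish. Accumulating them gives the S-shaped curve, and comparing against a naive calculation that uses only the steady rate shows how much the slow weeks can add.

```python
from itertools import accumulate
from math import ceil

# Hypothetical stories completed per week: slow while the team forms,
# steady in the middle, slow again while the release is being finished.
weekly_throughput = [2, 3, 6, 10, 11, 10, 12, 10, 5, 3, 1]

# Cumulative completed work is what traces the S shape when plotted.
cumulative_done = list(accumulate(weekly_throughput))


def weeks_to_complete(total_items, cumulative):
    """Return the 1-based week in which total_items is first reached, or None."""
    for week, done in enumerate(cumulative, start=1):
        if done >= total_items:
            return week
    return None


backlog = 60       # hypothetical size of the work, in stories
steady_rate = 10   # hypothetical steady-state pace, stories per week

naive_weeks = ceil(backlog / steady_rate)
modelled_weeks = weeks_to_complete(backlog, cumulative_done)
print(cumulative_done)
print(f"steady rate only: {naive_weeks} weeks, with slow start modelled: {modelled_weeks} weeks")
```

With these made-up numbers the steady-rate calculation gives 6 weeks while the modelled curve reaches 60 stories in week 8.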
Figure: S-curve representation

The S-curve is usually made up of multiple phases: a ramp-up (or starting) pace, as the team is forming and learning, or transitioning to new work; a stride pace, which is a sustainable delivery rate once we reach steady-state; and a ramp-down pace, as the team is in the final delivery phases.

Observed over a long period of time, even if a team is maintained and doesn’t need to ramp up, the S-curve can manifest itself as the team transitions from one feature to another, producing a “rollercoaster” of delivery rates. This can have a significant impact on longer-term forecasting.

Figure: Delivery rollercoaster

The S-curve needs to be taken into consideration while building our model. A “rollercoaster” effect can have a big negative impact on delivery, and is best avoided. If it cannot be avoided, then it needs to be modelled.

The role of the system predictability

The quality of our data determines the accuracy of our forecasts. The more predictable our system is, the more we can rely on our data, and the more accurate our forecasts will be. Therefore, teams should pay attention to building predictable working systems. How to build such systems is outside the scope of this paper; however, teams should not forget about this important aspect. Dan Vacanti, in his book Actionable Agile Metrics for Predictability, describes one way of building predictable systems using a Kanban system.

VII. Forecasting is a continuous process

Forecasting is a continuous process. As soon as we have new information about our model we should re-forecast. To be able to re-forecast rapidly and allow the team to run a number of ‘what-if’ scenarios, we want to keep the cost of forecasting low. “Goal of the forecasting is to know earlier than later if we’re in trouble”, Troy Magennis [3]. If the cost of re-forecasting is high, it likely won’t get done.

Make short and long term forecasts. Shorter forecasts will be more accurate than longer ones.

VIII. Putting the model through the wind tunnel

Now what? We have the elements of the model; now we need to put them together, build the models and put them through the ‘wind tunnel’, as the Wright brothers did with their model aeroplanes. An important tool in the wind-tunnel arsenal is the Monte Carlo simulation, which we need to introduce before moving forward.

A brief introduction to Monte Carlo simulation

Sam Savage, in his book “The Flaw of Averages” [7], eloquently defines the Monte Carlo simulation. The last thing we do before climbing a ladder to paint the side of our house is to give it a good shake. By bombarding it with random physical forces we simulate how stable it will be when we climb on it.

A Monte Carlo simulation is a computational technique similar to shaking the ladder, used to test the stability of uncertain plans. The technique bombards the model with thousands of random inputs, while keeping track of the outputs.

Monte Carlo simulation is a statistical sampling technique made up of four steps: define a domain of possible inputs, generate inputs randomly from the domain, perform a computation, and aggregate the results.

This can be translated to the model we’ve described in this paper as:
1. domain of possible inputs: this is the delivery rate (throughput)
2. start a simulation with the selected start date; set the end date to be the same as the start date
3. generate inputs randomly from the domain: randomly select a delivery rate
4. perform a computation: deduct the randomly picked delivery rate from the number of work items that we need to complete; increase the end date of the simulation by 1 (one unit of the period over which the delivery rate is measured); check whether the remaining number of items after the deduction is zero or negative. If it is, stop the simulation; otherwise continue with step 3
5. once step 4 has been completed, we have completed one simulation and have one delivery date
6. we continue creating thousands of simulations by repeating steps 2–5
7. once all the simulations are done, we create an aggregation by grouping the simulations by end date and calculating the % of simulations for each of the end dates

The aggregated results look like a histogram.

Figure: Output of a Monte Carlo simulation as a histogram

Alternatively, we can present the output of a Monte Carlo simulation in a table-like format.

Figure: Output of a Monte Carlo simulation as a table

The cumulative % of simulations that complete by a given date acts as our confidence level for that date. In the spirit of transparency and of a probabilistic approach, when we present back our results we should consider:
1. present a range of dates, such as: “based on the assumptions behind the forecast (...) we have: a 75% confidence level of completing by 23rd of October, 2017; an 85% confidence level of completing by 26th of October, 2017”
2. surface the impact of risks, such as: “the risks (...) increase the duration by 25%. Let’s look at them, and see how we can eliminate them.”

We work with the client to choose acceptable confidence levels for the project.

What do experts say?

Avoiding estimation is best. If estimation cannot be avoided, then building models that can be forecasted using Monte Carlo simulations is a good starting point.

What about expert estimation?

Reach out to experts. Ask for an expert estimate. Carefully blend expert estimates with forecasts. Diversity of thinking reduces the chance of missing a big risk item through cognitive bias or unconscious incompetence.

Finally, putting it all together

To put it all together, build a series of models, aiming to learn new information from them. Define what the work is, and slice the work per phases. Identify what method of estimation can be applied for each phase. Sometimes the only thing we can do is to time-box a phase. Define your decomposition strategy. At times, this is done in parallel with the phase slicing, sometimes not. Form your teams, paying attention to start dates, availability and skills. Allocate work to teams.
Forecast where possible using Monte Carlo simulation. Offer ranges of dates using different confidence levels. Show the impact of the risk on the date. Blend in expert estimates carefully. Don’t forget about the S-curve, or the rollercoaster.

Figure: The model, all put together. A team is using a working mechanism to discover, slice and build the work. They measure the work and use the data to forecast likely delivery dates. When they discover something new, they re-forecast, every time.

Visualize the plan. Ask other stakeholders to look at it. Is the sequencing right? Have dependencies been identified?

Figure: Plan expressed as a roadmap

Good luck!

Acknowledgement

Special thanks to Martin Aspeli (http://martinaspeli.net/) for feedback on and editing of this article.

References

[1] Planning Fallacy, https://en.wikipedia.org/wiki/Planning_fallacy
[2] Definitions from Collins Dictionary
[3] Troy Magennis, “Forecasting Using Data”
[4] David Evans and Gojko Adzic, “Fifty Quick Ideas to Improve Your User Stories”
[5] Dan Vacanti, “When will it be done?”
[6] Quote also attributed to Edwards Deming, ‘father of quality management’
[7] Sam Savage, “The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty”
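Appendix: a sketch of the Monte Carlo steps

The numbered Monte Carlo steps in section VIII can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the delivery rate is assumed to be a list of observed items completed per calendar day (so each iteration moves the end date on by one day, and days with nothing finished appear as zeros), and the sample numbers are made up.

```python
import random
from collections import Counter
from datetime import date, timedelta


def simulate_once(start_date, work_items, daily_throughput_samples, rng):
    """Run one simulation and return the date on which the work runs out."""
    remaining = work_items
    end_date = start_date              # step 2: end date starts at the start date
    while remaining > 0:
        rate = rng.choice(daily_throughput_samples)  # step 3: random delivery rate
        remaining -= rate              # step 4: deduct it from the remaining items
        end_date += timedelta(days=1)  # step 4: move the end date on by one day
    return end_date                    # step 5: one simulation, one delivery date


def forecast(start_date, work_items, daily_throughput_samples, runs=10_000, seed=None):
    """Return [(end_date, cumulative_share_of_runs), ...] sorted by date."""
    if not any(sample > 0 for sample in daily_throughput_samples):
        raise ValueError("at least one throughput sample must be positive")
    rng = random.Random(seed)
    # Step 6: repeat the simulation many times, counting runs per end date.
    counts = Counter(
        simulate_once(start_date, work_items, daily_throughput_samples, rng)
        for _ in range(runs)
    )
    # Step 7: aggregate into the cumulative share of runs finished by each date.
    table, finished = [], 0
    for end_date in sorted(counts):
        finished += counts[end_date]
        table.append((end_date, finished / runs))
    return table


def date_for_confidence(table, confidence):
    """Return the earliest date by which at least `confidence` of runs finished."""
    for end_date, share in table:
        if share >= confidence:
            return end_date
    return table[-1][0]


# Hypothetical inputs: 30 work items, one week of observed daily throughput.
samples = [0, 1, 2, 0, 3, 1, 2]
table = forecast(date(2017, 10, 2), 30, samples, seed=1)
for confidence in (0.50, 0.75, 0.85):
    print(f"{confidence:.0%} confidence: by {date_for_confidence(table, confidence)}")
```

Sampling from measured throughput rather than from an average is a design choice that keeps the variability of the real delivery process in the forecast. Risks and scope ranges could be added as further random inputs in the same loop.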