Let’s say you manage a product that helps small businesses support their customers. You’re looking to improve customers’ engagement and retention levels. Two ideas are on the table: a dashboard and a chatbot.
The dashboard idea came up a few times in talks with customers and you feel it has good potential, but there’s also a risk that only power users will use it. The chatbot is something that the entire company likes and management is quite bullish about: it feels like a big win for customers, it’s a cool project, and, yeah, chatbots are all the rage now.
Which one would you choose?
Such prioritization questions are at the heart of product management. The penalty for choosing wrong can be quite high: cost of development + cost of deployment + cost of maintenance + opportunity cost + other residual costs. We are often tempted to make a decision based on weak signals such as majority votes, the highest-paid person’s opinion (HiPPO), or industry trends, but those have been shown time and again to be bad heuristics that are no better than putting chips on a roulette table (hence the term “Big Bet”).
In this post I’ll demonstrate what I consider to be the best way to find winning ideas. It consists of three parts:
ICE score is a prioritization method invented by Sean Ellis, famous for helping grow companies such as Dropbox and Eventbrite and for coining the term Growth Hacking. ICE scores were originally intended to prioritize growth experiments, but can also be used for regular project ideas.
You calculate the score, per idea, this way:
ICE score = Impact * Confidence * Ease
The three values are rated on a relative scale of 1–10 so as not to over-weight any of them. Each team can choose what 1–10 means, as long as the ratings stay consistent. Ultimately the goal is to have your idea bank look like this:
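As a rough illustration of the formula, here is a minimal Python sketch of an idea bank sorted by ICE score. The `Idea` class, the `rank` helper, and the three sample ideas with their ratings are all hypothetical and only there to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: float      # expected impact on the goal
    confidence: float  # how much evidence backs the estimates
    ease: float        # higher means less effort

    @property
    def ice(self) -> float:
        # ICE score = Impact * Confidence * Ease
        return self.impact * self.confidence * self.ease


def rank(ideas):
    """Return the ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice, reverse=True)


# Hypothetical idea bank; names and ratings are made up for illustration.
idea_bank = [
    Idea("Idea A", impact=5, confidence=2, ease=6),
    Idea("Idea B", impact=8, confidence=1, ease=3),
    Idea("Idea C", impact=3, confidence=4, ease=7),
]

for idea in rank(idea_bank):
    print(f"{idea.name}: {idea.ice:g}")
```

Because the three values are multiplied, a low rating on any one of them drags the whole score down, which is why a high-impact idea with little evidence can still rank below a modest but well-supported one.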
Let’s use the example to see this at work.
You decide to calculate ICE scores for the two ideas, dashboard and chatbot. At this early stage you use rough values solely based on your own intuition.
So the dashboard gets an Ease value of 4 out of 10 and the chatbot a 2.
There is only one way to calculate confidence: looking for supporting evidence. For this purpose I created the tool shown below. It lists common types of tests and evidence you may have, and what confidence level they provide. When using it, consider what indicators you already have, how many of them, and what you need to get next to gain more confidence.
Sidenote: If your product or industry has other kinds of evidence and tests, feel free to create your own version of this tool. Just be mindful of which ones provide strong confidence and which provide weak confidence. For more background on confidence scores see this earlier post.
Let’s go back to the example to see the tool in action.
The ICE scores:
The dashboard looks like the better idea at the moment, but the tool shows you haven’t gone beyond low confidence. There’s simply not enough information to make a decision yet.
Next you meet your counterparts in engineering and UX and together you scope out both ideas. Both projects seem feasible at first look. The engineering lead comes back with rough effort estimates: the dashboard will take 12 man-weeks to launch and the chatbot 16 man-weeks. According to your Ease scale this gives Ease scores of 4 and 3 respectively.
In parallel you do some back-of-the-envelope impact calculations. On a closer look the dashboard looks slightly less promising and gets a 3. The chatbot still looks like a solid 8.
The confidence tool shows that both ideas now pass the Estimates & Plans test and gain some confidence. The dashboard moves to 0.8 and the chatbot to 0.4.
The chatbot has closed the gap. Still, confidence levels are low, and for a good reason: these are mostly numbers pulled out of thin air, and you know you need to collect more evidence.
You send existing customers a survey asking them to pick one of 5 potential new features, including the chatbot and the dashboard. You get back hundreds of responses. The results are very positive for the chatbot — it is the #1 feature in the survey with 38% of respondents picking it. The dashboard comes in 3rd with 17% of the votes.
This gives both features some supporting market data, but the chatbot scores higher at 1.5. The dashboard also gets a confidence boost, but just up to 1.
The chatbot has moved strongly into the lead. Your co-workers and the industry seem to have been proven right. Should you pull the trigger now? Probably not. The project is quite costly and you only have medium-low confidence. Survey results, unfortunately, don’t generate a very strong signal. Keep working!
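To make the last two rounds concrete, here is a small Python sketch that recomputes both scores from the values quoted in the text. It assumes that Impact and Ease stay unchanged after the survey (3 and 4 for the dashboard, 8 and 3 for the chatbot), since the text only mentions new Confidence values; the `ice_score` function and the `rounds` layout are illustrative names, not part of any tool.

```python
def ice_score(impact, confidence, ease):
    # ICE score = Impact * Confidence * Ease
    return impact * confidence * ease


# (impact, confidence, ease) for each idea after each round of evidence.
# Assumption: the survey changes only the confidence values.
rounds = {
    "estimates and plans": {"dashboard": (3, 0.8, 4), "chatbot": (8, 0.4, 3)},
    "customer survey": {"dashboard": (3, 1.0, 4), "chatbot": (8, 1.5, 3)},
}

# Round to one decimal to hide floating-point noise in the products.
scores = {
    stage: {name: round(ice_score(*values), 1) for name, values in ideas.items()}
    for stage, ideas in rounds.items()
}

for stage, by_idea in scores.items():
    print(stage, by_idea)
```

Under these assumptions the two ideas tie at 9.6 after the estimates, and the survey then lifts the chatbot to 36 against 12 for the dashboard, even though both confidence values are still small on a 10-point scale.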
To learn more you run a user study with 10 existing customers, showing them interactive prototypes of both features. In parallel you conduct phone interviews with 20 survey participants who chose one of the two candidate features.
The research reveals a more nuanced picture:
This qualitative research gives you some food for thought. The dashboard seems to be more popular than you expected. The chatbot now sounds more like a high-risk/high-reward project. Looking at the confidence tool you give the dashboard and the chatbot confidence values of 3 and 2.5 respectively. You adjust Impact to 6 for the dashboard and 9 for the chatbot. Finally, based on the usability study, you realize that getting the chatbot UI right will require more work, so you reduce its Ease to 2.
The tables have turned yet again and now the dashboard is in the lead. You bring the results to your team and to your managers. Strictly based on ICE scores the dashboard should be declared the winner. On the other hand, the confidence scores of both ideas are far from high. Reluctant to let go of a potentially good feature, the team decides to keep testing both.
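The reversal is easy to verify by hand. The sketch below applies the research updates to the previous ratings; the `Rating` class is a hypothetical helper, and the pre-research Impact values (3 and 8) are assumed to carry over unchanged from the estimates round.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rating:
    impact: float
    confidence: float
    ease: float

    def ice(self) -> float:
        return self.impact * self.confidence * self.ease


# Ratings after the survey round (assumed: impact and ease as estimated earlier).
before = {
    "dashboard": Rating(impact=3, confidence=1, ease=4),
    "chatbot": Rating(impact=8, confidence=1.5, ease=3),
}

# Updates from the user study and the phone interviews.
after = {
    "dashboard": replace(before["dashboard"], impact=6, confidence=3),
    "chatbot": replace(before["chatbot"], impact=9, confidence=2.5, ease=2),
}

for name in after:
    print(f"{name}: {before[name].ice():g} -> {after[name].ice():g}")

leader = max(after, key=lambda name: after[name].ice())
print("current leader:", leader)
```

The dashboard rises from 12 to 72 while the chatbot only moves from 36 to 45: its higher Impact and Confidence are partly cancelled by the lower Ease.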
You decide to start by building a minimum viable product (MVP) version of the chatbot. Development takes 6 weeks and you launch it to 200 survey respondents who indicated willingness to test. 167 enable the feature, but usage drops dramatically day by day, and by the end of two weeks you have only 24 active users. In follow-up surveys and calls a clear picture emerges: the chatbot is harder to use and far less useful than the participants had expected, and worse, it antagonizes their customers, who seem to value the personal touch. The feature actually causes the business owners to work harder.
Analyzing the results, you and the team conclude that launching a useful version of the chatbot that meets customers’ expectations will require at least 40–50 additional man-weeks (Ease of 1) and carries high risk. It’s also clear that far fewer customers will find it useful than first expected, so you reduce Impact to 2. These changes alter the feature in fundamental ways, so you can no longer trust the results of the user study to confirm it, and with the help of the confidence tool you reduce Confidence to 0.5.
The dashboard MVP launches within 5 weeks to another 200 customers. The results are very good. 87% of participants use the feature, many of them daily, with little drop-off. The feedback is overwhelmingly positive, mostly asking for more. You realize the impact is higher than you expected: an 8. The engineering team estimates it will take another 10 weeks to launch the dashboard in full, so Ease stays at 4. According to the confidence tool you feel comfortable setting the Confidence value to 6.5 out of 10.
At this point the prioritization is very easy indeed. No one disputes that the dashboard is the right feature to pursue next. You keep the chatbot in your idea bank in order to record the findings, but it naturally gets sorted to the bottom given its low ICE score.
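For readers who want to check the numbers, here is a short Python sketch of the chatbot MVP funnel and the final scores, using only the figures quoted above. The variable names are illustrative, and treating "active after two weeks" as a share of the users who enabled the feature is one possible way to read the data.

```python
# Chatbot MVP funnel, using the counts from the text.
invited = 200
enabled = 167
active_after_two_weeks = 24

enable_rate = enabled / invited                # share of invitees who turned it on
retention = active_after_two_weeks / enabled   # share of enablers still active

print(f"enabled: {enable_rate:.1%}, still active after two weeks: {retention:.1%}")

# Final ratings after the MVP tests: (impact, confidence, ease).
final = {"dashboard": (8, 6.5, 4), "chatbot": (2, 0.5, 1)}
ice = {name: i * c * e for name, (i, c, e) in final.items()}

# Sort the idea bank from highest to lowest ICE score.
for name, score in sorted(ice.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:g}")
```

The gap is now two orders of magnitude (208 against 1), which is why nobody needs to argue about the decision any more.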
This example illustrates how risky it is to bet on high-effort features based on gut feelings, opinions, themes, market trends, etc. Most ideas are more like the chatbot than they are like the dashboard: they underdeliver on impact and cost much more than we think. The only real way to find winning ideas is to put them to the test and reduce the level of uncertainty.
This may seem like a laborious and slow way to build products, but it’s actually much more efficient than the alternatives. Not only does confidence testing eliminate most of the wasted effort spent on bad ideas, it also focuses the team on short and tangible learning milestones with immediate, measurable results, which improves focus and velocity. Through the process we learn a lot about the product, users, and market, and end up with a better end product that has already been tested by users. We are therefore rarely surprised on launch day and need to make far fewer fixes post-launch.
In reality we often need to choose not between two ideas, but between dozens. By limiting the effort we put into each idea based on our level of confidence in it, we allow ourselves to test many ideas in parallel, avoiding the pitfalls of traditional big-bet development. See my post on GIST planning for more on this.
In this example the team is testing 4 ideas in parallel by running several step-projects, each incrementally building a bigger version of the idea and testing it for higher confidence.
Here’s what worries people most when I explain this topic: how do you get managers and stakeholders to buy in? Can we really get them to limit their god-like powers over the product? Well, you’d be surprised. I hear a lot from managers that they prefer not to be the deciders on product matters, but they feel compelled to get involved because the team is presenting them with weak options. What is weak or strong is of course subject to opinion, unless you show up to the review not just with a polished pitch deck, but with real evidence and clear confidence levels. You might be surprised how much easier the conversation gets.
On the flip side, the next time your CEO surprises you with a new must-have pet idea, try to show her how her idea is evaluated: how much impact, effort and confidence we can give it, how its ICE score stacks up against other ideas, and how we can test to gain more confidence. Most reasonable folks will agree that that’s a good way to go about it. If they’re still not convinced, send them this blog post to read and ask them to leave a comment (or message me @ItamarGilad), and I promise to fight the good fight on your behalf.
Itamar Gilad (itamargilad.com) is a product consultant and speaker helping companies build high-value products. Over the past 15 years he has held senior product management roles at Google, Microsoft and a number of startups.