Runbook for How Uber Can Capture XX% of a $40 Billion Market Utilizing Voice Technology on the…

Before we talk about Voice Technology, the UberEats platform, or the $40 Billion market, let’s first dive into Uber itself. Here is a quote from Uber on how they got started and where they want to go.

It started as a simple idea: What if you could request a ride from your phone? More than 5 billion trips later, we’re working to make transportation safer and more accessible, helping people order food quickly and affordably, reducing congestion in cities by getting more people into fewer cars, and creating opportunities for people to work on their own terms.

These are the big problems that Uber wants to tackle and solve in the upcoming years. One of these problems is helping people order food quickly and affordably.

The product opportunity for Voice lies within helping people order food quickly.

Here is the current user flow for ordering food on UberEats.

The Two Major Pain Points

There are two main problems that users experience along the UberEats food ordering process.

Time lost to indecisiveness and inconvenience seems like a small, trivial issue on the surface, but it’s really not. In fact, there is a significant amount of friction between what users want to do and actually doing it (relative to what could be accomplished via a voice interface).

Think About It: If every YouTube video took 10 more seconds to load, you would get very annoyed. Objectively, it’s only 10 seconds, but once humans get speed and convenience, it’s very hard to go back to seemingly ancient options.

How Voice Will Enable People to Make Decisions Faster and Do Things Faster

Here is a series of example voice use cases on the UberEats mobile app.

Why Voice as an Interface is Inevitable

Humans have evolved to take the path of least resistance.

To illustrate how much humans love speed, let’s take a look at how history has shown that consumer usage evolves towards the most reliable and frictionless interface:

At the beginning, cursors on computers could only move vertically or horizontally through arrow keys on the keyboard
Computer mice enabled users to move the cursor from point A to point B along a path that combines horizontal and vertical movement at the same time
Smartphones now immediately recognize the area of intent and respond to it without any cursor manipulation
Using your voice is the path of even less resistance, and we currently have the technology to make it reliably capture the user’s intentions

Failure to adjust to this reality could cause UberEats to lose market share in the upcoming years as competitors begin to adjust.

$40 Billion Market

Based on a study from OC&C Strategy Consultants, $40 Billion worth of sales in the US will be transacted through voice technology by 2022.

Link to Report

Given the vast platform, availability of resources, and product-market fit of UberEats, they are in an optimal position to invest in a long-term moonshot project that could lead to disproportionate business value — enough to significantly narrow the margin between themselves and GrubHub (leading in food delivery market share), if not overtake them.

August 2017 — March 2018. Click Here for Source

By being a first-mover, UberEats will gain significant market share within the voice food delivery market, which is currently “uncontested”. This will be made possible with the following:

Build a voice-user interface within UberEats
UberEats Siri and Google Assistant built-in integration on iPhones and Android mobile devices.
UberEats Alexa Skill and Google Home Action

Looking at the current market leaders, UberEats could easily take the #1 spot for food delivery Alexa Skill — and honestly that’s no offence to pizza source or Denny’s

Voice User Flow

The voice user flow for ordering food on UberEats will look like this.

The Nitty Gritty

Let’s dive into how to create this voice experience for the UberEats app.

English-only for now
Add a voice button onto the home tab’s top bar (place it next to the filters button)
Tapping the button brings up a voice order screen

Draw similar elements from Shazam when constructing the voice ordering screen

Interpretation Logic

A user will say their order, and the item(s) will automatically be added to the ordering cart.

Multiple items will be batched to one restaurant, with the most expensive item being the main subject of the filter(s)
Each voice query can only order from 1 restaurant at a time, for now
The most popular restaurant with a delivery time under 45 minutes is the default filter for orders unless the user explicitly states otherwise (“order a pizza that gets here the fastest”)
Introduce voice default filter settings (“Popular”, “Rating”, “Delivery Time”) in a future iteration if engagement is high enough
The intermediate portion size is the default when none is specified (“Medium pizza” for pizza)
Unspecific search terms like “hamburger”, “pizza”, and “ice cream” should map to the most popular item of that type at the restaurant (“regular hamburger”, “pepperoni pizza”, “vanilla ice cream”)
Uninterpreted user intent will trigger an error state and message that asks the user to say their order again
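
To make the defaults above concrete, here is a minimal Python sketch of how they might be applied. Everything in it is an assumption for illustration: the `Restaurant` fields, the popularity score, and the function names are hypothetical and do not reflect any real UberEats data model. It also assumes that when no restaurant delivers in under 45 minutes, the query falls through to the error state.

```python
from dataclasses import dataclass

MAX_DELIVERY_MINUTES = 45  # default cutoff taken from the rules above


@dataclass
class Restaurant:
    name: str
    popularity: int        # hypothetical score; higher means more popular
    delivery_minutes: int
    # Maps a generic term ("pizza") to the most popular item of that type.
    top_item_by_term: dict


def pick_restaurant(restaurants, term, fastest=False):
    """Return the restaurant to order `term` from, or None (error state)."""
    candidates = [r for r in restaurants if term in r.top_item_by_term]
    if not candidates:
        return None
    if fastest:
        # Explicit request, e.g. "order a pizza that gets here the fastest".
        return min(candidates, key=lambda r: r.delivery_minutes)
    # Default filter: most popular restaurant delivering in under 45 minutes.
    quick = [r for r in candidates if r.delivery_minutes < MAX_DELIVERY_MINUTES]
    if not quick:
        return None  # assumption: treat this as an uninterpreted order
    return max(quick, key=lambda r: r.popularity)


def resolve_item(restaurant, term, size=None):
    """Map a generic term to the most popular item, defaulting to medium."""
    item = restaurant.top_item_by_term[term]
    return f"{size or 'medium'} {item}"


restaurants = [
    Restaurant("A", 90, 30, {"pizza": "pepperoni pizza"}),
    Restaurant("B", 99, 50, {"pizza": "cheese pizza"}),
    Restaurant("C", 10, 15, {"pizza": "margherita pizza"}),
]
chosen = pick_restaurant(restaurants, "pizza")
print(chosen.name, "->", resolve_item(chosen, "pizza"))
```

Restaurant B is the most popular but misses the 45-minute cutoff, so the default lands on A. Passing `fastest=True` would pick C instead. Batching multiple items around the most expensive one is left out of this sketch.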

Finally, allergy warnings, expected ordering time, and total price should still be prompted and displayed on the final checkout screen before confirmation of payment.

System Architecture

Leverage iOS and Android Speech-to-Text libraries (iOS Speech and Android SpeechRecognizer) to transform user voice input to text output
Design and build a linguistic interpreter that maps a pattern or sequence of words to the different forms of the food-ordering intent, and create API endpoints for these. Contextual information (type of food, restaurant name, etc.) should be stored as slots. Check out Amazon Intent Schemas for how they solve this problem.
Pipe the text to the intent interpreter through the API endpoint. The recognized intent will call the existing UberEats API endpoints with the extracted slots as the API parameters (food to order, restaurant to order from) in order to add the item(s) to the order cart.
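
As a rough illustration of the interpreter step, the sketch below maps transcribed text to an intent with slots using a single regular expression. This is only a toy under stated assumptions: the `OrderFood` intent name, the slot names, and the utterance pattern are hypothetical, and a production system would more likely rely on a trained language-understanding model than on one regex. The handoff to the existing UberEats endpoints is deliberately not shown, since their parameters are not described here.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical utterance pattern for a single "order food" intent.
ORDER_PATTERN = re.compile(
    r"^(?:order|get|i want|i'd like)\s+(?:me\s+)?(?:an?\s+|some\s+)?"
    r"(?:(?P<size>small|medium|large)\s+)?"
    r"(?P<food>.+?)"
    r"(?:\s+from\s+(?P<restaurant>.+?))?"
    r"(?P<fastest>\s+that gets here the fastest)?$",
    re.IGNORECASE,
)


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def interpret(text: str) -> Optional[Intent]:
    """Return the recognized intent with its slots, or None (error state)."""
    match = ORDER_PATTERN.match(text.strip().rstrip(".!?"))
    if match is None:
        return None
    slots = {"food": match.group("food")}
    if match.group("size"):
        slots["size"] = match.group("size").lower()
    if match.group("restaurant"):
        slots["restaurant"] = match.group("restaurant")
    if match.group("fastest"):
        slots["filter"] = "delivery_time"
    return Intent("OrderFood", slots)


print(interpret("Order a large pepperoni pizza from Joe's Pizza"))
print(interpret("order a pizza that gets here the fastest"))
print(interpret("what's the weather"))
```

The slots dictionary is what would be passed along as parameters to the cart endpoint. A `None` result corresponds to the error state where the user is asked to repeat the order.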

Defining Success for 6-Months Post Launch

Focus on task success and customer utility for the first 6 months.

Key Objective: Voice-initiated orders should be 10% faster than normal orders (time from opening the app to order confirmation)
Objective 2: 30% of successful voice queries convert to orders
Objective 3: 85% of users say they will use voice again to order

Focus on growth (acquisition and engagement) after task success and utility metrics have been hit.

Key Objective

High-level Hypothesis: Users who utilize the voice feature should be able to complete their order faster than users who do not utilize the feature.

Orders completed via voice should be put into cohort A

Track session length at time of order (time from start of opening the app to clicking confirm order)

Orders completed normally should be put into cohort B

Track session length at time of order
Record % of sessions that utilized a voice search. These sessions imply that voice did not satisfy the user’s need

Session length at time of order for cohort A should be at least 10% shorter than for cohort B to meaningfully say that the data supports the hypothesis.

We need to incorporate time from the beginning of the app session because UberEats users are opening the app to find a solution. Our hypothesis states that voice will convert users into ordering within a faster time frame.

Make sure that fewer than 10% of cohort B orders come from sessions that initiated a voice search.
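
The cohort comparison above can be sketched in a few lines of Python. The `OrderSession` record and its field names are hypothetical, and comparing plain means is an assumption made for brevity. A real analysis would likely also check sample sizes and statistical significance.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class OrderSession:
    seconds_to_order: float   # from opening the app to tapping confirm order
    ordered_via_voice: bool   # True -> cohort A, False -> cohort B
    used_voice_search: bool   # any voice search during the session


def evaluate_key_objective(sessions, min_speedup=0.10, max_fallback_share=0.10):
    """Compare cohort A (voice orders) against cohort B (normal orders)."""
    cohort_a = [s for s in sessions if s.ordered_via_voice]
    cohort_b = [s for s in sessions if not s.ordered_via_voice]
    if not cohort_a or not cohort_b:
        raise ValueError("both cohorts need at least one order")
    mean_a = mean([s.seconds_to_order for s in cohort_a])
    mean_b = mean([s.seconds_to_order for s in cohort_b])
    # Relative time saved by voice orders compared with normal orders.
    speedup = (mean_b - mean_a) / mean_b
    # Share of normal orders whose session also tried a voice search.
    fallback_share = sum(s.used_voice_search for s in cohort_b) / len(cohort_b)
    return {
        "speedup": speedup,
        "fallback_share": fallback_share,
        "passed": speedup >= min_speedup and fallback_share < max_fallback_share,
    }


sessions = (
    [OrderSession(90, True, True)] * 2
    + [OrderSession(120, False, False)] * 19
    + [OrderSession(120, False, True)]
)
print(evaluate_key_objective(sessions))
```

In this made-up sample, voice orders take 90 seconds against 120 for normal orders, and 1 in 20 normal orders followed a voice search, so both conditions hold.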

Objective #2

High-level Hypothesis: Users who utilize the voice feature should be able to find what they are looking for

The primary focus is not to increase A for now, but we need to make sure that A does not dip below 0.1% of unique active users over any 30-day period.

B should not be lower than 10% of A
At least 60% of voice searches should lead to an item being added to the shopping cart. A lot of users will play with the voice feature for fun, which will negatively impact this number; that is why the baseline was not set higher.
Key Performance Indicator: 3 in 10 orders suggested by the voice feature are ordered

Objective #3

High-level Hypothesis: The user experience of the voice feature is seamless and leaves the user satisfied

Use a simple satisfaction prompt in the spirit of a Net Promoter Score: after a voice-initiated order has been successfully completed, show a modal that asks the user if they would use voice to order again, with thumbs-up and thumbs-down options.

Aim to get at least 85% for thumbs up.

Launch Plan for the First 6 Months

Start the rollout with a small segment of users in large, technologically progressive cities.

Spend a minimum of 2 weeks on each stage

Progress to the next stage under the following conditions:

Voice-initiated orders should be 10% faster than normal orders
Less than 10% of normal orders with an attempted voice search within the same app session
At least 0.1% of unique active users, over the last 30 days, have clicked on the voice button
At least 60% of voice searches should lead to an item(s) being added to the order cart
30% of successful voice searches lead to an order
85% thumbs-up rate for ordering via voice again
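
One way to keep the rollout honest is to encode these conditions as a single gate check. The sketch below is a hypothetical helper, not an existing tool: the `StageMetrics` field names are made up, all shares are expressed as fractions between 0 and 1, and the 30% and 85% targets are read as minimums.

```python
from dataclasses import dataclass


@dataclass
class StageMetrics:
    days_in_stage: int
    voice_speedup: float             # relative time saved vs. normal orders
    normal_orders_with_voice: float  # normal orders with an attempted voice search
    voice_button_reach: float        # 30-day unique active users who tapped voice
    search_to_cart: float            # voice searches that added an item to the cart
    search_to_order: float           # successful voice searches that became orders
    would_use_again: float           # thumbs-up share in the post-order prompt


def failed_conditions(m: StageMetrics) -> list:
    """Return the names of the conditions this stage has not met yet."""
    checks = {
        "days_in_stage": m.days_in_stage >= 14,
        "voice_speedup": m.voice_speedup >= 0.10,
        "normal_orders_with_voice": m.normal_orders_with_voice < 0.10,
        "voice_button_reach": m.voice_button_reach >= 0.001,
        "search_to_cart": m.search_to_cart >= 0.60,
        "search_to_order": m.search_to_order >= 0.30,
        "would_use_again": m.would_use_again >= 0.85,
    }
    return [name for name, ok in checks.items() if not ok]


def can_progress(m: StageMetrics) -> bool:
    return not failed_conditions(m)


stage = StageMetrics(21, 0.15, 0.04, 0.0005, 0.66, 0.31, 0.90)
print(can_progress(stage), failed_conditions(stage))
```

In this example the stage is held back only because too few active users have tapped the voice button, which points at discovery rather than at the quality of the voice flow itself.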

Conclusion

$40 Billion in commerce sales is expected to be transacted through voice by 2022
Consumer usage evolves towards the most reliable and frictionless interface
Being a first-mover by deploying voice on UberEats, a Siri and Google Assistant integration, and an Alexa Skill and Google Home Action will yield disproportionate revenue and business value for Uber
