As a product owner, you frequently face the question of whether to proceed with option A or option B, or which version of a screen will achieve better results. Making such decisions can be challenging, especially under tight deadlines with limited resources. Too often, these decisions are made based on personal judgment or by copying a competitor's approach, which can lead to suboptimal results.
The good news is that one can avoid such pitfalls by setting up a simple experiment environment that requires relatively low effort. In this article, we will describe how you can achieve this.
Setting up an experiment environment is important for two reasons:
Firstly, it ensures that when you implement new functionality, you pick the best option based on data rather than intuition.
Secondly, it allows you to continuously improve the existing functionality of your product by comparing ‘as-is’ to hypothetical ‘to-be’ options and doing a ‘what if’ analysis.
Before we proceed to the approach, let us debunk some of the myths that usually misguide product owners:
I need a lot of resources to set up a complex environment that allows doing experiments and A/B tests
Wrong: the approach described here takes less than one week of a software engineer's time.
I need a well-established data gathering process and detailed event tracking
Wrong: You can rely on an existing database that stores information about the lifecycle of your product’s main entity. For instance, order statuses if you are a delivery service.
I need a dedicated team of analysts that will handle my requests on a daily basis
Wrong: Once you understand the approach and the metrics of your experiment, you can pull the data yourself on a regular basis with a simple SQL query.
To set up your experiment environment, follow these steps:
Before you reach out to your product designer, define the goals and metrics to be measured as part of your experiment. In the case of a classic 'Option A or Option B' question, it is usually straightforward what you want to achieve by implementing a change. For instance, you might be addressing a specific part of the funnel.
For illustrative purposes, let's assume you work at a delivery company and are currently focused on the order creation form. You want to address the relatively low percentage of users who provide their shipping address and then select a shipping method. Imagine you are choosing between two versions of the journey:
Current version: one screen asks the user to input their address and shows a map with a pin based on that address. The next screen lets them select a shipping method based on the address provided.
New version: a single screen where the user both inputs their address and selects a shipping method.
The goal is to determine which of the options leads to a higher share of users who complete both actions. The metric is straightforward: the % of users who provided their address and selected a shipping method.
There are two ways to measure such data:
Based on data that is already available by the design of your backend. For instance, consider a database that holds information on the order's lifecycle. Your order could have states or statuses like:
Draft created
Attempt to find shipping methods
Shipping options found/ Shipping options not found
Event tracking - this does not work out of the box and hence requires extra effort to implement. However, event tracking enables more granular analysis, e.g. device type and browser name can be passed as parameters of your events.
In the next sections of this article, we will focus on the first approach, i.e. using the existing data architecture, without event tracking.
Two main steps should be completed within the experiment flow:
The idea is to come up with a lightweight A/B testing framework that is as simple as possible and allows you to create experiments with the following parameters: an experiment id, a maximum sample size, and a set of groups, each with a probability of a user falling into it.
Being able to configure these parameters allows you to set a sample limit and choose the candidates for the experiment randomly until the desired sample size is reached.
Both the client and the server need changes for this: the backend tracks the number of candidates per experiment and decides whether the authenticated user should be part of the experiment, based on the current sample size and a fixed probability. The backend should also maintain a collection of users that are part of a given experiment, both to provide a consistent experience to each user and to properly compute the experiment results.
Here is how the endpoint for configuring an experiment could look:
POST /api/your-service/experiment-create
Request:
{
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e",
  maximum_sample_size: 250,
  groups: [
    { group_name: "old_journey", probability_of_falling_in: 0.5 },
    { group_name: "new_journey", probability_of_falling_in: 0.5 }
  ]
}
Response:
{
  200,
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e"
}
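As a minimal sketch of the logic behind this endpoint, here is an in-memory version (the store and function names are hypothetical; a real implementation would persist to the Experiments set table and sit behind your web framework):

```python
import uuid

# In-memory stand-in for the "Experiments set" table.
experiments = {}

def create_experiment(maximum_sample_size, groups):
    """Register a new experiment; the group probabilities must sum to 1."""
    total = sum(g["probability_of_falling_in"] for g in groups)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("group probabilities must sum to 1")
    experiment_id = str(uuid.uuid4())
    experiments[experiment_id] = {
        "maximum_sample_size": maximum_sample_size,
        "groups": groups,
    }
    return {"status": 200, "experiment_id": experiment_id}

resp = create_experiment(250, [
    {"group_name": "old_journey", "probability_of_falling_in": 0.5},
    {"group_name": "new_journey", "probability_of_falling_in": 0.5},
])
```

Validating that the probabilities sum to 1 at creation time is a cheap guard against misconfigured experiments.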
You will need a separate endpoint responsible for assigning a specific user to the experiment and the corresponding group. Let's call it experiment-enrollments.
While designing the whole environment, you should have a clear understanding of the stage of the user journey at which the experiment-enrollments endpoint should be called. In addition, not every user may need to participate in the experiment, which is why it is useful to pass a user-auth token to the endpoint as well.
In our example, if we want to focus only on new users who are placing their first order, the user-auth token lets us determine what type of user it is and whether they should be enrolled. Also, ensure that all necessary information is available at the moment the endpoint is called, taking into account the specifics of your journey and lifecycle.
The experiment-enrollments endpoint is described below. It can be called at a specific stage of the journey (e.g. before landing on the screen requiring a shipping address) for specific types of users (e.g. only new users who haven't provided an address yet) and will compute whether the current user should participate in a given experiment:
POST /api/your-service/experiment-enrollments (user-auth token required)
Request:
{
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e"
}
Response:
{
  200,
  enrolled: true/false,
  group_name: "group_1"
}
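The enrollment decision described above can be sketched as follows. This is an in-memory sketch with hypothetical names; a real backend would read the experiment configuration and write each assignment to the Experiments database:

```python
import random

# In-memory stand-ins for the "Experiments set" and "Experiments database" tables.
experiment = {
    "maximum_sample_size": 250,
    "groups": [
        {"group_name": "old_journey", "probability_of_falling_in": 0.5},
        {"group_name": "new_journey", "probability_of_falling_in": 0.5},
    ],
}
enrollments = {}  # user_id -> group_name, or None if the user was not enrolled

def enroll(user_id):
    """Decide once per user; repeated calls return the same answer."""
    if user_id in enrollments:  # consistent experience for returning users
        group = enrollments[user_id]
        return {"enrolled": group is not None, "group_name": group}
    enrolled_count = sum(1 for g in enrollments.values() if g is not None)
    if enrolled_count >= experiment["maximum_sample_size"]:
        enrollments[user_id] = None  # sample limit reached
        return {"enrolled": False, "group_name": None}
    # Pick a group according to the configured probabilities.
    r, cumulative = random.random(), 0.0
    group = experiment["groups"][-1]["group_name"]  # float-safety fallback
    for g in experiment["groups"]:
        cumulative += g["probability_of_falling_in"]
        if r < cumulative:
            group = g["group_name"]
            break
    enrollments[user_id] = group
    return {"enrolled": True, "group_name": group}
```

Calling enroll twice for the same user returns the same group, which is exactly what keeps the experience consistent and the results interpretable.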
To illustrate how the data flow would look, let's return to the order creation flow in the delivery company, where you are choosing between two versions of the order creation screen.
The following endpoints mentioned in the diagram below:
/create-order-draft (step 3)
/find-shipping-method (step 16)
/submit-order (step 20)
are provided only to support the illustrative example and are not necessary parts of the experiment environment.
Also, an illustrative and simplified architecture of the databases is provided below.
There are 3 main tables:
Experiments set - contains all the experiments you created earlier. The database is updated every time you call the /experiment-create endpoint.
Experiments database - contains a record for each enrollment of a specific user. The database is updated every time you call the experiment-enrollments endpoint.
Order lifecycle database - provided to support the illustrated example of how experiment-related data can be stored. The point is that this table (or any similar table corresponding to the specifics of your product) lets you see whether the entry (e.g. order creation) was successful for a specific user enrolled in one of the experiment groups you've set. In our example, we can rely on the "Shipping method selected" status, which tells us that the user successfully provided shipping details and then selected one of the suggested shipping methods.
Pros:
Cons:
Tasks and indicative estimates:
Once you have designed your backend, align with your frontend team on the best way for them to receive the information and at which stage of the flow.
Keep in mind and mitigate the main dependencies:
Once your experiment has been running for a sufficient amount of time, it's important to analyze and interpret the results to draw meaningful conclusions.
Define the list of fields you need to calculate the impact on metrics you decided to focus on earlier.
In the illustrative example above, the data sources would be two tables:
Experiments database:
Input: the experiment id you are looking for results for
Output: a list of all user ids participating in the experiment, the group each user was assigned to, and the timestamp of each assignment
Order lifecycle database
Based on this data, you can calculate the % of successfully created orders for each of the experiment groups.
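As a sketch of that calculation, here is the join as a SQL query run against toy copies of the two tables (the table and column names are assumptions; adapt them to your actual schema):

```python
import sqlite3

# Toy in-memory copies of the "Experiments database" and "Order lifecycle database".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE experiment_enrollments (user_id TEXT, experiment_id TEXT, group_name TEXT, enrolled_at TEXT);
CREATE TABLE order_lifecycle (user_id TEXT, status TEXT);
""")
con.executemany(
    "INSERT INTO experiment_enrollments VALUES (?, 'exp-1', ?, '2024-01-01')",
    [("u1", "old_journey"), ("u2", "old_journey"),
     ("u3", "new_journey"), ("u4", "new_journey")])
con.executemany(
    "INSERT INTO order_lifecycle VALUES (?, ?)",
    [("u1", "Shipping method selected"),
     ("u3", "Shipping method selected"),
     ("u4", "Shipping method selected")])

# % of enrolled users per group that reached the "Shipping method selected" status.
rows = con.execute("""
SELECT e.group_name,
       100.0 * COUNT(o.user_id) / COUNT(*) AS success_pct
FROM experiment_enrollments e
LEFT JOIN order_lifecycle o
  ON o.user_id = e.user_id AND o.status = 'Shipping method selected'
WHERE e.experiment_id = 'exp-1'
GROUP BY e.group_name
""").fetchall()
print(dict(rows))  # old_journey: 50.0, new_journey: 100.0 on this toy data
```

The LEFT JOIN keeps users who never reached the success status, so the denominator is everyone enrolled in the group, not just those who converted.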
When analyzing your results, it's important to look beyond the raw numbers. You'll also want to check for statistical significance to ensure that any differences you observe between your test groups are not just due to random chance. I will not focus too much on this part, as there are already plenty of articles on the topic across online resources. In any case, deep statistical knowledge is not required here: in my opinion, being able to apply a Z-test or T-test to check the significance of the difference between the two groups is sufficient.
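As a sketch, a two-proportion Z-test for conversion rates needs only the standard library (the sample counts below are made up for illustration):

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Z statistic for the difference between two conversion rates."""
    p_a, p_b = success_a / total_a, success_b / total_b
    p_pool = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Illustrative numbers: 70/125 vs 95/125 users selected a shipping method.
z = two_proportion_z_test(70, 125, 95, 125)
print(abs(z) > 1.96)  # |z| > 1.96 -> significant at the 5% level (two-sided)
```

With two groups of 250 / 2 = 125 users each, a gap of this size clears the 1.96 threshold; smaller observed differences would need larger samples.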
Once you've determined that your results are statistically significant, you can start to draw conclusions about which option of your product performed better.
After you've successfully run an experiment and gotten a sufficient degree of confidence regarding the best option, the next step is to scale up your changes across your product. There can be several approaches:
The easiest one is to adjust the configuration of your experiment so that 100% of users fall into the group that showed better results. Reserve some time to clean up the code later so that displaying this specific part of the UI no longer depends on the experiment environment.
The less straightforward case is when your product is available on multiple platforms. Be careful about assuming that the results of an experiment on the web flow apply to the mobile app flow (and vice versa). Sometimes it's better to be safe than sorry and run a separate experiment in the same way on the other platform.
Having your own experiment environment is a very handy tool for any product manager. Regardless of the maturity stage of your product, creating an experiment environment should not take too much time, and the fairly low one-off cost of getting it working will quickly show a return on investment.
Finally, here are a few tips to make sure that the results of the experiment make sense:
By following these best practices, you can set up an effective experimentation environment that helps you make data-driven decisions and improve your conversion rates over time.