As a product owner, you frequently face the question of whether to proceed with option A or option B, or which version of a screen will achieve better results. Making such decisions can be challenging, especially under tight deadlines with limited resources. Too often, these decisions are made based on personal judgment or by copying a competitor's approach, which can lead to suboptimal results.
The good news is that one can avoid such pitfalls by setting up a simple experiment environment that requires relatively low effort. In this article, we will describe how you can achieve this.
Setting up an experiment environment is important for two reasons:
Firstly, it ensures that when you implement new functionality, you pick the best option based on data rather than intuition.
Secondly, it allows you to continuously improve the existing functionality of your product by comparing ‘as-is’ to hypothetical ‘to-be’ options and doing a ‘what if’ analysis.
Before we proceed to the approach, let us debunk some of the myths that usually misguide product owners:
I need a lot of resources to set up a complex environment that allows doing experiments and A/B tests
Wrong: the approach described here takes less than one week of a software engineer's time.
I need a well-established data gathering process and detailed event tracking
Wrong: You can rely on an existing database that stores information about the lifecycle of your product’s main entity. For instance, order statuses if you are a delivery service.
I need a dedicated team of analysts that will handle my requests on a daily basis
Wrong: Once you understand the approach and the metrics of your experiment, you can pull the data yourself on a regular basis with a simple SQL query.
To set up your experiment environment, follow these steps:
Before you reach out to your product designer, define the goals and metrics to be measured as part of your experiment. In the case of a classic 'Option A or Option B' question, it is usually straightforward what you want to achieve by implementing a change. For instance, you might be addressing a specific part of the funnel.
For illustrative purposes, let's assume you work at a delivery company and are currently focused on the order creation form. You want to address the relatively low percentage of users who provide their shipping address and then select a shipping method. Imagine you are choosing between two versions of the journey:
Current version: one screen asks the user to input their address and shows a map with a pin based on that address. The next screen lets them select a shipping method based on the address provided.
New version: a single screen where the user both inputs their address and selects a shipping method.
The goal is to determine which of the options leads to a higher share of users who complete both actions. The metric is straightforward: the % of users who provided their address and selected a shipping method.
There are two ways to measure such data:
Based on data that is already available by the design of your backend. For instance, consider a database that holds information on the order's lifecycle. Your order could have states or statuses like:
Draft created
Attempt to find shipping methods
Shipping options found/ Shipping options not found
Event tracking - this does not work out of the box and hence requires extra effort to implement. However, event tracking enables more granular analysis, e.g. device type and browser name can be passed as parameters of your events.
In the next sections of this article, we will focus on the first approach, i.e. using the existing data architecture, without event tracking.
Two main steps should be completed within the experiment flow:
The idea is to come up with a lightweight A/B testing framework that is as simple as possible and allows you to create experiments with the following parameters: an experiment id, a maximum sample size, and a set of groups, each with a probability of a user falling into it.
Being able to configure these parameters allows you to set a sample limit and choose the candidates for the experiment randomly until the desired sample size is reached.
Both the client and the server need changes for this: the backend tracks the number of candidates per experiment and decides whether the authenticated user should be part of the experiment, based on the current sample size and a fixed probability. The backend should also maintain a collection of users that are part of a given experiment, both to provide a consistent experience to each user and to properly compute the experiment results.
Here is how the endpoint for configuring an experiment could look:
POST /api/your-service/experiment-create
Request:
{
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e",
  maximum_sample_size: 250,
  groups: [
    { group_name: "old_journey", probability_of_falling_in: 0.5 },
    { group_name: "new_journey", probability_of_falling_in: 0.5 }
  ]
}
Response:
{
  200,
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e"
}
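As a minimal sketch of the logic behind this endpoint, here is an in-memory version (the store and function names are hypothetical; a real implementation would persist to the Experiments set table and sit behind your web framework):

```python
import uuid

# In-memory stand-in for the "Experiments set" table.
experiments = {}

def create_experiment(maximum_sample_size, groups):
    """Register a new experiment; the group probabilities must sum to 1."""
    total = sum(g["probability_of_falling_in"] for g in groups)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("group probabilities must sum to 1")
    experiment_id = str(uuid.uuid4())
    experiments[experiment_id] = {
        "maximum_sample_size": maximum_sample_size,
        "groups": groups,
    }
    return {"status": 200, "experiment_id": experiment_id}

resp = create_experiment(250, [
    {"group_name": "old_journey", "probability_of_falling_in": 0.5},
    {"group_name": "new_journey", "probability_of_falling_in": 0.5},
])
```

Validating that the probabilities sum to 1 at creation time is a cheap guard against misconfigured experiments.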
You will need a separate endpoint responsible for assigning a specific user to the experiment and the corresponding group. Let's call it experiment-enrollments.
While designing the whole environment, you should have a clear understanding of the stage of the user journey at which the experiment-enrollments endpoint should be called. In addition, not every user may need to participate in the experiment, which is why it is useful to pass a user-auth token to the endpoint as well.
In our example, if we want to focus only on new users who are placing their first order, the user-auth token lets us determine what type of user it is and whether they should be enrolled. Also, ensure that all necessary information is available at the moment the endpoint is called, taking into account the specifics of your journey and lifecycle.
The experiment-enrollments endpoint is described below. It can be called at a specific stage of the journey (e.g. before landing on the screen requiring a shipping address) for specific types of users (e.g. only new users who haven't provided an address yet) and will compute whether the current user should participate in a given experiment:
POST /api/your-service/experiment-enrollments (user-auth token required)
Request:
{
  experiment_id: "f380739f-62f3-4316-8acf-93ed5744cb9e"
}
Response:
{
  200,
  enrolled: true/false,
  group_name: "group_1"
}
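The enrollment decision described above can be sketched as follows. This is an in-memory sketch with hypothetical names; a real backend would read the experiment configuration and write each assignment to the Experiments database:

```python
import random

# In-memory stand-ins for the "Experiments set" and "Experiments database" tables.
experiment = {
    "maximum_sample_size": 250,
    "groups": [
        {"group_name": "old_journey", "probability_of_falling_in": 0.5},
        {"group_name": "new_journey", "probability_of_falling_in": 0.5},
    ],
}
enrollments = {}  # user_id -> group_name, or None if the user was not enrolled

def enroll(user_id):
    """Decide once per user; repeated calls return the same answer."""
    if user_id in enrollments:  # consistent experience for returning users
        group = enrollments[user_id]
        return {"enrolled": group is not None, "group_name": group}
    enrolled_count = sum(1 for g in enrollments.values() if g is not None)
    if enrolled_count >= experiment["maximum_sample_size"]:
        enrollments[user_id] = None  # sample limit reached
        return {"enrolled": False, "group_name": None}
    # Pick a group according to the configured probabilities.
    r, cumulative = random.random(), 0.0
    group = experiment["groups"][-1]["group_name"]  # float-safety fallback
    for g in experiment["groups"]:
        cumulative += g["probability_of_falling_in"]
        if r < cumulative:
            group = g["group_name"]
            break
    enrollments[user_id] = group
    return {"enrolled": True, "group_name": group}
```

Calling enroll twice for the same user returns the same group, which is exactly what keeps the experience consistent and the results interpretable.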
To illustrate how the data flow would look, let's return to the order creation flow in the delivery company, where you are choosing between two versions of the order creation screen.
The following endpoints mentioned in the diagram below:
/create-order-draft (step 3)
/find-shipping-method (step 16)
/submit-order (step 20)
are provided only to support the illustrative example and are not necessary parts of the experiment environment.
Also, an illustrative and simplified architecture of the databases is provided below.
There are 3 main tables:
Experiments set - contains all the experiments you created earlier. The database is updated every time you call the /experiment-create endpoint.
Experiments database - contains a record for each enrollment of a specific user. The database is updated every time you call the experiment-enrollments endpoint.
Order lifecycle database - provided to support the illustrated example of how experiment-related data can be stored. The point is that this table (or any similar table corresponding to the specifics of your product) lets you see whether the entry (e.g. order creation) was successful for a specific user enrolled in one of the experiment groups you've set. In our example, we can rely on the "Shipping method selected" status, which tells us that the user successfully provided shipping details and then selected one of the suggested shipping methods.
Pros:
Cons:
Tasks and indicative estimates:
Once you have designed your backend, align with your frontend team on the best way for them to receive the information and at which stage of the flow.
Keep in mind and mitigate the main dependencies:
Once your experiment has been running for a sufficient amount of time, it's important to analyze and interpret the results to draw meaningful conclusions.
Define the list of fields you need to calculate the impact on metrics you decided to focus on earlier.
In the illustrative example above, the data sources would be two tables:
Experiments database:
Input: the experiment id you are looking for results for
Output: a list of all user ids participating in the experiment, the group each user was assigned to, and the timestamp of each assignment
Order lifecycle database
Based on this data, you can calculate the % of successfully created orders for each of the experiment groups.
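As a sketch of that calculation, here is the join as a SQL query run against toy copies of the two tables (the table and column names are assumptions; adapt them to your actual schema):

```python
import sqlite3

# Toy in-memory copies of the "Experiments database" and "Order lifecycle database".
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE experiment_enrollments (user_id TEXT, experiment_id TEXT, group_name TEXT, enrolled_at TEXT);
CREATE TABLE order_lifecycle (user_id TEXT, status TEXT);
""")
con.executemany(
    "INSERT INTO experiment_enrollments VALUES (?, 'exp-1', ?, '2024-01-01')",
    [("u1", "old_journey"), ("u2", "old_journey"),
     ("u3", "new_journey"), ("u4", "new_journey")])
con.executemany(
    "INSERT INTO order_lifecycle VALUES (?, ?)",
    [("u1", "Shipping method selected"),
     ("u3", "Shipping method selected"),
     ("u4", "Shipping method selected")])

# % of enrolled users per group that reached the "Shipping method selected" status.
rows = con.execute("""
SELECT e.group_name,
       100.0 * COUNT(o.user_id) / COUNT(*) AS success_pct
FROM experiment_enrollments e
LEFT JOIN order_lifecycle o
  ON o.user_id = e.user_id AND o.status = 'Shipping method selected'
WHERE e.experiment_id = 'exp-1'
GROUP BY e.group_name
""").fetchall()
print(dict(rows))  # old_journey: 50.0, new_journey: 100.0 on this toy data
```

The LEFT JOIN keeps users who never reached the success status, so the denominator is everyone enrolled in the group, not just those who converted.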
When analyzing your results, it's important to look beyond the raw numbers. You'll also want to check for statistical significance to ensure that any differences you observe between your test groups are not just due to random chance. I will not focus too much on this part, as there are already plenty of articles on the topic across online resources. In any case, deep statistical knowledge is not required here: in my opinion, being able to apply a Z-test or T-test to check the significance of the difference between the two groups is sufficient.
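As a sketch, a two-proportion Z-test for conversion rates needs only the standard library (the sample counts below are made up for illustration):

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Z statistic for the difference between two conversion rates."""
    p_a, p_b = success_a / total_a, success_b / total_b
    p_pool = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Illustrative numbers: 70/125 vs 95/125 users selected a shipping method.
z = two_proportion_z_test(70, 125, 95, 125)
print(abs(z) > 1.96)  # |z| > 1.96 -> significant at the 5% level (two-sided)
```

With two groups of 250 / 2 = 125 users each, a gap of this size clears the 1.96 threshold; smaller observed differences would need larger samples.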
Once you've determined that your results are statistically significant, you can start to draw conclusions about which option of your product performed better.
After you've successfully run an experiment and gotten a sufficient degree of confidence regarding the best option, the next step is to scale up your changes across your product. There can be several approaches:
The easiest one is to adjust the configuration of your experiment so that 100% of users fall into the group that showed better results. Reserve some time to clean up the code later so that displaying this specific part of the UI no longer depends on the experiment environment.
The less straightforward case is when your product is available on multiple platforms. Be careful about assuming that the results of an experiment on the web flow apply to the mobile app flow (and vice versa). Sometimes it's better to be safe than sorry and run a separate experiment in the same way on the other platform.
Having your own experiment environment is a very handy tool for any product manager. Regardless of the maturity stage of your product, creating an experiment environment should not take too much time, and the fairly low one-off cost of getting it working will quickly show a return on investment.
Finally, here are a few tips to make sure that the results of the experiment make sense:
By following these best practices, you can set up an effective experimentation environment that helps you make data-driven decisions and improve your conversion rates over time.