When faced with the problem of influencing prospective customers to buy your product, an obvious choice is to run precisely targeted marketing campaigns. But what does precise targeting mean? In the universe of prospective customers, there are only so many types:
Sure Things: These customers will buy whether or not they receive an ad impression. They are your loyal, brand-aware customers who don't need any convincing to buy from you.
Lost Causes: These customers are the exact opposite of Sure Things. They will not buy no matter how many ad impressions you target them with. They are probably aware of your brand but are committed to a different vendor, or simply have no need for the product you sell.
Persuadables: These are the customers you want to reach. While they might initially be uncommitted to your brand, they will change their minds when exposed to the right kind of ad messaging. Much of uplift modeling is about finding these customers in your population.
Sleeping Dogs: These are the customers you do not want to hit with an ad. In theory, they are the people who, while otherwise paying customers, would walk away when exposed to an ad. Think of suddenly becoming conscious of your automatic Netflix payment when shown an ad for a new release, and deciding to cancel.
(Figure: visualization of the four customer types and their response to treatment, i.e., ad exposure.)
Imagine you are a data scientist at an e-commerce company, tasked with identifying the customers that are the most incremental for an ad campaign, so that the campaign budget is spent optimally. The keyword here is incremental.
Drawing from our discussion so far, we can define incrementality as the difference between the effect of treatment on customer conversion and that of control. That is:
Uplift = P(Conversion|Treatment) - P(Conversion|Control)
i.e., the incremental probability that a customer will convert when shown an ad versus not. Let's look at what this metric means for each type of customer in our universe.
Sure Things:
P(Conversion|Treatment) ~ 1 & P(Conversion|Control) ~ 1
∴ Uplift ~ 0
Lost Causes:
P(Conversion|Treatment) ~ 0 & P(Conversion|Control) ~ 0
∴ Uplift ~ 0
Persuadables:
P(Conversion|Treatment) ~ 1 & P(Conversion|Control) ~ 0
∴ Uplift ~ 1
Sleeping Dogs:
P(Conversion|Treatment) ~ 0 & P(Conversion|Control) ~ 1
∴ Uplift ~ -1
It is clear from the above that, in order to optimize spend, one should target only the Persuadables, i.e., the customers with the highest uplift.
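To make the formula concrete, here is a minimal sketch of computing empirical uplift from experiment data; the DataFrame and its 'treated' and 'converted' columns are hypothetical, not part of any package.

import pandas as pd

# Toy experiment: 'treated' marks ad exposure, 'converted' marks a purchase
df_toy = pd.DataFrame({'treated':   [1, 1, 1, 1, 0, 0, 0, 0],
                       'converted': [1, 1, 0, 1, 0, 1, 0, 0]})

p_treatment = df_toy.loc[df_toy['treated'] == 1, 'converted'].mean()  # P(Conversion|Treatment) = 0.75
p_control = df_toy.loc[df_toy['treated'] == 0, 'converted'].mean()    # P(Conversion|Control) = 0.25
print(p_treatment - p_control)                                        # Uplift = 0.5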
However, the task still remains to identify these Persuadables in your population. Unlike Conversion (a binary variable with outcomes ∈ {0,1}) or Revenue (a continuous variable with outcomes ∈ ℝ), it is not trivial to formulate this problem in a traditional classification/regression framework. The reason is that one cannot both treat and not treat the same customer: a customer can belong to either the Treatment group or the Control group, never both. It is therefore tricky to produce a label corresponding to uplift on which you could train a regular machine learning model.
Traditional ways of solving the uplift problem include the Two Model approach, the Transformed Outcome method [1], and direct modeling techniques with modified splitting criteria, such as significance-based uplift trees [2], uplift decision trees [4], and uplift random forests [5]. Gutierrez's paper gives a comparative review of these techniques [3]. For this article, however, I am going to focus on the algorithm and implementation of Conversion Homogenized Uplift Computation (CHUC).
Intuition: Simply put, the intuition behind CHUC is:
Similar people tend to react in the same way when shown an ad, and probability to convert is a good proxy for similarity.
With that intuition, for a given customer CHUC aims to:
(i) Accurately estimate the customer's true probability to convert, i.e., how likely the customer is to convert in the absence of any impression, or P(Conversion|Control)
(ii) Group customers with similar estimated probabilities of conversion
(iii) Compute the empirical uplift (P(Conversion|Treatment) - P(Conversion|Control)) for each of these groups
(iv) Fit a function between each group's uplift and its P(Conversion|Control), so that for every new customer the problem reduces to assigning the customer to a conversion group and reading off the predicted uplift for that group
Given,
• **Training dataset:** df_train
• **Test dataset:** df_test
• **Feature vector:** X
• **Outcome variable:** Y
• **Treatment label:** TR; TR=1 for Treatment & TR=0 for Control
**Step 1:** Fit a model M1 (regression/classification) that, given the feature vector X for a customer, predicts the actual outcome Y, using only Control data (i.e., TR=0). The output of the model, Y_pred, is
(i) the predicted value of Y when Y ∈ ℝ, or
(ii) P(Y=1) when Y ∈ {0,1}
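A minimal sketch of Step 1 for a binary outcome, assuming a df_train carrying the TR and Y columns defined above plus hypothetical feature columns (the package itself wraps XGBoost, per the implementation section below):

import xgboost as xgb

feature_cols = ['x1', 'x2']                     # hypothetical names for the feature vector X
control = df_train[df_train['TR'] == 0]         # fit on Control data only
M1 = xgb.XGBClassifier()                        # XGBRegressor when Y is continuous
M1.fit(control[feature_cols], control['Y'])

# Y_pred: estimated P(Y=1) for every customer, treated and control alike
df_train['Y_pred'] = M1.predict_proba(df_train[feature_cols])[:, 1]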
**Step 2:** Group each customer into one of 10 deciles such that
E[Y_pred|Decileᵢ] < E[Y_pred|Decileᵢ₊₁]
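In practice this grouping is a quantile cut on Y_pred; a minimal sketch with pandas, continuing the hypothetical df_train from the previous snippet:

import pandas as pd

# labels=False yields integer decile labels ordered by Y_pred,
# so E[Y_pred|decile i] < E[Y_pred|decile i+1] by construction
df_train['decile'] = pd.qcut(df_train['Y_pred'], q=10, labels=False, duplicates='drop')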
**Step 3:** For each decile, generate empirical uplift scores
Uplift = P(Y=1|TR=1) - P(Y=1|TR=0)
(for a continuous outcome, the difference in means E[Y|TR=1] - E[Y|TR=0]) on bootstrapped copies of df_train, to obtain multiple observations for a given decile.
This gives the model in Step 4 more data points to fit and makes the per-decile estimates more robust.
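A sketch of Step 3 under the same hypothetical column names, recording one (E[Y_pred|TR=0], uplift) pair per decile per bootstrap resample; 100 resamples is an arbitrary choice:

import pandas as pd

rows = []
for b in range(100):                                   # bootstrap resamples of the training data
    boot = df_train.sample(frac=1.0, replace=True, random_state=b)
    for d, g in boot.groupby('decile'):
        treated, control = g[g['TR'] == 1], g[g['TR'] == 0]
        if len(treated) > 0 and len(control) > 0:      # skip deciles missing either arm
            rows.append({'p_control': control['Y_pred'].mean(),                 # E[Y_pred|TR=0]
                         'uplift': treated['Y'].mean() - control['Y'].mean()})  # empirical uplift
uplift_obs = pd.DataFrame(rows)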
**Step 4:** Fit another model M2 (regression) that maps each decile's E[Y_pred|TR=0] (its average predicted conversion under control) to its uplift, using the data generated in Step 3.
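Step 4 is an ordinary one-dimensional regression; a sketch using scikit-learn's GradientBoostingRegressor, which is an illustrative choice rather than necessarily the regressor CHUC uses:

from sklearn.ensemble import GradientBoostingRegressor

# Map a decile's average control-side conversion estimate to its uplift
M2 = GradientBoostingRegressor()
M2.fit(uplift_obs[['p_control']].values, uplift_obs['uplift'])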
To predict uplift for new customers:
**Step 1:** For dataset df_test, use model M1 and feature vector X to predict Y_pred_test for each customer.
**Step 2:** Use model M2 to map Y_pred_test to the predicted uplift.
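A minimal sketch of the two prediction steps, reusing the hypothetical M1, M2, and feature_cols from the training sketches:

# Step 1: conversion estimate for each test customer
Y_pred_test = M1.predict_proba(df_test[feature_cols])[:, 1]
# Step 2: map the conversion estimate to predicted uplift
uplift_test = M2.predict(Y_pred_test.reshape(-1, 1))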
The CHUC Python package is a wrapper around XGBoost and Pylift that implements the above algorithm and provides additional diagnostic insights. This section covers the essential features and usage of the package.
Installation
The package can be found at
pip install git+
Training
Load data
import pandas as pd

filePath = '/../USConversions'
df = pd.read_pickle(filePath + '/df_train.pkl')
df.shape
The computation object can be instantiated as
import chuc

u = chuc.Uplift(df, treatmentLabel='testgroup', outcome='converted')
The package automatically detects the nature of the outcome (binary or continuous) and implements the appropriate algorithm (classifier or regressor).
In order to instantiate the object for training with hyperparameter tuning, use

import scipy as sc
import scipy.stats  # makes sc.stats available

p_ = {'max_depth': list(range(2, 15, 1)),
      'min_child_weight': list(range(10, 300)),
      'n_estimators': list(range(50, 400)),
      'gamma': sc.stats.uniform(0.1, 0.9),
      'subsample': sc.stats.uniform(0.6, 0.4)}

u = chuc.Uplift(df, treatmentLabel='testgroup', outcome='converted',
                param_search_space=p_)
The model then needs to be trained using
u.fit()
Once trained, we can obtain the diagnostics plot, which tells us the quality of the training and visualizes the scatter between the outcome and uplift.
u.getDiagnostics()
We can also visualize the Qini curve on the training set:
u.plotQiniTrain()
Predicting
In order to predict uplift on a new test set
uplift=u.predictUplift(df_test)
[1] Athey, S., & Imbens, G. W. (2015). Machine learning methods for estimating heterogeneous causal effects. stat, 1050(5).
[2] Radcliffe, N. J., & Surry, P. D. (2011). Real-world uplift modelling with significance based uplift trees. White Paper TR-2011-1, Stochastic Solutions.
[3] Gutierrez, P., & Gérardy, J. Y. (2017, July). Causal inference and uplift modelling: A review of the literature. In International Conference on Predictive Applications and APIs (pp. 1–13).
[4] Rzepakowski, P., & Jaroszewicz, S. (2012). Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), 303–327.
[5] Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2015). Uplift random forests. Cybernetics and Systems, 46(3–4), 230–248.
A big thanks to Robert Yi and Will Frost for their package (Pylift) and for our discussions, which helped me understand uplift modeling better. Thanks also to Kaashyap Thiyagaraj for helping me test CHUC and providing crucial feedback.