When faced with the problem of influencing prospective customers to buy your product, an obvious choice is to run precisely targeted marketing campaigns. But what does precise targeting mean? In the universe of prospective customers, there are only so many types:
Sure Things: These customers will buy whether or not they receive an ad impression. They are your loyal, brand-aware customers who don't need any convincing to buy from you.
Lost Causes: These customers are the exact opposite of Sure Things. They will not buy no matter how many ad impressions you target them with. They are probably aware of your brand but are committed to a different vendor, or simply have no need for the product you sell.
Persuadables: These are the customers you want to reach. While they might initially be uncommitted to your brand, they will change their minds when exposed to the right kind of ad messaging. Much of uplift modeling is about finding these customers in your population.
Sleeping Dogs: These are the customers you do not want to hit with an ad. In theory, they are the people who, while otherwise paying customers, would walk away when exposed to an ad. Think of suddenly becoming conscious of your automatic Netflix payment when shown an ad for a new release, and deciding to cancel.
(Figure: visualization of the four customer types and their response to treatment, i.e., ad exposure.)
Imagine you are a data scientist at an e-commerce company, tasked with identifying the customers that are the most incremental for an ad campaign, so that the campaign budget is spent optimally. The keyword here is incremental.
Drawing from our discussion so far, we can define incrementality as the difference between the effect of treatment on customer conversion and that of control. That is:
Uplift = P(Conversion|Treatment) - P(Conversion|Control)
i.e., the incremental probability that a customer will convert when shown an ad versus not. Let's look at what this metric means for each type of customer in our universe.
Sure Things:
P(Conversion|Treatment) ~ 1 & P(Conversion|Control) ~ 1
∴ Uplift ~ 0
Lost Causes:
P(Conversion|Treatment) ~ 0 & P(Conversion|Control) ~ 0
∴ Uplift ~ 0
Persuadables:
P(Conversion|Treatment) ~ 1 & P(Conversion|Control) ~ 0
∴ Uplift ~ 1
Sleeping Dogs:
P(Conversion|Treatment) ~ 0 & P(Conversion|Control) ~ 1
∴ Uplift ~ -1
It is clear from the above that, in order to optimize spend, one should target only the Persuadables, i.e., the customers with the highest uplift.
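To make the formula concrete, here is a minimal sketch of computing empirical uplift from experiment data; the DataFrame and its 'treated' and 'converted' columns are hypothetical, not part of any package.

import pandas as pd

# Toy experiment: 'treated' marks ad exposure, 'converted' marks a purchase
df_toy = pd.DataFrame({'treated':   [1, 1, 1, 1, 0, 0, 0, 0],
                       'converted': [1, 1, 0, 1, 0, 1, 0, 0]})

p_treatment = df_toy.loc[df_toy['treated'] == 1, 'converted'].mean()  # P(Conversion|Treatment) = 0.75
p_control = df_toy.loc[df_toy['treated'] == 0, 'converted'].mean()    # P(Conversion|Control) = 0.25
print(p_treatment - p_control)                                        # Uplift = 0.5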
However, the task still remains to identify these Persuadables in your population. Unlike Conversion (a binary variable with outcomes ∈ {0,1}) or Revenue (a continuous variable with outcomes ∈ ℝ), it is not trivial to formulate this problem in a traditional classification/regression framework. The reason is that one cannot both treat and not treat the same customer: a customer can belong to either the Treatment group or the Control group, never both. It is therefore tricky to produce a label corresponding to uplift on which you could train a regular machine learning model.
Traditional ways of solving the uplift problem include the Two Model approach, the Transformed Outcome method [1], and direct modeling techniques with modified splitting criteria, such as significance-based uplift trees [2], uplift decision trees [4], and uplift random forests [5]. Gutierrez's paper gives a comparative review of these techniques [3]. For this article, however, I am going to focus on the algorithm and implementation of Conversion Homogenized Uplift Computation (CHUC).
Intuition: Simply put, the intuition behind CHUC is:
Similar people tend to react in the same way when shown an ad, and probability to convert is a good proxy for similarity.
With that intuition, for a given customer CHUC aims to:
(i) Accurately estimate the customer's true probability to convert, i.e., how likely the customer is to convert in the absence of any impression, or P(Conversion|Control)
(ii) Group customers with similar estimated probabilities of conversion
(iii) Compute the empirical uplift (P(Conversion|Treatment) - P(Conversion|Control)) for each of these groups
(iv) Fit a function between each group's uplift and its P(Conversion|Control), so that for every new customer the problem reduces to assigning the customer to a conversion group and reading off the predicted uplift for that group
Given,
• **Training dataset:** df_train
• **Test dataset:** df_test
• **Feature vector:** X
• **Outcome variable:** Y
• **Treatment label:** TR; TR=1 for Treatment & TR=0 for Control
**Step 1:** Fit a model M1 (regression/classification) that, given the feature vector X for a customer, predicts the actual outcome Y, using only Control data (i.e., TR=0). The output of the model, Y_pred, is
(i) the predicted value of Y when Y ∈ ℝ, or
(ii) P(Y=1) when Y ∈ {0,1}
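A minimal sketch of Step 1 for a binary outcome, assuming a df_train carrying the TR and Y columns defined above plus hypothetical feature columns (the package itself wraps XGBoost, per the implementation section below):

import xgboost as xgb

feature_cols = ['x1', 'x2']                     # hypothetical names for the feature vector X
control = df_train[df_train['TR'] == 0]         # fit on Control data only
M1 = xgb.XGBClassifier()                        # XGBRegressor when Y is continuous
M1.fit(control[feature_cols], control['Y'])

# Y_pred: estimated P(Y=1) for every customer, treated and control alike
df_train['Y_pred'] = M1.predict_proba(df_train[feature_cols])[:, 1]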
**Step 2:** Group each customer into one of 10 deciles such that
E[Y_pred|Decileᵢ] < E[Y_pred|Decileᵢ₊₁]
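In practice this grouping is a quantile cut on Y_pred; a minimal sketch with pandas, continuing the hypothetical df_train from the previous snippet:

import pandas as pd

# labels=False yields integer decile labels ordered by Y_pred,
# so E[Y_pred|decile i] < E[Y_pred|decile i+1] by construction
df_train['decile'] = pd.qcut(df_train['Y_pred'], q=10, labels=False, duplicates='drop')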
**Step 3:** For each decile, generate empirical uplift scores
Uplift = P(Y=1|TR=1) - P(Y=1|TR=0)
(for a continuous outcome, the difference in means E[Y|TR=1] - E[Y|TR=0]) on bootstrapped copies of df_train, to obtain multiple observations for a given decile.
This gives the model in Step 4 more data points to fit and makes the per-decile estimates more robust.
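A sketch of Step 3 under the same hypothetical column names, recording one (E[Y_pred|TR=0], uplift) pair per decile per bootstrap resample; 100 resamples is an arbitrary choice:

import pandas as pd

rows = []
for b in range(100):                                   # bootstrap resamples of the training data
    boot = df_train.sample(frac=1.0, replace=True, random_state=b)
    for d, g in boot.groupby('decile'):
        treated, control = g[g['TR'] == 1], g[g['TR'] == 0]
        if len(treated) > 0 and len(control) > 0:      # skip deciles missing either arm
            rows.append({'p_control': control['Y_pred'].mean(),                 # E[Y_pred|TR=0]
                         'uplift': treated['Y'].mean() - control['Y'].mean()})  # empirical uplift
uplift_obs = pd.DataFrame(rows)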
**Step 4:** Fit another model M2 (regression) that maps each decile's E[Y_pred|TR=0] (its average predicted conversion under control) to its uplift, using the data generated in Step 3.
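Step 4 is an ordinary one-dimensional regression; a sketch using scikit-learn's GradientBoostingRegressor, which is an illustrative choice rather than necessarily the regressor CHUC uses:

from sklearn.ensemble import GradientBoostingRegressor

# Map a decile's average control-side conversion estimate to its uplift
M2 = GradientBoostingRegressor()
M2.fit(uplift_obs[['p_control']].values, uplift_obs['uplift'])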
To predict uplift for new customers:
**Step 1:** For dataset df_test, use model M1 and feature vector X to predict Y_pred_test for each customer.
**Step 2:** Use model M2 to map Y_pred_test to the predicted uplift.
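A minimal sketch of the two prediction steps, reusing the hypothetical M1, M2, and feature_cols from the training sketches:

# Step 1: conversion estimate for each test customer
Y_pred_test = M1.predict_proba(df_test[feature_cols])[:, 1]
# Step 2: map the conversion estimate to predicted uplift
uplift_test = M2.predict(Y_pred_test.reshape(-1, 1))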
The CHUC Python package is a wrapper around XGBoost and Pylift that implements the above algorithm and provides additional diagnostic insights. This section covers the essential features and usage of the package.
Installation
The package can be found at
pip install git+
Training
Load data
import pandas as pd

filePath = '/../USConversions'
df = pd.read_pickle(filePath + '/df_train.pkl')
df.shape
The computation object can be instantiated as
import chuc

u = chuc.Uplift(df, treatmentLabel='testgroup', outcome='converted')
The package automatically detects the nature of the outcome (binary or continuous) and implements the appropriate algorithm (classifier or regressor).
In order to instantiate the object for training with hyperparameter tuning, use

import scipy as sc
import scipy.stats  # makes sc.stats available

p_ = {'max_depth': list(range(2, 15, 1)),
      'min_child_weight': list(range(10, 300)),
      'n_estimators': list(range(50, 400)),
      'gamma': sc.stats.uniform(0.1, 0.9),
      'subsample': sc.stats.uniform(0.6, 0.4)}

u = chuc.Uplift(df, treatmentLabel='testgroup', outcome='converted',
                param_search_space=p_)
The model then needs to be trained using
u.fit()
Once trained, we can obtain the diagnostics plot, which tells us the quality of the training and visualizes the scatter between the outcome and uplift.
u.getDiagnostics()
We can also visualize the Qini curve on the training set:
u.plotQiniTrain()
Predicting
In order to predict uplift on a new test set
uplift=u.predictUplift(df_test)
[1] Athey, S., & Imbens, G. W. (2015). Machine learning methods for estimating heterogeneous causal effects. stat, 1050(5).
[2] Radcliffe, N. J., & Surry, P. D. (2011). Real-world uplift modelling with significance based uplift trees. White Paper TR-2011-1, Stochastic Solutions.
[3] Gutierrez, P., & Gérardy, J. Y. (2017, July). Causal inference and uplift modelling: A review of the literature. In International Conference on Predictive Applications and APIs (pp. 1–13).
[4] Rzepakowski, P., & Jaroszewicz, S. (2012). Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), 303–327.
[5] Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2015). Uplift random forests. Cybernetics and Systems, 46(3–4), 230–248.
A big thanks to Robert Yi and Will Frost for their package (Pylift) and for our discussions, which helped me understand uplift modeling better. Thanks also to Kaashyap Thiyagaraj for helping me test CHUC and providing crucial feedback.