A propensity to hope and joy is real riches; one to fear and sorrow real poverty. — David Hume
Marketers spend a lot of time talking about the importance of getting the right message to the right person at the right time. Sending a notification or email when a user is not interested can lead many users to turn off app notifications or report the emails as spam, which blocks all future communication.
Marketing comes at a cost, both financially and in user experience. If a platform has 100k users, it is wise to focus effort on only the subset of users who are likely to purchase (convert).
The best way to identify who among your audience is most likely to actually make a purchase, accept an offer, or sign up for a service is a propensity model. Let us understand the propensity model better by working on a problem statement: Build a propensity model to determine if a user will purchase on their return visit.
Start by analyzing your data to see what percentage of examples fall in the positive class (the user buys on a return visit) and in the negative class (the user doesn’t buy on a return visit).
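As a minimal sketch of that class-balance check, the snippet below counts labels in plain Python. The label list is made up for illustration (1 = bought on a return visit, 0 = did not), and the function name `class_balance` is hypothetical; real data would come from your own visit logs.

```python
from collections import Counter

def class_balance(labels):
    """Return the percentage share of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up labels for illustration only: 3 buyers among 100 returning users.
labels = [0] * 97 + [1] * 3
balance = class_balance(labels)
print(balance)
```

A heavily skewed split like this is the reason plain accuracy is a poor metric here: a model that always predicts "no purchase" would still look accurate.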
For our Marketing use case to improve conversion rate:
Cost of False Negative (marking High Propensity as Low) > Cost of False Positive (marking a Low Propensity Customer as High)
Hence our Metric should be such that: Recall is more important than Precision
A beta value of 2 weights recall more heavily than precision; the resulting score is referred to as the F2-measure.
F2-Measure = ((1 + 2²) * Precision * Recall) / (2² * Precision + Recall)
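The formula is the beta = 2 case of the general F-beta score. A small sketch in plain Python, with a hypothetical helper name `f_beta` and made-up precision/recall values, shows how beta = 2 rewards recall more than precision:

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom

# Made-up values: the same two numbers, swapped between precision and recall.
high_recall = f_beta(precision=0.5, recall=1.0)   # about 0.83
high_precision = f_beta(precision=1.0, recall=0.5)  # about 0.56
print(high_recall, high_precision)
```

With beta = 1 the two cases would score the same; with beta = 2 the high-recall case scores higher, which matches the cost assumption stated above.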
The propensity model is a binary classification problem, so we use logistic regression as our model.
Model Training Schema
Model Output
prob: the probability, estimated by logistic regression, that the event occurs. In our case, the event is the user buying on a return visit.
We ran three experiments with different feature sets using logistic regression and found that the second performed best on our metrics.
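To make the prob column concrete, here is a minimal sketch of how a fitted logistic regression turns feature values into a probability. The weights, bias, and feature values are invented for illustration and are not the coefficients of the model described here.

```python
import math

def predict_prob(features, weights, bias):
    """Logistic regression probability: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one hypothetical visitor (two numeric features).
weights = [0.8, -1.5]
bias = -3.0
prob = predict_prob([2.0, 1.0], weights, bias)
print(prob)  # a value strictly between 0 and 1
```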
VISUAL of Model Evaluation (Best Model: 2nd in above Experiment Table | Positive Class Threshold: 0.0217)
The best threshold for the positive class is 0.0217: a logistic regression probability ≥ 0.0217 is assigned to the positive class (the user will buy on a return visit), and anything below it is assigned to the negative class.
When we tested experiment model 2, which uses the features Bounce, OS, TimeOnSite, Pageviews, and Country, we got a recall of 91.7% and a precision of 3.9%. High recall means few false negatives; low precision means many false positives.
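One plausible way to arrive at such a threshold is to sweep candidate cut-offs and keep the one with the highest F2 score. The article does not state the exact procedure used, so treat the sketch below as an assumption; the labels and probabilities are toy values, and all function names are hypothetical.

```python
def classify(probs, threshold):
    """Positive class (1) when the probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f2(precision, recall):
    denom = 4 * precision + recall
    return 5 * precision * recall / denom if denom else 0.0

def best_threshold(y_true, probs):
    """Try every distinct predicted probability as a cut-off; keep the best F2."""
    candidates = sorted(set(probs))
    return max(
        candidates,
        key=lambda t: f2(*precision_recall(y_true, classify(probs, t))),
    )

# Toy data for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.20, 0.40]
threshold = best_threshold(y_true, probs)
print(threshold)  # 0.06 on this toy data
```

In practice the sweep would run on a validation set rather than the test set, so that the reported test metrics stay unbiased.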
Confusion Matrix on the Test dataset
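Recall and precision follow directly from the four cells of a confusion matrix. The sketch below uses invented counts, not the counts from the test dataset above, to show how a rare positive class can give high recall together with low precision.

```python
def recall_precision(tp, fp, fn):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Invented counts: 10 real buyers among 1,000 returning users.
tp, fn = 9, 1      # buyers flagged / buyers missed
fp, tn = 90, 900   # non-buyers flagged / non-buyers correctly ignored
recall, precision = recall_precision(tp, fp, fn)
print(recall, precision)  # recall 0.9, precision about 0.09
```

Because non-buyers vastly outnumber buyers, even a modest false positive rate produces many more false positives than true positives, which pulls precision down while recall stays high.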
NOTE: Our objective in building this model was to maximize the conversion rate, so we gave more importance to recall, i.e. Cost(False Negative) > Cost(False Positive).
If the cost of marketing communication is high and the business wants precision and recall weighted equally, we would need to change the positive class threshold and the metric so that recall and precision are balanced (take the F1 score as the metric).
With this propensity model, marketing and audience targeting can be done more intelligently by focusing on users whose chance of converting (purchasing) on the platform is higher. It also reduces costs for the marketing team: they no longer have to run campaigns, notifications, and emailers for all visitors and can instead focus on the subset of users whose propensity score is high.
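As a rough sketch of that last step, the snippet below picks the campaign audience from a set of propensity scores. The user IDs and scores are invented, the function name `select_audience` is hypothetical, and the cut-off reuses the 0.0217 threshold reported earlier.

```python
def select_audience(scores, threshold):
    """Return user IDs whose score meets the threshold, highest score first."""
    chosen = [(uid, s) for uid, s in scores.items() if s >= threshold]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [uid for uid, _ in chosen]

# Invented propensity scores for four users.
scores = {"u1": 0.004, "u2": 0.051, "u3": 0.0217, "u4": 0.30}
audience = select_audience(scores, threshold=0.0217)
print(audience)  # ['u4', 'u2', 'u3']
```

Sorting by score also makes it easy to cap the audience at a fixed campaign budget by taking only the first N users.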
I hope you learned something new from this blog. If you liked it, hit 👏 and share this article. Stay tuned for the next one!
AUTHOR: https://www.linkedin.com/in/shaurya-uppal/
Newsletter: https://www.linkedin.com/newsletters/problem-solving-data-science-6874965456701198336/