Predict Customer Churn With Machine Learning, Data Science and Survival Analysis

by Stylianos Kampakis, September 9th, 2022


Churn is the process of customers leaving their service provider for a competitor. It can be due to many reasons, such as financial constraints, poor customer experience, or dissatisfaction with the company.


Predicting customer churn is important because businesses have limited resources and cannot afford to lose customers if they want to stay profitable. If a company has too many customers leave, it will not be able to produce enough revenue and will eventually go bankrupt. However, predicting customer churn allows companies to avoid this by better understanding why customers are leaving and what they can do about it.


The Tesseract Academy recently worked on a customer churn prediction problem with a large insurance company based in London and San Francisco. In this article, we will dive into how the Tesseract Academy managed to successfully predict churn and increase the client’s bottom line.

Using machine learning for churn prediction

The full article and case study can be found here: Tesseract Report: Customer Churn Prediction through Data Science and AI. The Tesseract Academy's content is aimed at decision makers, so the case study is written in high-level language. In this article, we will examine some of the methods that made the project successful. More specifically, we will delve into some of the survival modeling techniques we used to predict which customers will churn and to better understand their behavior.


How can survival analysis be used in data science?

Survival analysis answers two questions:

  1. Will someone survive? Yes or no?
  2. If they do not survive, what is the probability of surviving up to a certain point in time?

Survival modeling also allows the use of “censored data”. Censoring takes place when we are following a subject (a customer in this particular churn case study), but we don't have data before or after the period of interest.

For illustration, let’s say that we have a dataset of 100 customers. We know that 30 of them have churned, and 70 have not. We know that some of these 70 customers will ultimately churn. Perhaps they will churn tomorrow, or perhaps in 5 years. We don’t really know. But what we do know is that they have survived up to this point.

Traditional classification and regression methods from supervised learning don't allow us to use this information. But survival modeling does.
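To make the idea of censoring concrete, here is a minimal sketch of how such a dataset might be represented. The `CustomerRecord` class and the toy values are hypothetical and are not taken from the case study; the point is only that every customer carries both a duration and a flag saying whether churn was actually observed.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # Hypothetical record layout, used for illustration only.
    tenure_months: int  # how long the customer has been observed
    churned: bool       # True = churn observed, False = censored (still active)


# Toy data, not from the case study.
records = [
    CustomerRecord(3, True),
    CustomerRecord(12, False),
    CustomerRecord(7, True),
    CustomerRecord(24, False),
    CustomerRecord(18, False),
]

# A plain classifier would only see the churn labels.
labels = [r.churned for r in records]

# A survival model also uses how long each customer has lasted so far,
# including the censored customers who have not churned yet.
censored = [r for r in records if not r.churned]
min_known_survival = min(r.tenure_months for r in censored)

print(f"{sum(labels)} churned, {len(censored)} censored")
print(f"Every censored customer survived at least {min_known_survival} months")
```

The censored customers are not treated as "will never churn". They only contribute the information that they lasted at least as long as their observed tenure.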


Here are some of the most common survival analysis algorithms:

Kaplan-Meier curve: The Kaplan-Meier curve is a non-parametric tool that allows one to visualize the impact that different categories can have on survival. This was one of the first tools that we used. This method cannot be used for predictive purposes, but it is a great tool to use when communicating with stakeholders, as it is easy to understand.
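As a rough illustration of what sits behind a Kaplan-Meier curve, the sketch below computes the standard product-limit estimate in plain Python. The function name `kaplan_meier` and the toy data are assumptions made for this example; it is not the code used in the project, and in practice a library implementation would be the safer choice.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each observed event time."""
    churn_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # churned or censored at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = churn_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the share that survived time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone whose observation ended at t leaves the risk set.
        at_risk -= exit_counts[t]
    return curve


# Toy data: months observed, and whether churn was observed (False = censored).
durations = [3, 5, 5, 8, 10, 12]
events = [True, True, False, True, False, False]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

To compare categories, as described above, one would compute a separate curve per customer segment and plot them side by side.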


Cox proportional hazards model: This is a very famous semi-parametric model created by the late, great statistician David Cox. The Cox proportional hazards model allows the estimation of risk without requiring the specification of a distribution. In this particular project, we used Cox proportional hazards as a first approach, in order to understand how different features affected the hazard relative to the baseline. While it’s not the best model for prediction, it is very useful when you want to understand the impact that one variable can have on survival.
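For readers who want the formula, the usual textbook statement of the Cox model is sketched below. The symbols are the standard ones and do not come from the case study: h_0(t) is the baseline hazard, x_1, ..., x_p are a customer's features, and the betas are the fitted coefficients.

```latex
% Hazard for a customer with features x
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)

% Hazard ratio between two customers a and b: the baseline cancels out
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\big(\beta^\top (x_a - x_b)\big)
```

Because the baseline hazard cancels in the ratio, each coefficient can be read as the effect of one variable on risk without ever specifying a distribution for h_0(t).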


Weibull regression: This is a parametric model for survival analysis. It is based on the Weibull distribution (as the name implies). An interesting property of Weibull regression is that it can accommodate many different types of survival. It can be used to model the survival of biological organisms (that ultimately die), or even the survival of cultural artefacts, whose probability of survival increases the longer they’ve been around.
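The flexibility mentioned above comes from the shape parameter of the Weibull distribution. The sketch below uses the standard Weibull hazard and survival functions; the function names and parameter values are chosen for illustration only. A shape above 1 gives a hazard that rises over time (as with ageing organisms), a shape below 1 gives a hazard that falls over time (as with long-lived cultural artefacts), and a shape of exactly 1 gives a constant hazard.

```python
import math


def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous risk at time t: (k / lam) * (t / lam) ** (k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale=1.0):
    """Probability of surviving beyond time t: exp(-(t / lam) ** k)."""
    return math.exp(-((t / scale) ** shape))


# Illustrative shape values, not fitted to any real data.
for shape in (0.5, 1.0, 2.0):
    hazards = [round(weibull_hazard(t, shape), 3) for t in (1.0, 2.0, 4.0)]
    print(f"shape={shape}: hazard at t=1, 2, 4 -> {hazards}")
```

In Weibull regression the customer's features typically shift the scale of this distribution, while the shape describes how risk evolves with time.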


Survival support vector machines: This is an extension of the classic support vector machine algorithm into a survival setting. While we used this method, we found it to be very slow, which is a common issue with many kernel methods. However, it might be useful for smaller datasets.


Survival forests and gradient boosted survival analysis: In the last few years there have been some adaptations of classic machine learning algorithms to the survival analysis framework. Survival forests are the most famous one.


There is also an extension of gradient-boosted trees into survival analysis. We found both of these methods to be quite fast and to produce very good results for our problem.


Assessing survival models


When assessing any machine learning model, the metrics are one of the most important things to consider. Because survival analysis combines classification and regression, traditional classification metrics might not work as well. Therefore, we have to resort to other means.

One of the most common means of assessing survival models is the censored concordance index, or c-index for short. We have met the concept of concordance in another post, where we talked about the merits of using the concordance correlation coefficient as a metric to assess the performance of regression algorithms.

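This metric is also called Harrell’s c-index. In its simplest form, ignoring ties, it is calculated as:

c-index = (number of concordant pairs) / (number of concordant pairs + number of discordant pairs)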

A concordant pair is a pair (of two customers in our case) where the model predicted that one customer would churn before another, and the prediction was correct. A discordant pair is a pair of two customers where the model’s prediction was wrong. This means that the model predicted that a customer would stay longer with a company, but they didn’t.
You can see that, because we are assessing survival, the actual prediction is different from what it would be if we were assessing a classification or regression model.
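The pair counting can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the project: the function name `harrell_c_index` is made up for this example, pairs with identical observed times are skipped, and ties in the predicted risk count as half a concordant pair. A pair is only comparable when the customer with the shorter observed time actually churned, because a censored customer tells us nothing about who churned first.

```python
from itertools import combinations


def harrell_c_index(durations, events, risk_scores):
    """Simplified c-index: a higher risk score should mean earlier churn."""
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(durations)), 2):
        # Order the pair so that i has the shorter observed time.
        if durations[j] < durations[i]:
            i, j = j, i
        # Skip pairs that cannot be ordered: equal times (a simplification),
        # or the shorter time is censored.
        if durations[i] == durations[j] or not events[i]:
            continue
        if risk_scores[i] > risk_scores[j]:
            concordant += 1
        elif risk_scores[i] < risk_scores[j]:
            discordant += 1
        else:
            tied += 1
    comparable = concordant + discordant + tied
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


# Toy data: months observed, churn observed (False = censored), model risk.
durations = [2, 4, 6, 8]
events = [True, True, False, True]
risk_scores = [0.9, 0.4, 0.5, 0.1]

c = harrell_c_index(durations, events, risk_scores)
print(f"c-index: {c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of customers by churn time.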

How to run survival analysis


One of the best packages for survival analysis in Python is scikit-survival. This is an excellent package that uses a scikit-learn style of interface to give access to all of the mainstream survival analysis algorithms found in the machine learning community.

For the more statistically inclined, R is a better option. There is a survival package in R, which can do everything from Cox regression to Kaplan-Meier curves and Weibull regression.
The difference between them is that the R package is primarily focused on statistical methods, whereas scikit-survival is focused on machine learning ones.

Data science, AI and machine learning
So, I hope you found this helpful! If you're interested in learning more about topics like data science, AI and machine learning, make sure to get in touch. I'm always open to teaching new students, and I'm also organizing my own data science and machine learning training boot camp. Also, make sure to check out the site of the Tesseract Academy if you're a CEO or an entrepreneur who wants to use AI and data science.


Also Published Here