Population Stability Index (PSI) is a statistical measure commonly used in credit risk modeling and other fields to assess the stability of a population across different time periods or segments. It is particularly valuable where changes in the population distribution could impair the effectiveness of a model.

A fundamental assumption is that the population to which a statistical model is applied is similar to the population on which the model was developed. Model performance relies on the closeness between the development sample and the production data. Therefore, there is an intrinsic need to compare the two samples (development vs production) and assess whether the model needs to be recalibrated.

The population may change for several reasons: changes in business strategies and policies, or changes in external factors such as the political, economic and social environment. Model performance is affected because the joint distribution p(y, X) changes, where y is the model outcome and X is the design matrix comprising the observations of the model features. As a result, the entire model fit, i.e., the estimates of the model coefficients and the inherent functions, is affected. If this happens, a decision must be made on whether to recalibrate or redevelop the model.

Before we dive into the mathematical expression for PSI, we start with two key concepts: entropy and KL divergence.
Entropy is a measure of uncertainty or randomness in a set of data. In machine learning, particularly in decision trees, and in information theory, entropy quantifies the impurity or disorder in a dataset. For instance, in a classification tree, the attribute used to split the data at each node is the one whose split reduces entropy the most.
For a discrete probability distribution p(x), entropy is defined as

$$H(p) = -\sum_x p(x) \log p(x)$$

In this definition, the negative sign ensures that entropy is non-negative, because the log of a probability is never positive (a probability is at most 1).
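To make the definition concrete, here is a minimal sketch that computes the entropy of a discrete distribution given as a list of probabilities. The function name `entropy`, the `base` parameter and the example distributions are illustrative assumptions; the base of the logarithm only sets the unit (bits for base 2).

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution (hypothetical helper).

    `probs` is a list of probabilities that sum to 1.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log(p) tends to 0 as p tends to 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)


# Illustrative distributions over two outcomes.
fair_coin = entropy([0.5, 0.5])    # most uncertain two-outcome case
biased_coin = entropy([0.9, 0.1])  # less uncertain than the fair coin
certain = entropy([1.0, 0.0])      # no uncertainty at all
```

The more concentrated the distribution, the lower the entropy, which is why a split that produces purer child nodes lowers entropy.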
Kullback-Leibler divergence (KL divergence) is referred to in several places in the literature (Lin, 2017), (Wu & Olson, 2010), (Li et al., 2008), etc. It measures the difference between two probability distributions p(x) and q(x), that is, it quantifies how one probability distribution diverges from another:

$$D_{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$

The distribution p(x) is considered the ‘true’ or ‘base’ distribution, and the distribution q(x) is considered the ‘untrue’ or ‘target’ distribution, so that KL divergence represents the loss incurred by using the ‘wrong’ distribution.
KL divergence measures the dissimilarity of the two probability distributions p(x) and q(x). It is non-negative, and it equals zero if and only if p(x) and q(x) are identical, i.e., the same distribution.
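These properties can be checked numerically. The sketch below is an illustration only: the helper `kl_divergence`, the choice of natural logarithm and the two example distributions are assumptions, not part of any particular library.

```python
import math


def kl_divergence(p, q):
    """KL divergence of q from p, in nats, for discrete distributions.

    `p` and `q` are lists of probabilities over the same outcomes
    (hypothetical helper for illustration).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability outcome under p contributes nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Illustrative distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl_pq = kl_divergence(p, q)  # positive, since p and q differ
kl_qp = kl_divergence(q, p)  # also positive, but not equal to kl_pq
kl_pp = kl_divergence(p, p)  # zero for identical distributions
```

Comparing `kl_pq` with `kl_qp` shows that swapping the roles of the two distributions changes the value, so KL divergence is not symmetric.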
Now that we have covered the mathematical expressions for entropy and KL divergence, we will see how they help us understand the expression for the Population Stability Index.
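The PSI between two distributions p(x) and q(x) is

$$PSI(p, q) = \sum_x \big(p(x) - q(x)\big) \log \frac{p(x)}{q(x)}$$

Notice that, compared with the KL divergence expression $\sum_x p(x) \log \frac{p(x)}{q(x)}$, the only change is the additional term q(x) inside the brackets that multiply the logarithm. The formulae look so similar that we can express PSI as a function of KL divergence. Expanding the brackets gives

$$PSI(p, q) = \sum_x p(x) \log \frac{p(x)}{q(x)} - \sum_x q(x) \log \frac{p(x)}{q(x)} = \sum_x p(x) \log \frac{p(x)}{q(x)} + \sum_x q(x) \log \frac{q(x)}{p(x)}$$

The first sum is the KL divergence $D_{KL}(p \,\|\, q)$ and the second is $D_{KL}(q \,\|\, p)$, so

$$PSI(p, q) = D_{KL}(p \,\|\, q) + D_{KL}(q \,\|\, p)$$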
That is, the PSI metric is the sum of the relative entropies of q(x) with respect to p(x) and of p(x) with respect to q(x). In other words, we add the information ‘gain’ from expressing p(x) using an approximation q(x) to the ‘gain’ from expressing q(x) using p(x). We do this because we don’t know the ‘ground truth’. KL divergence assumes that one distribution is the ‘true’ one and the other is an ‘approximation’, and it is not symmetric. We therefore compute it from p(x) to q(x) and from q(x) to p(x), and add the two to get a single, symmetric metric for judging the difference between p(x) and q(x).
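The identity can be verified with a short sketch. It assumes, as is common in practice, that x indexes bins of a score or feature and that p(x) and q(x) are the shares of the development and production samples in each bin. The bin shares below are made-up illustrative numbers, the helper names are hypothetical, and every share is assumed to be strictly positive.

```python
import math


def kl(p, q):
    """KL divergence of q from p in nats; all shares must be > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def psi(p, q):
    """PSI between two binned distributions; all shares must be > 0."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical bin shares for a five-bin score (each list sums to 1).
dev = [0.10, 0.20, 0.40, 0.20, 0.10]   # development sample, p(x)
prod = [0.08, 0.17, 0.38, 0.23, 0.14]  # production sample, q(x)

psi_value = psi(dev, prod)
symmetric_kl = kl(dev, prod) + kl(prod, dev)

# PSI equals the sum of the two KL divergences, up to rounding error,
# and does not depend on which sample is treated as the base.
matches_kl_sum = math.isclose(psi_value, symmetric_kl, abs_tol=1e-12)
is_symmetric = math.isclose(psi_value, psi(prod, dev), abs_tol=1e-12)
```

Because each term (p(x) - q(x)) log(p(x)/q(x)) is non-negative, PSI is zero only when the two binned distributions coincide. An empty bin in either sample makes the logarithm undefined, which this sketch does not handle.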
So how high should PSI be to make that judgement? The following table summarizes the PSI values that serve as industry-wide rules of thumb.
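As a rough illustration of how such rules of thumb are applied, the sketch below maps a PSI value to a stability band. The cut-offs of 0.10 and 0.25 are commonly quoted values and are assumed here; the function name and the band labels are hypothetical, and the exact thresholds used in practice may differ between institutions.

```python
def interpret_psi(psi_value, minor=0.10, major=0.25):
    """Map a PSI value to a stability band.

    The default cut-offs are assumed, commonly quoted rules of thumb,
    not fixed standards.
    """
    if psi_value < 0:
        raise ValueError("PSI cannot be negative")
    if psi_value < minor:
        return "no significant change"
    if psi_value < major:
        return "moderate change: investigate"
    return "significant change: consider recalibration or redevelopment"


# Example PSI values, one in each band.
low_band = interpret_psi(0.03)
middle_band = interpret_psi(0.15)
high_band = interpret_psi(0.40)
```

Keeping the cut-offs as parameters makes it easy to apply whatever thresholds a given model-monitoring policy prescribes.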
Interestingly, PSI is very similar to ‘Information Value’ used to measure the strength of the modelling variables in developing credit risk models. We will explore that soon in a different article.