What do you know about the Population Stability Index (PSI) measure, its historical usage, and its connection to other mathematical drift measures such as KL divergence? If you’re left scratching your head, don’t worry — we’ve got you covered!
PSI is a commonly used measure in the financial services domain to quantify the shift in the distribution of a variable over time. While several resources give an overview of PSI, such as this visual blog by Matthew Burke and this paper summary, they often do not discuss the connection between PSI as a drift metric and other popular measures such as KL divergence.
Briefly, PSI is calculated based on the multinomial classification of a variable into bins or categories. Consider the two distributions shown in the left figure above. These distributions can be converted into their respective histograms with an appropriately chosen binning strategy. There are several binning strategies, and each strategy can yield varying PSI values. For the figure on the right, data is collected in equi-width bins. This produces a histogram that resembles a discretized version of the respective distribution. Another possible binning strategy is equi-quantile or equi-depth binning. In this case, each bin would hold the same proportion of samples from the reference / expected distribution. The choice of strategy is context-specific and requires domain knowledge. For example, in credit score monitoring, credit scores are already binned into ranges representing a client's credit risk. In such cases, it may be desirable to use consistent binning throughout the analysis.
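As an illustrative sketch of the two binning strategies (the synthetic samples and bin count below are our own, not from the original analysis), both can be implemented in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
expected = rng.normal(loc=0.0, scale=1.0, size=10_000)   # reference sample
actual = rng.normal(loc=0.15, scale=1.1, size=10_000)    # slightly shifted sample

n_bins = 10

# Equi-width: bins of equal width spanning the reference sample's range.
width_edges = np.histogram_bin_edges(expected, bins=n_bins)

# Equi-depth (equi-quantile): edges chosen so that each bin holds roughly
# the same fraction of the *reference* sample.
depth_edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))

# Bin both samples with the same (equi-depth) edges before computing PSI.
expected_counts, _ = np.histogram(expected, bins=depth_edges)
actual_counts, _ = np.histogram(actual, bins=depth_edges)

# Each equi-depth bin holds ~10% of the reference sample by construction.
print(expected_counts / expected.size)
```

Whichever strategy is chosen, the same bin edges must be applied to both the reference and the target sample, so that the bin-by-bin proportions are comparable.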
The differences in each bin between the expected distribution (AKA reference or initial distribution) and the target distribution (AKA new or actual distribution) are then utilized to calculate PSI as follows:

$$\mathrm{PSI} = \sum_{b=1}^{B} \big(ActualProp(b) - ExpectedProp(b)\big) \cdot \ln\!\left(\frac{ActualProp(b)}{ExpectedProp(b)}\right)$$
where $B$ is the total number of bins, $ActualProp(b)$ is the proportion of counts within bin $b$ from the target distribution, and $ExpectedProp(b)$ is the proportion of counts within bin $b$ from the reference distribution. Thus, PSI is a number that ranges from zero to infinity and equals zero when the two distributions match exactly.
Practical Notes: The rules of thumb in practice regarding PSI thresholds are that if: (1) PSI is less than 0.1, then the actual and the expected distributions are considered similar, (2) PSI is between 0.1 and 0.2, then the actual distribution is considered moderately different from the expected distribution, and (3) PSI exceeds 0.2, then it is highly advised to develop a new model on a more recent sample [1,2]. Also, since a particular bin may be empty, PSI can be numerically undefined or unbounded. To avoid this, in practice, a small value such as 0.01 can be added to each bin proportion. Alternatively, a base count of 1 can be added to each bin to ensure non-zero proportions.
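Putting the formula and the empty-bin guard together, here is a minimal sketch of a PSI computation in NumPy (the function name, the clipping-based guard, and the toy counts are our own choices for illustration; adding a base count of 1 per bin would work equally well):

```python
import numpy as np

def psi(expected_counts, actual_counts, eps=1e-4):
    """Population Stability Index between two binned samples.

    eps guards against empty bins, whose zero proportions would make
    the log term undefined; clipping is one simple alternative to
    adding a small constant to every bin proportion.
    """
    expected_prop = expected_counts / expected_counts.sum()
    actual_prop = actual_counts / actual_counts.sum()
    expected_prop = np.clip(expected_prop, eps, None)
    actual_prop = np.clip(actual_prop, eps, None)
    return float(np.sum((actual_prop - expected_prop)
                        * np.log(actual_prop / expected_prop)))

# Identical bin counts give PSI = 0; shifted mass gives a large PSI.
same = psi(np.array([30.0, 40.0, 30.0]), np.array([30.0, 40.0, 30.0]))
shifted = psi(np.array([30.0, 40.0, 30.0]), np.array([10.0, 30.0, 60.0]))
print(same, shifted)
```

Under the rules of thumb above, the `shifted` value would fall well beyond the 0.2 threshold, flagging a significant population shift.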
PSI is typically used in financial services as a guidepost to compare current populations against the baseline populations for which some financial tool or service was developed. For example, the use of credit scoring tools has proliferated in the banking industry to evaluate the level of credit risk associated with applicants or customers. Such tools provide statistical odds or probabilities that an applicant with a given credit score will pay off their credit. In the context of credit scoring, it is crucial to study the effects of changing populations or irregular trends in application approval rates, as well as abnormal periods where the population may under- or over-apply relative to regular business cycles. PSI helps quantify such changes and gives decision-makers a basis for judging whether the development sample remains representative of future expected applicants. Identifying distributional change can significantly impact the maintenance of tools capable of accurate lending decisions.
While we found no explicit resources on the rationale for using PSI, we conjecture that its usage stems from multiple factors, listed below:
With the ongoing adoption of machine learning models and systems in financial services, PSI has gained popularity as a model monitoring metric — we only expect this trend to continue as model portfolios grow and the MLOps lifecycle becomes standardized within organizations.
The Kullback-Leibler (KL) divergence, or relative entropy, is a statistical distance measure that describes how one probability distribution differs from another.
Given two discrete probability distributions $A$ (actual) and $E$ (expected) defined on the same probability space, KL divergence is defined as:

$$D_{KL}(A \parallel E) = \sum_{x} A(x)\,\ln\!\left(\frac{A(x)}{E(x)}\right)$$
An interpretation of KL divergence is that it measures the expected excess surprise from using the expected distribution when the data actually follows the actual distribution. This sounds a lot like the reasoning behind using PSI! While KL divergence is well studied in mathematical statistics and is widely referenced in academic work [1,2], PSI is domain-specific and lacks concrete literature on the history of its usage within financial services. In the following, we illustrate how PSI can actually be viewed as a special form of KL divergence.
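One property worth noting before the derivation: KL divergence is not symmetric in its arguments, which is why the connection to PSI involves a symmetrized form. A quick sketch (with made-up bin proportions) makes the asymmetry concrete:

```python
import numpy as np

def kl(a, e):
    """D_KL(a || e) in nats for discrete distributions over the same bins."""
    a = np.asarray(a, dtype=float)
    e = np.asarray(e, dtype=float)
    return float(np.sum(a * np.log(a / e)))

p_a = np.array([0.1, 0.3, 0.6])  # actual bin proportions (illustrative)
p_e = np.array([0.3, 0.4, 0.3])  # expected bin proportions (illustrative)

# The two directions generally give different values.
print(kl(p_a, p_e), kl(p_e, p_a))
```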
Consider the PSI formula and interpret the proportion of counts within a bin $b$ for the actual distribution, $ActualProp(b)$, as the frequentist probability $P_A(b)$ of the variable falling in that bin. The same applies to the expected distribution, giving $P_E(b)$.
Then, we can rewrite the PSI formula as:

$$\mathrm{PSI} = \sum_{b=1}^{B} \big(P_A(b) - P_E(b)\big)\,\ln\!\left(\frac{P_A(b)}{P_E(b)}\right)$$
On expanding further, and flipping the sign of the second sum by inverting its log ratio,

$$\mathrm{PSI} = \sum_{b=1}^{B} P_A(b)\,\ln\!\left(\frac{P_A(b)}{P_E(b)}\right) - \sum_{b=1}^{B} P_E(b)\,\ln\!\left(\frac{P_A(b)}{P_E(b)}\right) = \sum_{b=1}^{B} P_A(b)\,\ln\!\left(\frac{P_A(b)}{P_E(b)}\right) + \sum_{b=1}^{B} P_E(b)\,\ln\!\left(\frac{P_E(b)}{P_A(b)}\right)$$
Thus, PSI can be rewritten as:

$$\mathrm{PSI} = D_{KL}(P_A \parallel P_E) + D_{KL}(P_E \parallel P_A)$$
which is the symmetrized KL divergence (also known as the Jeffreys divergence)!
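The identity is easy to sanity-check numerically. A minimal NumPy sketch (the bin proportions below are made up for illustration):

```python
import numpy as np

def kl(a, e):
    """D_KL(a || e) in nats for discrete distributions over the same bins."""
    return float(np.sum(a * np.log(a / e)))

p_a = np.array([0.1, 0.3, 0.6])  # actual bin proportions (illustrative)
p_e = np.array([0.3, 0.4, 0.3])  # expected bin proportions (illustrative)

# PSI computed directly from its definition...
psi = float(np.sum((p_a - p_e) * np.log(p_a / p_e)))
# ...matches the sum of the two KL divergence directions.
sym_kl = kl(p_a, p_e) + kl(p_e, p_a)

print(psi, sym_kl)
```

Both expressions agree to floating-point precision, confirming that PSI is exactly the symmetrized KL divergence of the binned proportions.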
We hope you enjoyed this overview of PSI. Don’t forget to check out our blog on detecting intersectional unfairness in AI!