How to Identify ML Drift Before You Have a Problem
In this episode of Safe and Sound AI, we dive into the challenge of drift in machine learning models. We break down the key differences between concept and data drift (including feature and label drift), explaining how each affects ML model performance over time. Learn practical detection methods using statistical tools, discover how to identify root causes, and explore strategies for maintaining model accuracy.
Read the article by Fiddler AI and explore additional resources on how AI Observability can help build trust into LLMs and ML models.
[00:00:01] Welcome to Safe and Sound AI.
[00:00:03] Today we're tackling something, uh, really critical if you're working with ML models in production: model drift.
[00:00:08] Yeah, absolutely. It's something everyone eventually runs into.
[00:00:11] We wanna dig into why even models that seem super accurate at first can, well, degrade over time. It impacts things like fraud detection, loan approvals, even ad targeting.
[00:00:23] Right. And it can cause some serious issues if you're not watching for it. So think of this as getting a handle on how to keep those models performing well.
[00:00:31] Exactly. It's basically because the data coming in starts to look different from what the model originally learned on.
[00:00:37] Precisely. It's not usually a flaw in the original training.
[00:00:40] More like the world changes, you know, the data patterns shift.
[00:00:43] So our goal today is to kind of unpack the different ways drift shows up
[00:00:47] And how to spot it, how to detect it effectively,
[00:00:50] And crucially, what you can actually do about it.
[00:00:53] Right. Let's get into it.
[00:00:55] Okay. So drift isn't just one single thing. We usually talk about two main categories, right? Concept drift, and then data drift, which includes feature drift and label drift.
[00:01:05] That's the main breakdown. Yeah, and it's worth remembering, they're not always separate. You can definitely have both happening at once.
[00:01:11] Good point. So concept drift.
[00:01:13] It's really about a change in the actual relationship between the inputs, the features, and what you're trying to predict: the outcome.
[00:01:21] Let's use a loan application example. A model looks at income, credit score, age, stuff like that to predict risk.
[00:01:28] Right. And it might work perfectly fine when the economy is stable.
[00:01:32] But then uh-oh, maybe a big recession hits.
[00:01:34] Suddenly that same income level, that same credit score, it might mean something different in terms of risk. The underlying concept of credit worthiness has shifted because of the economy.
[00:01:44] So the model's learned rules aren't quite right anymore. The boundary it drew between approve and reject is now in the wrong place for the new reality.
[00:01:53] Precisely. Even if the applicant's details look similar on paper to ones approved before, the economic context changes the outcome. That's concept drift.
[00:02:02] It's like the definition of risky has changed.
[00:02:04] You got it. The model becomes outdated even if the input data looks superficially similar.
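To make that concrete, here's a minimal Python sketch of concept drift using made-up numbers: the input distribution stays the same, but the rule linking income to risk shifts, so a model trained on the old regime loses accuracy on the new one. The income figures and thresholds are purely illustrative.

```python
# Illustrative toy example of concept drift: same P(X), different P(Y|X).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, risky_below):
    income = rng.normal(60, 15, size=n)            # income in $1000s; same distribution in both regimes
    default = (income < risky_below).astype(int)   # the "concept": which incomes count as risky
    return income.reshape(-1, 1), default

X_old, y_old = simulate(5_000, risky_below=40)     # stable economy
X_new, y_new = simulate(5_000, risky_below=55)     # recession: the same income now means more risk

model = LogisticRegression().fit(X_old, y_old)
print("accuracy on old concept:", round(model.score(X_old, y_old), 3))
print("accuracy on new concept:", round(model.score(X_new, y_new), 3))   # noticeably lower
```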
[00:02:10] Okay, that makes sense. Now, what about data drift?
[00:02:13] Data drift is, uh, a bit more general. It just means the statistical properties of the data your model sees in production are different from the training data.
[00:02:22] So the data distribution changes.
[00:02:24] Exactly. And the key difference is this change in distribution might or might not actually affect that core relationship we talked about with concept drift.
[00:02:32] Okay. And you mentioned two main types, feature drift and label drift.
[00:02:35] Yep. Let's take feature drift first. This is about changes in the distribution of the model's inputs. So P(X), the probability of seeing certain input features.
[00:02:45] Imagine your bank runs a big marketing campaign in say, Texas. Suddenly you get way more applications from Texas than you used to.
[00:02:54] Ah, so the distribution of the state or region feature has changed dramatically.
[00:02:58] Exactly. Even if the income or credit scores within Texas are similar to your training data, the mix of inputs has shifted.
[00:03:06] That's feature drift.
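As an illustration of how you might quantify that, one common measure for a categorical input like applicant state is the population stability index (PSI). PSI isn't mentioned in the episode, but it's a standard drift metric; the proportions below are hypothetical, and the 0.2 cutoff is just a rule of thumb.

```python
# Rough sketch: quantify feature drift on a categorical input by comparing
# production category proportions to the training baseline.
import numpy as np

def psi(baseline_probs, current_probs, eps=1e-6):
    """Population stability index between two category distributions."""
    b = np.asarray(baseline_probs, dtype=float) + eps
    c = np.asarray(current_probs, dtype=float) + eps
    return float(np.sum((c - b) * np.log(c / b)))

# Share of applications by state at training time vs. after a Texas campaign (hypothetical).
states   = ["TX", "CA", "NY", "other"]
baseline = [0.10, 0.30, 0.25, 0.35]
current  = [0.35, 0.22, 0.18, 0.25]

print(f"PSI = {psi(baseline, current):.3f}")   # rule of thumb: > 0.2 suggests a meaningful shift
```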
[00:03:07] Got it. That makes sense. So that's one type. What's the other main kind of data drift?
[00:03:11] The other key type is label drift. This focuses on the distribution of the model's predictions, P(Y).
[00:03:17] So if our loan model suddenly starts predicting approve much more often than it used to
[00:03:22] That could be label drift. You're seeing a shift in the proportion of predicted outcomes, maybe more approvals, maybe more rejections compared to the baseline.
[00:03:31] And that could be caused by feature drift, like if suddenly only super qualified people started applying.
[00:03:37] It could be, yes. Feature drift can definitely lead to label drift, but label drift is worth monitoring on its own, as it can sometimes signal other issues too. It tells you something has changed in the output pattern.
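Here's a simple sketch of what monitoring for label drift could look like: compare the predicted approval rate in a recent window against the baseline rate from the training period. The tolerance, window size, and rates below are arbitrary placeholders.

```python
# Minimal label-drift check: has the predicted-positive (approval) rate shifted?
import numpy as np

def check_label_drift(recent_preds, baseline_rate, tolerance=0.05):
    """Flag drift if the predicted-approval rate moves beyond a fixed tolerance."""
    current_rate = float(np.mean(recent_preds))
    return current_rate, abs(current_rate - baseline_rate) > tolerance

baseline_approval_rate = 0.30                                      # observed at training time
recent = np.random.default_rng(1).binomial(1, 0.42, size=2_000)    # simulated recent predictions

rate, drifted = check_label_drift(recent, baseline_approval_rate)
print(f"recent approval rate = {rate:.2f}, drift flagged = {drifted}")
```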
[00:03:48] Okay, so drift happens. Concept drift, data drift. How do we actually spot this in a live system? Does it happen slowly or all at once?
[00:03:58] It really varies. Sometimes it's abrupt. Think about, um, the start of the COVID-19 pandemic. Consumer behavior changed almost overnight. That caused sudden massive drift for many models.
[00:04:08] Right. I remember reading about that huge impact.
[00:04:10] But other times it's really gradual, a slow creep over months. Or it could even be cyclical, maybe seasonal patterns.
[00:04:16] Which means we need to be constantly looking out for it.
[00:04:19] Absolutely. Continuous monitoring is key, and how we detect it often boils down to whether we have ground-truth labels for the new data.
[00:04:27] Okay. So if we do have labels, we eventually find out if the loan was actually good or bad, for instance.
[00:04:33] Then you can rely on standard performance metrics, track your accuracy, precision, maybe false positive rate, AUC, whatever makes sense for your model.
[00:04:42] And if those metrics start to consistently drop, that's a big red flag for drift.
[00:04:47] Definitely. You might even build specific models just to detect that kind of performance degradation. It's like a supervised learning problem on top of your main model.
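A rough sketch of that metric-based approach, assuming delayed ground-truth labels eventually arrive: compute accuracy per time window and alert on sustained drops below a baseline. The window size, baseline, and tolerance here are illustrative only.

```python
# Sketch of performance-based drift detection once labels are available.
import numpy as np

def rolling_metric_alerts(y_true, y_pred, window=500, baseline=0.90, tolerance=0.05):
    """Yield (window_index, accuracy, alert) for each completed window."""
    for i in range(0, len(y_true) - window + 1, window):
        acc = float(np.mean(y_true[i:i + window] == y_pred[i:i + window]))
        yield i // window, acc, acc < baseline - tolerance

# Simulated stream where prediction quality degrades in later windows.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=3_000)
flip = np.concatenate([rng.random(1_500) < 0.05, rng.random(1_500) < 0.25])  # more errors later on
y_pred = np.where(flip, 1 - y_true, y_true)

for w, acc, alert in rolling_metric_alerts(y_true, y_pred):
    print(f"window {w}: accuracy={acc:.3f}, alert={alert}")
```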
[00:04:55] But what if we don't have labels right away, like in real time fraud detection?
[00:05:00] That's common. In that case, you have to shift focus to the data distributions themselves.
[00:05:04] You compare the stats of your incoming data to your original training data.
[00:05:07] And there are statistical tools for this. You can use distance metrics like Kullback-Leibler divergence or Jensen-Shannon divergence to see how different two distributions are,
[00:05:16] or statistical tests
[00:05:18] Yep. Things like the Kolmogorov-Smirnov test. It helps tell you whether two samples likely came from the same underlying distribution. Each test has different strengths and assumptions, so you choose what fits.
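For the no-labels case, here's a minimal sketch using SciPy: a two-sample Kolmogorov-Smirnov test on a continuous feature, plus Jensen-Shannon distance on shared histogram bins. The simulated income samples and the bin count are placeholders.

```python
# Distribution-comparison checks when ground-truth labels aren't available yet.
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(3)
train_income = rng.normal(60_000, 15_000, size=5_000)
prod_income  = rng.normal(52_000, 18_000, size=5_000)   # shifted production sample

# Kolmogorov-Smirnov: a small p-value suggests the samples come from different distributions.
stat, p_value = ks_2samp(train_income, prod_income)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")

# Jensen-Shannon distance on shared bins (0 = identical, higher = more different).
bins = np.histogram_bin_edges(np.concatenate([train_income, prod_income]), bins=30)
p, _ = np.histogram(train_income, bins=bins, density=True)
q, _ = np.histogram(prod_income, bins=bins, density=True)
print(f"JS distance = {jensenshannon(p, q):.3f}")
```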
[00:05:30] You could even potentially build unsupervised models just to flag these distributional shifts, I suppose.
[00:05:35] For sure. That's another approach. The main goal is just detecting that something has changed in the data stream compared to what the model was trained on.
[00:05:43] Okay, so we've detected drift. Alarm bells are ringing. But just knowing it's happening isn't enough, is it?
[00:05:51] Not at all. That's really just the first step. The critical next step, and often the harder one, is figuring out why. What's the root cause?
[00:05:58] Because the solution depends entirely on the cause.
[00:06:01] Absolutely. And the causes can be varied. It might be a genuine change in the real world, like that economic shift causing concept drift.
[00:06:08] Or maybe just changes in how people are using the product leading to feature drift.
[00:06:12] That happens too. But crucially, we also need to consider data integrity issues.
[00:06:17] Ah, like bugs, yeah. Something breaking.
[00:06:19] Right. A bug in the frontend data capture, an error in how data is transformed in the backend, maybe an API change broke something, or just general pipeline degradation. Any of those can look like drift.
[00:06:32] But it's not really reflecting a change in the underlying patterns just bad data getting through.
[00:06:37] Right. So when you detect drift, the first thing you should probably do is talk to your engineering team.
[00:06:43] Check for recent code changes, product updates, known issues in the data pipeline.
[00:06:47] Mm-hmm.
[00:06:48] That kind of stuff.
[00:06:48] Yeah. Rule out the infrastructure problems first. That's often the, uh, the low-hanging fruit.
[00:06:53] Okay. Let's say it's not an obvious bug. What then?
[00:06:56] Then you need to dive deeper into the model analytics. When did the drift start? Which features are most affected? Is it concept drift, feature drift, or label drift? Use those statistical tools we mentioned.
[00:07:08] And then the fix depends on that root cause.
[00:07:10] Exactly. If it was a data integrity bug, fix the bug. If it was, say, feature drift due to a known product change, but the underlying concept is stable, maybe just updating some data processing or refreshing feature stats is enough.
[00:07:24] But if you've confirmed it's concept drift
[00:07:26] Then usually you need to retrain the model. The old relationships it learned are no longer valid. You need fresh, representative data that reflects the new reality.
[00:07:36] So the big takeaway here seems to be that drift is, well inevitable.
[00:07:41] Pretty much. Any model deployed in the real world is going to face this eventually. The data just doesn't stand still.
[00:07:46] Which means awareness, continuous monitoring, and having the right tools and processes in place are just essential. It's not a nice-to-have.
[00:07:54] It's fundamental, especially if you think about responsible AI principles. Maintaining model accuracy and reliability over time is responsible AI. You can't just deploy and forget.
[00:08:04] Okay, so just to wrap up our deep dive today. We've looked at model drift, breaking it down into concept drift, where the underlying relationship changes, and data drift, shifts in the data distributions, like feature or label drift.
[00:08:15] We talked about how to detect it using performance metrics if you have labels, or statistical distribution comparisons if you don't.
[00:08:21] And really emphasized that finding the root cause is key. Is it the real world changing, product usage shifting, or maybe just a data pipeline issue?
[00:08:31] Because the solution, whether it's a bug fix, a pipeline update, or a full model retrain for concept drift depends entirely on that why.
[00:08:40] It really underscores that ML in production is a dynamic, ongoing process. Constant vigilance required.
[00:08:46] So maybe a final thought for you to consider is how dynamic is the data your models rely on, and are your current processes really equipped to handle the inevitable evolution of those patterns over time?
[00:08:57] This podcast is brought to you by Fiddler AI. For more on monitoring ML drift, or more details on the concepts we discussed, see the article in the description.