How do you monitor ML models in production?

Artificial intelligence (AI) seems to be everywhere these days, with engineers creating intelligent systems and algorithms that replicate human judgment and behavior. Machine learning (ML) is a subset of AI that adds a dynamic element: instead of following only fixed rules, ML systems learn from data. ML can be defined as “a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.”

In this article, we’ll look at why high-quality data matters for ML performance and cover the basics of model monitoring.

The rise of machine learning

People everywhere engage with machine learning, whether they’re aware of it or not. Among the most common examples are major companies and their platforms, like Google, Amazon, and streaming services such as Netflix, Hulu, and Spotify. Google leverages ML to provide personalized search results, and Amazon serves its users curated product suggestions. Streaming entertainment platforms aim to tailor recommendations to users’ tastes, helping them discover new entertainment they’re likely to enjoy.

Beyond these everyday uses are applications with higher stakes: industries ranging from financial services (e.g., loan approvals and fraud detection) to healthcare (e.g., treatment recommendations) have adopted machine learning models.

The market for ML is growing, and its use cases are becoming more varied and sophisticated. From 2019 to 2025, “the global AI software market is expected to grow approximately 54 percent year-on-year, reaching a forecast size of 22.6 billion U.S. dollars.”

These emerging applications of AI and ML raise an important question: what happens when machine learning gets it wrong? If machine learning is meant to replicate human decision-making, can it also replicate human tendencies toward bias or error? The short answer is yes.

This is why an ML model monitoring framework should be built into the design and deployment of high-performing, unbiased, and responsible AI applications.

What is ML model monitoring, and why is it important?

To be reliable, machine learning models require careful monitoring. The entire MLOps lifecycle, from defining objectives and collecting data, through feature extraction and training, to deployment and monitoring, relies on a foundation of comprehensive and trustworthy data.

In the early stages of the MLOps lifecycle, ML monitoring validates model behavior and identifies potential bias. Success in these stages means gathering robust, appropriately diverse data that represents the population the model will serve. The quality of the data gathered here directly affects how the model performs after deployment.

Monitoring for bias in these early training stages is vital for building a well-rounded, accurate ML model and for identifying any issues that could affect the deployed product, including its user experience.

Why is model monitoring needed after deploying the model into production?

You should monitor machine learning models after launch so that you are aware of problems like model drift and can retrain models when needed. An in-use ML model should also be monitored to detect and understand any performance shifts or related issues. The earlier issues are identified and remedied, the better, and the more closely you examine your models, the better equipped you’ll be to debug them.
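
As a minimal sketch of post-launch monitoring, the Python class below tracks accuracy over a sliding window of recent predictions whose ground-truth labels have arrived, and flags when that accuracy falls below a threshold, a possible cue to investigate drift or retrain. The class name RollingAccuracyMonitor, the window size, and the threshold are illustrative assumptions, not values from any particular tool.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions.

    Hypothetical helper for illustration; `window` and `threshold`
    are example values, not recommendations.
    """

    def __init__(self, window: int = 500, threshold: float = 0.9) -> None:
        # Each entry is True if the prediction matched its label.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, label) -> None:
        self.results.append(prediction == label)

    def accuracy(self) -> float:
        if not self.results:
            return float("nan")
        return sum(self.results) / len(self.results)

    def needs_attention(self) -> bool:
        # Only alert once the window is full, so a few early errors
        # don't trigger an alarm on their own.
        return (
            len(self.results) == self.results.maxlen
            and self.accuracy() < self.threshold
        )
```

For example, with window=4 and threshold=0.75, recording the (prediction, label) pairs (1, 1), (0, 0), (1, 0), and (0, 1) gives accuracy() == 0.5 and needs_attention() == True. Waiting for a full window trades alert speed for fewer false alarms.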

How do you do ML model monitoring?

The most straightforward approach to monitoring ML models in production is to evaluate their performance continuously as real-world data grows, evolves, and diversifies. A number of model monitoring tools are available to help identify, understand, and remedy model performance issues.

An ML monitoring framework should include monitoring for some or all of the following:

  • Shifts in data distribution, model performance, or system performance
  • Data integrity, accuracy, and segmentation
  • Performance variation indicative of potential bias in the ML model
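
Data integrity checks are often the simplest place to start. The Python sketch below validates a single incoming record against a hypothetical schema of allowed numeric ranges. The feature names and ranges in SCHEMA are made up for illustration; a real deployment would derive them from its own training data or data contracts.

```python
from typing import Any

# Hypothetical schema for illustration: feature name -> (min, max) allowed value.
SCHEMA = {
    "age": (18, 100),
    "income": (0, 10_000_000),
}


def check_integrity(record: dict[str, Any], schema=SCHEMA) -> list[str]:
    """Return human-readable integrity problems found in one input record."""
    problems = []
    for feature, (low, high) in schema.items():
        value = record.get(feature)
        if value is None:
            problems.append(f"{feature}: missing")
        elif not isinstance(value, (int, float)):
            problems.append(f"{feature}: non-numeric value {value!r}")
        elif not low <= value <= high:
            # NaN also lands here, because every comparison with NaN is False.
            problems.append(f"{feature}: {value} outside [{low}, {high}]")
    return problems
```

Called on {"age": 17, "income": None}, it returns ["age: 17 outside [18, 100]", "income: missing"]; a clean record returns an empty list.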

In addition, a robust AI observability platform will help ensure your models are performing as intended and optimized accordingly.

Which metrics should be monitored after an ML model is put in production?

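Some common metrics for judging the value of ML model monitoring are business-level: time saved for engineering or data science teams when resolving model issues, revenue generated from improved model performance, and faster time to production for new models. At the model level, teams typically track performance measures such as accuracy, precision, and recall, along with indicators of drift and data quality.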

One clear objective of any ML application in production is to demonstrate consistent, accurate performance. Quantifying problems, from simple errors or oversights to fundamental bias or other systematic issues, is the first step toward remedying them and improving model performance.
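
One simple way to quantify errors, and to spot performance variation that may point to bias, is to break the error rate down by segment. In the Python sketch below, the segment labels (for example, region or age band) are hypothetical inputs, and the function names are illustrative. A large gap between segments is a signal to investigate, not proof of bias on its own.

```python
from collections import defaultdict


def error_rate_by_segment(segments, predictions, labels):
    """Return {segment: error rate} from parallel sequences of equal length."""
    if not (len(segments) == len(predictions) == len(labels)):
        raise ValueError("segments, predictions, and labels must be the same length")
    errors = defaultdict(int)
    totals = defaultdict(int)
    for segment, pred, label in zip(segments, predictions, labels):
        totals[segment] += 1
        errors[segment] += int(pred != label)
    return {seg: errors[seg] / totals[seg] for seg in totals}


def max_error_gap(rates):
    """Largest difference in error rate between any two segments."""
    return max(rates.values()) - min(rates.values())
```

With segments ["A", "A", "B", "B"], predictions [1, 0, 1, 1], and labels [1, 0, 0, 0], error_rate_by_segment returns {"A": 0.0, "B": 1.0} and max_error_gap returns 1.0.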

It’s essential to diagnose any mismatch between expected and actual performance, such as data drift or concept drift. In data drift, the distribution of the model’s input (independent) variables changes; in concept drift, the relationship between those inputs and the target (dependent) variable changes. In both cases, the ML model doesn’t adjust to the new circumstances on its own.
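
Data drift can be estimated from inputs alone by comparing a feature’s production distribution with its training distribution. The Python sketch below uses the population stability index (PSI), one common choice among several (others include the Kolmogorov-Smirnov test and KL divergence); it is not necessarily what any particular monitoring platform uses. The ten equal-width bins and the small epsilon for empty bins are assumptions. Concept drift, by contrast, usually becomes visible only as a change in performance once ground-truth labels arrive.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (e.g. training data) and a production
    sample of one numeric feature. Assumes both samples are non-empty."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so production values outside the reference range
            # land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A commonly cited rule of thumb reads a PSI below 0.1 as little shift, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift; treat these as starting points to tune rather than fixed thresholds. Identical samples give a PSI of 0, while comparing list(range(100)) with the same values shifted up by 50 gives a PSI well above 0.25.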

What is “responsible AI” and how does it relate to ML monitoring?

Applying a human term like “responsible” to AI might sound unconventional, but it’s important. Anyone who designs or engages with a machine learning model expects it to make accurate decisions and recommendations, and when it doesn’t, we hold it responsible and expect better.

Responsible AI is tough to accomplish, though, largely because of the “black box” nature of many AI applications. Once a machine learning model is in production, much of what it actually does is hidden from view. At Fiddler AI, we help MLOps and Data Science teams develop responsible ML models by providing explainable AI. This enables users to understand the “why” behind a model’s decisions and overall performance, which leads to better AI validation, detection of previously hidden bias or other error sources, and general debugging and improvement.

In other words, you can “fiddle” with it to get it right, resulting in more responsible, accurate, and unbiased AI.

Request a demo to better understand how continuous model monitoring and explainable AI can help you:

  • Detect, understand, and remedy data drift issues.
  • Illuminate the “why” behind decision-making to find and fix root causes.
  • Ensure continued performance by monitoring for outliers.