Model Monitoring Tools and Processes

Artificial intelligence (AI) and machine learning (ML) have the potential to make a positive impact across all industries. However, significant risks persist when developing and using these relatively new technologies.

The core problem that ML/AI engineers face is that most model behavior happens in a black box. This makes it extremely difficult to detect model bias and other flaws that arise while a model is being built, and even after deployment. If we are to maximize the potential of AI, a significant amount of effort needs to be directed towards monitoring model performance.

So, what is good model performance? And how can data science and engineering teams work to eliminate detrimental errors brought about by model bias? Many can agree that the measure of a truly successful ML model extends beyond a 99% accuracy rate. To completely understand why your ML models are making certain decisions, and whether they’re making those decisions correctly, comprehensive insight is needed at every step of the ML lifecycle. However, this is impossible without the right tools and processes in place.

Acknowledging the complexity and urgency of this topic, we created this guide to examine how to monitor a machine learning model using the right ML model monitoring tools and processes.

In the following sections we will cover:

What ML model monitoring looks like, and why it’s needed
The impact and importance of a machine learning operations (MLOps) framework
How to choose the right tools to elevate and empower your ML monitoring capabilities

What is ML model monitoring?

ML model monitoring is a series of processes that are deployed to evaluate established model performance metrics and examine when, why, and how issues develop with ML models. Ultimately, ML model monitoring is a key component of ML observability, pushing us towards a deeper understanding of how model data and performance function across a complete lifecycle.

Some common focus points of model monitoring include:

Checking for model and data drift
Reviewing the model’s overall performance
Monitoring outliers
Upholding data quality
Testing for bias

For example, during the early stages of development, ML monitoring practices are used to evaluate model behavior and identify potential bias. This process involves collecting robust data that accurately represents the model’s diverse data set. Gathering high-quality data during this initial monitoring phase has a crucial impact on the model’s post-deployment performance.

Monitoring for potential bias in the early stages of training is essential for ensuring fairness in ML models, and it allows teams to quickly identify risks and malfunctions that could impact a deployed platform. In the end, monitoring processes like this foster greater accuracy and enable an improved user experience.
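To make the idea of testing for bias concrete, here is a minimal sketch of one common fairness check, the demographic parity gap: the difference in positive-prediction rates between groups. The function names and the toy data are our own illustrative assumptions, not part of any specific monitoring product, and a real bias review would look at several fairness measures rather than this one alone.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = approved, 0 = denied, for two groups "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(preds, groups)
gap = demographic_parity_gap(preds, groups)
print(rates, gap)
```

A gap near zero means the groups receive positive predictions at similar rates. How large a gap is acceptable depends on the use case and is a judgment call for the team.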

Why is model monitoring needed after deploying the model into production?

Metrics

MLOps framework

ML model monitoring tools

We’ll take a look at each of these elements below, and explain how they work together to properly assess the performance of an ML model.

How do you measure the performance of a model?

Machine learning is anything but transparent, so how do we know if the model is good enough? We have four words for you: metrics, MLOps, and monitoring tools. Let’s start with metrics.

There are several types of metrics used to evaluate the performance of an ML model. Although each metric plays a specific role in ML performance evaluation, how these metrics are applied varies with the use case.

In total, there are five categories of model monitoring metrics that are used to measure machine learning performance:

1. Classification metrics

These metrics evaluate how well a model assigns inputs to discrete classes. Here are a few examples of classification metrics:

Accuracy
Precision
Recall
Logarithmic loss
F1-score
Receiver operating characteristic (ROC) curve
Area under the curve (AUC)
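As a rough illustration of how several of these classification metrics are computed for a binary classifier, here is a small sketch using only the Python standard library. The function names are our own, and the AUC helper uses the pairwise-comparison formulation (the share of positive/negative pairs the model ranks correctly), which is simple but slow on large datasets. In practice most teams would rely on an established library instead.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels (logarithmic loss)."""
    total = 0.0
    for truth, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1 - eps)  # clip to avoid log(0)
        total -= truth * math.log(prob) + (1 - truth) * math.log(1 - prob)
    return total / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predictions for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
report = classification_metrics(y_true, y_pred)
print(report)
```

Note that accuracy alone can look healthy on imbalanced data even when recall on the minority class is poor, which is one reason these metrics are usually tracked together.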

2. Regression metrics

Regression metrics evaluate models that predict continuous values. For example, linear regression is a common technique used to model the relationship between a target variable and a predictor. Like classification, there are several types of regression metrics used in ML monitoring, including:

Mean squared error
Mean absolute error

Ranking metrics, which evaluate how well a model orders a list of results:

Mean reciprocal rank
Discounted cumulative gain
Normalized discounted cumulative gain
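The sketch below shows one way the error and ranking metrics in this group could be computed by hand. The function names and input conventions (for example, representing each ranked result list as 0/1 relevance flags) are assumptions made for illustration, and the discounted cumulative gain uses the common log2 position discount.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_reciprocal_rank(ranked_flags):
    """Average of 1/position of the first relevant item in each ranked list.

    Each inner list holds 0/1 relevance flags in ranked order; a list with
    no relevant item contributes 0.
    """
    total = 0.0
    for flags in ranked_flags:
        for position, flag in enumerate(flags, start=1):
            if flag:
                total += 1.0 / position
                break
    return total / len(ranked_flags)


def dcg(relevances):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


# Hypothetical examples.
print(mean_squared_error([3, 5, 2], [2, 5, 4]))
print(mean_absolute_error([3, 5, 2], [2, 5, 4]))
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))
print(ndcg([1, 3, 2]))
```

Squared error penalizes large misses more heavily than absolute error, so tracking both can hint at whether degradation comes from a few large errors or many small ones.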

3. Statistical metrics

Determining which statistical metrics to use depends on the type of dataset being evaluated and the probability space you’re working in. That being said, there are a few common types of metrics used throughout ML monitoring, like:

Correlation

Computer vision metrics, which compare images or image regions:

Peak signal-to-noise ratio
Structural similarity index
Intersection over union
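Two of the image-related metrics above are simple enough to sketch directly. The following is an illustrative implementation, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples and images as flat sequences of pixel intensities. These conventions and the function names are our own choices for the example.

```python
import math


def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in decibels for two equal-length pixel sequences."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical predicted and ground-truth boxes that partially overlap.
print(intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3)))
# Hypothetical 4-pixel image and a slightly noisy copy.
print(psnr([10, 20, 30, 40], [12, 18, 30, 41]))
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap. For PSNR, higher values indicate that the distorted image is closer to the original.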

4. Natural language processing (NLP) metrics

These metrics are used to measure how well an ML model performs on different language tasks. This can include a number of things, like evaluating how well the model translates from one language to another or testing its grasp of linguistic features like grammar and syntax. Here are a few examples of these metrics:

Perplexity
Bilingual evaluation understudy (BLEU) score
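Perplexity is the easier of these two to show in a few lines. The sketch below assumes you already have the probability the language model assigned to each token that actually occurred in a text. The function name and the input format are illustrative assumptions.

```python
import math


def perplexity(token_probabilities):
    """Exponential of the average negative log-probability per observed token."""
    avg_negative_log_prob = -sum(math.log(p) for p in token_probabilities) / len(
        token_probabilities
    )
    return math.exp(avg_negative_log_prob)


# Hypothetical probabilities a model assigned to four observed tokens.
# If every token gets probability 0.25, the model is as uncertain as a
# uniform choice among four options, so perplexity is about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(perplexity([0.9, 0.8, 0.95, 0.7]))
```

Lower perplexity means the model found the text less surprising. A rising perplexity on production text can be one signal that incoming language has drifted away from the training data.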

5. Deep learning related metrics

Although deep learning is a very broad subject, deep learning metrics are meant to gauge how effective an ML application’s neural networks are. The two metrics listed below are commonly used to evaluate generative image models:

Inception score
Fréchet inception distance

So, which measure of model performance is most appropriate? Really, the right set of metrics depends on the model and the use case. Even though there is no single set of metrics that applies to every ML monitoring scenario, knowing how these metrics generally fit into the model monitoring process is essential to evaluating a model’s performance.

The most successful model monitoring techniques use a combination of these metrics and ML model monitoring tools to create a comprehensive MLOps framework. In the next two sections we’ll explore this concept in greater detail, and explain what MLOps looks like and how the right tools empower the monitoring process.

Using MLOps

MLOps is intended to help teams outline their structure for developing, implementing, and monitoring machine learning models. At its core, an MLOps framework is meant to encourage greater collaboration between ML/AI engineers, data science teams, and technical operations professionals. When these groups work together seamlessly, fewer mistakes are made and greater innovations are achieved.

Each stage of the MLOps lifecycle helps organizations develop a model-making process that provides complete transparency into each stage of the ML workflow, enabling teams to detect potential roadblocks and make adjustments before and after deployment.

Here is a brief selection of the various challenges MLOps can address:

Limited cross-functional collaboration

In the past, operational and data science teams have been siloed, causing miscommunication and project gridlock. Using an MLOps framework, teams are able to work together to quickly solve and prevent issues. MLOps also combines business and technical perspectives to bring greater structure to every section of the operational workflow.

Compliance issues

Machine learning is still a young field, and one that is constantly developing. Naturally, this causes laws and regulations to fluctuate as well. An MLOps methodology allows you to stay organized and ensure that your algorithms adhere to the latest AI regulations. Additionally, MLOps supports improved regulatory practices and adheres to a strict model governance framework.

Risks of using open source monitoring tools

Choosing the right model monitoring tools

Now, how do monitoring tools fit into an MLOps methodology? At a high level, using these tools enables a fully effective MLOps framework. There are several closed and open source model monitoring tools available. To achieve desired results, you should pursue ML model monitoring tools that offer the following features and functionalities:

Dashboards for monitoring, detecting, and notifying the user of data quality issues and performance degradation
Intuitive user interfaces that allow a shared view between teams
The ability to catch and fix model inference violations in real time
Detection capabilities that identify outliers and quickly assess which ones are caused by specific model inputs
Ability to monitor rare and nuanced model drift caused by class imbalance
The capability to pinpoint data drift and contributing factors to know when and how to retrain models
Extensive explanations into the ML issues behind changes in model operation statistics and any corresponding alerts
The ability to quickly detect inputs that are outside the bounds of normal queries, including adversarial attacks, and maintain high performance
Functionalities to support a thorough model validation process
Access to past, present, and future model outcomes
Functionalities that allow you to slice and compare model performance metrics across point predictions, as well as local and global data
Seamless integration with existing data and AI infrastructure
SOC 2 compliant security
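To give a sense of what the drift-detection capability in this list involves under the hood, here is a minimal sketch of one widely used drift measure, the population stability index (PSI), for a single numeric feature. It is an illustration only and not how any particular platform implements drift detection. The function name, the equal-width binning, and the 0.25 alert threshold (a commonly cited rule of thumb) are all assumptions you would tune for your own data.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between a baseline sample and a current sample of one numeric feature.

    Bin edges are equal-width over the baseline range; current values that
    fall outside that range are counted in the first or last bin.
    """
    low, high = min(baseline), max(baseline)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if baseline is constant

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values
            counts[index] += 1
        # Floor each share at eps so empty buckets do not cause log(0).
        return [max(count / len(values), eps) for count in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical feature values: training-time baseline vs. shifted production data.
baseline = [i / 100 for i in range(100)]
production = [0.5 + i / 200 for i in range(100)]

ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff, not a universal standard
psi = population_stability_index(baseline, production)
if psi > ALERT_THRESHOLD:
    print(f"Drift alert: PSI={psi:.2f} exceeds {ALERT_THRESHOLD}")
```

A check like this only says that a feature's distribution has moved. Deciding whether the shift actually hurts the model, and whether retraining is needed, still requires looking at performance metrics and the contributing factors.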

Although there may be temptation to mix and match different open source monitoring tools, there is a significant lack of explainability involved with this approach. When jumping between multiple platforms, your data quality can quickly become compromised, while a lot of precious time is wasted on troubleshooting.

Using a single, enterprise-grade monitoring platform allows you to streamline your machine learning operations and quickly identify the root causes of issues with Explainable AI.

For example, the Fiddler AI Observability platform (formerly known as Model Performance Management) gives ML and data science teams centralized model monitoring and explainability, delivering actionable and immediate insights into how your model is functioning. Let’s explore the capabilities of an AI Observability tool in more detail below, and check out our MPM best practices for more tips.

Creating an ML model monitoring framework

Empowering machine learning operations with AI Observability and explainable AI

Optimizing MLOps with an AI Observability platform helps teams continuously monitor and improve model performance throughout a model’s lifecycle. This allows for greater visibility, improved model risk management, better model governance, and much more.

Until recently, many ML/AI teams have relied on manual processes to track production model performance and issues, making it extremely time-consuming and difficult to identify and attribute root causes and resolve issues. Additionally, many teams struggle with siloed model monitoring tools and processes that prevent collaboration.

To combat these issues, an AI Observability platform acts as a control system at the center of the ML lifecycle. The unified ML model monitoring dashboard delivers deep insights into model behavior, and enables multiple teams to easily mitigate issues at every stage. Here is a visual representation of how AI Observability works:

But what does AI Observability look like in practice? Let’s use model bias as an example. Bias can occur at any stage of the model development pipeline. Data bias, modeling bias, and bias in human review all put an ML model at risk. Using an AI Observability platform, model bias can be detected immediately, and the issue can be resolved before real-world problems occur. Fiddler’s comprehensive analytics alert all stakeholders, telling them exactly where and why issues are arising, fostering improved accuracy and increased transparency.

Ultimately, with the right processes and tools in place, we can work to create more purposeful, impactful, and responsible AI. Learn more about our cutting-edge model monitoring tooling. Request a demo today.