Model monitoring framework

Machine learning (ML) and AI have the potential to drive incredible innovation and efficiency, but stakeholders also have concerns about the impact if ML models don’t work as intended. These fears range from violating AI regulations to damaging their organization’s reputation and negatively affecting human lives. According to a 2021 survey, respondents worry about the following consequences of AI bias:

  • 56% fear they may lose customer trust.
  • 50% are concerned their brand reputation may suffer, resulting in media and social media backlash.
  • 43% worry about increased regulatory scrutiny.
  • 42% have concerns regarding a loss of employee trust.
  • 37% are concerned that ML bias will conflict with personal ethics.
  • 25% fear the possibility of lawsuits.
  • 22% worry about impacts on profits and shareholder value.

To avoid unintended consequences and catch issues early on, ML teams should establish a model monitoring framework from the start. In this article, we’ll cover what monitoring in machine learning is, why it matters, and how a model monitoring framework can set your ML solutions up for success.

What is ML model monitoring?

A subset of AI model monitoring, ML model monitoring is a set of practices with the shared goal of making models more accurate and effective. It’s essential at every step of the process, from the early stages of development and training through deployment.

What are some of the considerations for model monitoring and maintenance?

Proactive model monitoring techniques can reduce downtime, ensure model performance, and improve model effectiveness. Machine learning models are susceptible to data drift, bias propagation, and performance degradation. And since models constantly ingest new data, model monitoring methods have to keep up.

Model monitoring isn’t a set-it-and-forget-it endeavor, but it no longer needs to feel like an impossible task. Models are often opaque and complex, making it difficult to understand the “how” and “why” behind their predictions. With the right monitoring methods and tools in place, ML teams can take a look inside to understand what is causing issues and how to fix them.
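
To make this concrete, here is a minimal sketch of one common monitoring technique: comparing the production distribution of a single feature against its training-time baseline with a two-sample Kolmogorov-Smirnov test. The feature values, sample sizes, and significance threshold below are illustrative assumptions, not prescriptions from any particular platform.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(baseline: np.ndarray, production: np.ndarray,
                        alpha: float = 0.05) -> bool:
    """Flag drift when production values for a feature no longer look
    like the training-time baseline (two-sample KS test)."""
    _, p_value = ks_2samp(baseline, production)
    return p_value < alpha

# Synthetic data for illustration: production values have shifted upward.
rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)    # training distribution
production = rng.normal(loc=0.5, scale=1.0, size=10_000)  # live traffic
print(feature_has_drifted(baseline, production))  # True: the mean has shifted
```

In practice, teams run checks like this on a schedule for every important feature and route failures into an alerting pipeline, rather than inspecting distributions by hand.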

What is a model monitoring framework?

One of the challenges that businesses face is that data teams often work in silos. A well-designed model monitoring framework breaks down these silos and enables better communication and collaboration when problems first arise — before they grow into something even harder to deal with. 

As one of the leading model monitoring best practices, developing a framework creates a feedback loop that involves all members of the machine learning operations (MLOps) team, including data scientists, ML engineers, and business operations staff. Using an AI Observability platform, each stakeholder can access model alerts and insights that help with root cause analysis of any model issues.

Why do we need model monitoring?

Left to their own devices, machine learning models can make incorrect inferences from patterns and develop biases that may harm your reputation and end users. In other words, models are not immune to developing human prejudices. How does an ML model work, and what makes it susceptible to developing bias? There are several reasons why models can become biased, including:

  • Improper Training: Your model’s initial training can set it up for success or failure. When ML models are trained on biased data, they will continue to propagate that bias. For example, training a hiring model with only profiles from past and current employees could lead to discrimination rooted in previous hiring bias (a minimal check for this scenario is sketched after this list).
  • Working from Biased Data: Machine learning models are only as good as their data. If the data they are ingesting is incomplete, inaccurate, or biased, then your model will develop incorrect assumptions. For example, oversampling a certain population in a survey may train your model to pay a disproportionate amount of attention to that population.
  • User-Generated Data and Feedback Loops: Users have their own bias, and machine learning models can pick up on those patterns. For example, if there are a lot of people searching for homes significantly out of their budget, then a real estate listing algorithm may promote listings that few people can actually afford.
  • Unintended Pattern Recognition: Sometimes, models pick up on the wrong patterns. At a minimum, this reduces the model’s effectiveness; at worst, your model may make unlawful decisions that put your business at risk of fines and public backlash. For example, your model may notice that people around retirement age make up a smaller percentage of active workers and then reject resumes based solely on the applicant’s age. This could leave your business open to age discrimination lawsuits.
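
One simple way to surface bias like the hiring examples above is to compare the model’s selection rate across demographic groups, a heuristic known as the four-fifths rule in U.S. employment law. The sketch below is a hypothetical illustration; the column names, data, and 0.8 threshold are assumptions for demonstration, not output from a real hiring system.

```python
import pandas as pd

def selection_rate_by_group(df: pd.DataFrame, group_col: str,
                            outcome_col: str) -> pd.Series:
    """Fraction of positive model decisions within each group."""
    return df.groupby(group_col)[outcome_col].mean()

def passes_four_fifths_rule(rates: pd.Series) -> bool:
    """Heuristic: the lowest group's selection rate should be at least
    80% of the highest group's rate."""
    return rates.min() / rates.max() >= 0.8

# Hypothetical hiring-model decisions (1 = advance candidate, 0 = reject)
decisions = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "advanced": [0, 1, 0, 0, 1, 1, 1, 0],
})

rates = selection_rate_by_group(decisions, "gender", "advanced")
print(rates)                           # F: 0.25, M: 0.75
print(passes_four_fifths_rule(rates))  # False: 0.25 / 0.75 is below 0.8
```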

What happens when an ML model doesn’t work right?

For businesses that rely on machine learning models for day-to-day operations as well as innovation, inaccuracies can have disastrous consequences. In one survey, 36% of businesses reported being negatively impacted by machine learning bias. Of the businesses that were affected:

  • 62% lost revenue
  • 61% lost customers
  • 43% lost employees
  • 35% incurred legal fees from lawsuits
  • 6% lost customer trust

Even enterprise-scale organizations working in the most prominent industries are at risk of suffering from machine learning biases. For example, the U.S. healthcare system, Facebook, and Amazon had to correct their ML models to account for AI fairness:

U.S. healthcare system

In 2019, a study found that a healthcare risk-prediction algorithm — which was used to evaluate over 200 million people in the US — demonstrated racial bias. The root of the issue was that the proxy data set contained patterns that reflected disparate care between white and black Americans. 

In particular, the algorithm focused on how much patients spent on healthcare in the past to determine their current risk for chronic conditions. Using this spending data, the algorithm determined that since white patients spent more on healthcare, they were more likely to be at risk for chronic illnesses. In reality, black patients spent less on healthcare due to a variety of factors that were unrelated to their actual symptoms — like their income levels and confidence in the healthcare system. 

This bias made it more challenging for black patients to receive care for chronic conditions, even though they had a high level of need. This not only harmed patients, but also weakened confidence in the fairness of the healthcare system.

Facebook

In 2019, Facebook neglected to enforce legal requirements that prevent advertisers from directly targeting audiences based on gender, race, religion, and other protected classes. During this period, Facebook’s algorithm learned that advertisers were targeting these protected classes for different products and services, such as real estate. As a result, Facebook’s ad algorithm reflected the bias of advertisers and prioritized showing real estate ads to white audiences over members of minority groups. This limited housing opportunities for groups who have historically had fewer chances to own property.

This learned bias violated the Fair Housing Act, and as a result, the U.S. Department of Housing and Urban Development brought charges against the social media company.

Amazon’s hiring algorithm

In 2015, Amazon realized that its new automated job candidate review system had a noticeable gender bias. The issue began in the model’s training: after analyzing resumes submitted to Amazon over a 10-year period, the model recognized that most past team members were men.

Acting on this pattern, Amazon’s machine learning model penalized and rejected resumes that indicated the applicant was a woman. Graduates of women’s colleges and members of gender-specific extracurricular activities, such as a women’s soccer team, were affected by this bias. These errors screened out candidates who could have brought valuable experience and cast Amazon, and the tech industry as a whole, in a negative light.

How to monitor a machine learning model

An AI Observability platform makes it possible for each member of your MLOps team to identify and resolve model issues efficiently and at scale. From a unified dashboard, team members can uncover and share insights, and perform root cause analysis to understand how and why models make the predictions they do. Fiddler’s AI Observability platform features best-in-class machine learning model monitoring tools, including:

  • Performance Monitoring
  • Drift Detection
  • Quality Checks
  • Custom Alerts
  • Ground Truth Updates
  • NLP and CV Monitoring

Having a dedicated AI Observability platform reduces your “time-to” factors: your time to market, your time to value, and your time to resolution.
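
As a conceptual illustration of what a custom performance alert boils down to, the sketch below tracks rolling accuracy over a fixed window and fires when it dips below a threshold. This is a generic, hypothetical example; it does not reflect Fiddler’s actual API, and the window size and threshold are assumptions.

```python
from collections import deque

class AccuracyAlert:
    """Fire an alert when rolling accuracy over the last `window`
    predictions drops below `threshold` (hypothetical example)."""

    def __init__(self, window: int = 500, threshold: float = 0.90):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect
        self.threshold = threshold

    def record(self, prediction, label) -> None:
        self.outcomes.append(1 if prediction == label else 0)

    def should_alert(self) -> bool:
        # Wait for a full window so early noise doesn't trigger alerts.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) < self.threshold
```

In a real deployment, `record` would be called as labeled ground truth arrives, and `should_alert` would feed a notification channel so the right team member is paged when quality degrades.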

How do you check the performance of a model?

Model monitoring metrics make it clear whether your model is performing properly or not. The five key metric categories are:

  • Classification metrics
  • Regression metrics
  • Statistical metrics
  • Natural language processing (NLP) metrics
  • Deep learning-related metrics

Depending on your specific project, these metric groups will have varying priorities. For example, if you are running a forecasting model, statistical accuracy is imperative.
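
As a brief illustration, the snippet below computes one representative metric from the classification and regression categories using scikit-learn. The labels and predictions are placeholder data; which metrics matter depends on your task.

```python
from sklearn.metrics import f1_score, mean_absolute_error

# Classification (e.g., fraud detection): F1 balances precision and recall.
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
print("F1:", f1_score(y_true_cls, y_pred_cls))  # 0.8

# Regression (e.g., demand forecasting): mean absolute error in target units.
y_true_reg = [100.0, 150.0, 200.0]
y_pred_reg = [110.0, 140.0, 190.0]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))  # 10.0
```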

How do I monitor my ML model? 

To monitor your machine learning model, you should develop a framework that includes all relevant stakeholders. A comprehensive model monitoring framework has three key elements: your models, your teams, and an AI Observability platform that gives everyone the access they need to stay informed and connected.