Why is model monitoring important?

The world is shifting towards heavier use of artificial intelligence (AI). What used to be solely contained in science fiction movies is making its way into everyday life. Within AI sits a complex subset known as machine learning, which focuses on using data and algorithms to imitate how human beings make decisions.

AI and machine learning have unique challenges that require unique solutions. Model degradation is unavoidable: over time, machine learning models become less accurate and their performance deteriorates. This is primarily due to "concept drift," a phenomenon described in a Cornell University study as "Unforeseeable changes in the underlying distribution of streaming data over time."

That is where model monitoring comes in. The importance of monitoring comes down to the accuracy and consistency needed to implement machine learning successfully. Model monitoring surfaces issues such as data drift, negative feedback loops, and model inaccuracy. Left uncorrected, these issues turn into revenue losses, regulatory risk, and a host of other problems.

What is model monitoring?

First, what is a model as it pertains to machine learning? A machine learning model is the output of an algorithm trained to analyze specific data. Models are trained with baseline data sets that have been labeled to guide the model's decisions. Once the model has been adequately trained, it is run on a data set it has never interacted with before and makes predictions about that new data using the patterns learned from the training set.
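As a rough illustration of that train-then-predict workflow, the sketch below trains a classifier on a labeled baseline set and then scores data it has never seen. The synthetic dataset and the choice of a random forest are assumptions made purely for the example, not a recommended setup.

```python
# A minimal sketch of the train-then-predict workflow described above,
# using scikit-learn on an illustrative (synthetic) classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Baseline (labeled) data the model is trained on.
X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)      # learn from the labeled baseline set

# Predictions on data the model has never interacted with before.
predictions = model.predict(X_new)
print(predictions[:10])
```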

Monitoring is a way to track the performance of the model in production. Think of this as quality assurance for your machine learning team. By closely monitoring how the model performs in production, a variety of issues, such as model bias, can be caught and remedied. This makes each version of your machine learning model more precise than the last, delivering the best possible results.

The great news is that you are not alone in this endeavor. A whole host of machine learning model monitoring tools are available. With AI performing so many tasks previously done by humans, it is absolutely essential to create responsible AI. Partnering with an organization like Fiddler can provide you with the tools necessary to accurately monitor your models in production and build trust into AI. 

How do you monitor a production model?

One of the most effective ways to monitor a model in production is to consistently evaluate its performance on real-world data. That alone, however, is not enough to achieve optimal results. To take it a step further, you can set thresholds, or "triggers," on significant changes in the key metrics you are tracking. These triggers alert machine learning or data science teams that a model may need to be retrained to resolve drift. While there is no single "best way" to monitor, there are some helpful monitoring techniques and best practices:

  • Labeled data: If you have labeled data, model drift can be identified with performance monitoring and supervised learning methods. We recommend starting with standard metrics like accuracy, precision, false positive rate, and area under the curve (AUC); a short sketch follows this list. You may also choose to apply your own custom supervised methods to run a more sophisticated analysis. Learn more about these methods in this review article.
  • Unlabeled data: If you have unlabeled data, the first analysis you should run is an assessment of your data's distribution. Your training dataset was a sample from a particular moment in time, so it's critical to compare the distribution of the training set with the new data to understand what shift has occurred. A variety of distance metrics and nonparametric tests can measure this, including the Kullback-Leibler divergence, Jensen-Shannon divergence, and Kolmogorov-Smirnov test; a sketch of this comparison also follows the list. These each have slightly different assumptions and properties, so pick the one that's most appropriate for your dataset and model. If you want to develop your own unsupervised learning model to assess drift on unlabeled data, there are also a number of models you can use. Read more about unsupervised methods for detecting drift.
  • Root cause analysis: A good place to start is to check for data integrity issues with your engineering team. Has there been a change in your product or an API? Is your app or data pipeline in a degraded state? The next step is to dive deeper into your model analytics to pinpoint when the change happened and what type of drift is occurring. Work with the data scientists and domain experts on your team to understand the shifts you’ve observed. Model explainability measures can be very useful at this stage for generating hypotheses.
  • Shadow mode: Before a new model is deployed, launch it in shadow mode first: it runs alongside the current model on live traffic without serving its predictions, so you can gauge its performance before promoting it. This allows each new model to improve on the last.
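For the labeled-data case above, here is a minimal sketch of supervised performance monitoring with a simple threshold "trigger." The metrics come from scikit-learn; the 0.90 accuracy threshold, the evaluation window, and the logging-based alert are illustrative assumptions rather than a prescribed setup.

```python
# Sketch: supervised performance monitoring with a simple threshold "trigger".
# Assumes you have ground-truth labels and model outputs for a window of
# production traffic; the 0.90 accuracy threshold is illustrative only.
import logging
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score, confusion_matrix

logging.basicConfig(level=logging.INFO)

def evaluate_window(y_true, y_pred, y_score, accuracy_threshold=0.90):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    auc = roc_auc_score(y_true, y_score)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fpr = fp / (fp + tn)  # false positive rate

    logging.info("accuracy=%.3f precision=%.3f FPR=%.3f AUC=%.3f",
                 accuracy, precision, fpr, auc)

    # "Trigger": alert the team when a key metric crosses its threshold,
    # signaling that the model may need to be retrained.
    if accuracy < accuracy_threshold:
        logging.warning("Accuracy %.3f fell below %.2f -- possible model drift.",
                        accuracy, accuracy_threshold)

# Toy production window (replace with real labels and predictions).
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(0.3 * y_true + rng.random(500) * 0.7, 0, 1)
y_pred = (y_score > 0.5).astype(int)
evaluate_window(y_true, y_pred, y_score)
```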
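For the unlabeled-data case, this sketch compares the training distribution of a single feature against incoming production data using the Kolmogorov-Smirnov test and the Jensen-Shannon distance from SciPy. The synthetic feature values, histogram binning, and alerting thresholds are assumptions for illustration.

```python
# Sketch: detecting distribution shift on unlabeled data by comparing the
# training sample of one feature against a window of production data.
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training sample
prod_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)    # shifted production sample

# Kolmogorov-Smirnov test: a small p-value suggests the two samples
# come from different distributions.
statistic, p_value = ks_2samp(train_feature, prod_feature)
print(f"KS statistic={statistic:.3f}, p-value={p_value:.2e}")

# Jensen-Shannon distance computed on histograms of the two samples.
bins = np.histogram_bin_edges(np.concatenate([train_feature, prod_feature]), bins=30)
train_hist, _ = np.histogram(train_feature, bins=bins, density=True)
prod_hist, _ = np.histogram(prod_feature, bins=bins, density=True)
js_distance = jensenshannon(train_hist, prod_hist)
print(f"Jensen-Shannon distance={js_distance:.3f}")

# Illustrative thresholds; tune them for your own feature and traffic volume.
if p_value < 0.01 or js_distance > 0.1:
    print("Significant distribution shift detected -- investigate for data drift.")
```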

Try Fiddler to better understand how continuous model monitoring and explainable AI can help:

  • Detect, understand, and remedy data drift issues.
  • Illuminate the “why” behind decision-making to find and fix root causes.
  • Ensure continued performance by monitoring for outliers.