Throughout the lifecycle of any machine learning (ML) solution, numerous supporting components and optimizations can be added after deployment without penalty. Your MLOps framework isn’t one of them.
MLOps, short for machine learning operations, is a set of tools and practices that help ML teams rapidly iterate through model development and deployment. Model monitoring and explainability are critical pieces within MLOps that help bridge the gap between opaque ML models and the visibility required by humans to understand and manage multiple dimensions of a model’s performance and inner workings.
The early deployment of model monitoring and explainable AI (XAI) during the model training phase is a key success factor for ML solutions.
With MLOps still in its infancy, a consensus around best practices, definitions, and methodologies continues to evolve. Yet there’s at least some agreement on rules critical to success. If you had to pick a single golden rule of MLOps, it would probably be the one that’s most often overlooked: model monitoring and XAI should start from day one and continue throughout the MLOps lifecycle — not just after deployment, as often happens, but in training and validation, and through every iteration in production.
Let’s look at three concrete reasons to make early inclusion of ML monitoring and explainable AI a priority in your machine learning implementation.
Benefit 1: Discovering issues before model deployment
When leadership says 'go' to a machine learning project, the conversation naturally revolves around how to solve a business problem with a model, the key characteristics needed in training data, and the critical path to a successful deployment. Even if you’re savvy about MLOps, there can be so many moving parts that components seen as secondary to those core concerns get little thought.
But therein also lie the seeds of destruction. For one thing, monitoring should not be an afterthought. It’s a critical complement to the core ML solution, and it remains much misunderstood.
When deployed in the wild, a model is immediately subjected to real-world data it didn’t see in training, often resulting in model drift. That’s why, in well-oiled training and test regimens, data scientists and model builders split their dataset into separate training and test portions. By holding back some data for testing, the team can stress the model and simulate day one of production inside the safe confines of the training environment.
Exact split ratios vary, but this approach helps avoid overfitting and, with monitoring and XAI in place, lets data scientists evaluate how the model arrives at predictions as data flows through it. As a result, data scientists gain insights that go beyond monitoring metrics alone: understanding what drives changes in those values, tracing the causal chains in the model’s reasoning, and seeing how the relative importance of data features drives predictions.
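To make this concrete, here is a minimal sketch, assuming a scikit-learn workflow on a generic tabular dataset (the dataset, model, and split ratio are illustrative, not prescriptive): it holds back a test split, records standard monitoring metrics on the held-out data, and uses permutation importance as a simple XAI technique to see which features drive predictions.

```python
# Minimal sketch: hold-out split, training-time monitoring, and a simple XAI pass.
# Assumes a scikit-learn workflow on an illustrative tabular dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=10, random_state=42)

# Hold back a test set to simulate "day one" of production inside the training environment.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Monitoring metrics: how does the model behave on data it has never seen?
preds = model.predict(X_test)
print(f"accuracy={accuracy_score(y_test, preds):.3f}  f1={f1_score(y_test, preds):.3f}")

# A basic XAI step: which features drive predictions on the held-out data?
importances = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for idx in importances.importances_mean.argsort()[::-1][:3]:
    print(f"feature_{idx}: importance={importances.importances_mean[idx]:.4f}")
```

In a real project, the same metrics and importance scores would be logged to whatever monitoring platform the team uses, so that training-time values become the reference point for later production comparisons.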
In short, monitoring models during training has all the same benefits as monitoring models in production. In combination with A/B, counterfactual, and stress testing with hypothetical extremes, ML monitoring and explainable AI can help produce a more robust model from the start.
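As a hedged illustration of the stress-testing idea, the sketch below reuses the model and training data from the previous snippet and probes it with fabricated, out-of-distribution inputs; the specific perturbations are invented for illustration only.

```python
# Sketch of stress testing with hypothetical extremes, reusing model and X_train
# from the previous sketch; the specific extreme inputs are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Fabricate inputs well outside anything seen during training.
extreme_inputs = np.vstack([
    X_train.max(axis=0) * 10,                   # far beyond observed feature ranges
    X_train.min(axis=0) * 10,
    np.zeros(X_train.shape[1]),                 # degenerate all-zero record
    rng.normal(0, 25, size=X_train.shape[1]),   # high-variance noise
])

# Inspect predicted probabilities: high confidence on nonsense inputs is a red flag
# worth catching before deployment, not after.
for p in model.predict_proba(extreme_inputs):
    print(f"max confidence on extreme input: {p.max():.2f}")
```

The point is not these particular perturbations but the habit: probing the model with inputs it will never see in training, while it is still cheap to change.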
Benefit 2: Stakeholders need visibility from the beginning
The multi-layered neural networks that make up deep learning solutions are notoriously opaque to business leaders and ML teams alike.
In addition to the core development team, multiple stakeholders, many of whom may not be technical, benefit from human-centric insights into model performance, right from the get-go. Some industries require it.
Risk and compliance teams in those industries must evaluate and approve models before they can be deployed. While the core team can see how directly interpretable metrics, like F1 score and accuracy, respond to differing datasets across testing iterations, many stakeholders outside that team need deeper, human-centered explanations that address their specific areas of concern. Explainable AI makes the inner workings of ML models transparent and observable to stakeholders in all roles.
But insights from XAI must be generated early in development to address the specific concerns of different parties. This is partly to inform design decisions from the start, but also to build the level of trust in the model that regulators and business leaders alike require.
John K. Thompson, Global Head of AI at CSL Behring, notes, “In regulated industries like pharmaceuticals, biopharmaceuticals, and finance, MLOps is actually from the inception of the model all the way through into production. We need to view MLOps that way in the future. I don't think most industries think about it that way yet.”
As the number of algorithms and types of ML models expand, XAI remains a difficult problem with vast implications. Even so, it’s a critical tool for providing insights to the wider community of stakeholders, especially as new AI regulations come into force.
Benefit 3: The enduring value of training data in deployment
MLOps is often compared to DevOps, and while there is some truth to the comparison, the two disciplines differ in critical ways. ML deployment is far more iterative, parallel, proactive, and dynamic than traditional software delivery, and the lines between the phases of model development, testing, and deployment are far less sharp. Even a CI/CD framework, a more apt comparison to MLOps than generic DevOps, isn’t iterative in the same way continuous delivery is in ML pipelines.
Unlike traditional software, in machine learning the data, code, and models all evolve through iteration, and design decisions propagate through every phase, with multiple versions of code and model combinations branching so that XAI can surface deeper insights into model behavior across a wide range of input features.
Identifying the root cause of problems quickly and determining the best course of action to resolve them are vital goals of any MLOps deployment. They’re made possible by tools capable of leveraging insights from previous iterations to fine-tune the next. And no previous iteration is more valuable than the first: analysis from training and validation is often critical to the quick resolution of production issues and to the quality of the insights XAI provides in subsequent iterations.
In conjunction with model tracking, versioning, and source control, model monitoring and XAI enable ML teams to better understand model behavior across widely differing inputs, make better decisions, reduce the mean time to resolution when challenges inevitably arise, and ultimately help stakeholders assess and mitigate possible risks to the organization or end user.
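As one hedged illustration of how training-time analysis feeds later iterations, the sketch below (building on the earlier snippets; the baseline file name, drift threshold, and simulated production batch are all illustrative) stores baseline feature statistics at training time and later compares a production batch against them with a simple two-sample Kolmogorov-Smirnov test.

```python
# Sketch: capture training-time baselines so later production monitoring can compare
# against the first, most valuable iteration. Reuses X_train from the earlier sketch;
# the file name, threshold, and simulated production batch are illustrative.
import json

import numpy as np
from scipy.stats import ks_2samp

# Persist a snapshot of each feature's training distribution alongside the model version.
baseline = {
    f"feature_{i}": {"mean": float(X_train[:, i].mean()), "std": float(X_train[:, i].std())}
    for i in range(X_train.shape[1])
}
with open("baseline_v1.json", "w") as f:
    json.dump(baseline, f)

def drift_report(production_batch: np.ndarray, reference: np.ndarray, alpha: float = 0.01) -> None:
    """Flag features whose production distribution has drifted from the training reference."""
    for i in range(reference.shape[1]):
        stat, p_value = ks_2samp(reference[:, i], production_batch[:, i])
        if p_value < alpha:
            print(f"feature_{i}: possible drift (KS statistic={stat:.3f}, p={p_value:.4f})")

# Simulate a production batch in which one feature has shifted, to show what a flag looks like.
production_batch = X_train.copy()
production_batch[:, 0] += 3.0
drift_report(production_batch, X_train)
```

Paired with model versioning, a baseline like this makes it straightforward to trace a production anomaly back to the training iteration that produced the model.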
Build Responsible AI
Monitoring, XAI, and other MLOps components are critical to implement from the very beginning of a project. Doing so enables your team and stakeholders to make better design tradeoffs and to catch issues, such as model bias, that KPIs alone will never surface.
If you’re tasked with leading a machine learning implementation, give some thought to how the benefits of implementing monitoring and XAI from the start multiply throughout the project, and know that by doing so, you’re taking legitimate steps toward building responsible AI by design.