
Human-Centric Design For Fairness And Explainable AI

This blog post is a recap of a recent podcast hosted by the MLOps Community

There’s no such thing as a successful machine learning (ML) project without the thoughtful implementation of MLOps. 

But MLOps is not a one-off installation. It’s a process: a Swiss Army toolset of algorithms and applications added iteratively and continually as the model matures along the MLOps lifecycle and as the team learns what best suits the use case, with each iteration feeding information and insights forward to the next.

For all their sophistication, ML tools don’t make business decisions; the users do. MLOps provides supporting data, and users interpret it. That’s what makes the integration of human users into the ML ecosystem as critical as any other component, and it’s why next-gen MLOps tools must reflect and embrace that reality by design.

MLOps as the foundation of responsible AI

Of course, MLOps isn’t an end goal in and of itself. Early on, it consists of the critical performance monitoring, tracking, and explainability tools the data science team requires to train, evaluate, and validate models. Importantly, those tools provide a foundation to build upon iteratively over the model lifecycle — a foundation for incremental addition of tooling, for establishing a feedback loop to improve subsequent iterations, and for earning users’ trust in the ML process.

As the model matures, additional explainable AI (XAI) algorithms are implemented and tweaked to provide users with insights about why the model makes particular recommendations. Other tools are added to provide anti-bias safeguards and monitor model output for fairness. Because characteristics of real-world input data inevitably evolve, tools are implemented to detect model drift before its effects impact the business. The exact KPIs and algorithms vary, but these are all key elements of the ultimate aspiration for MLOps: building a Responsible AI (RAI) framework to ensure AI fairness, maximize transparency, and maintain users’ trust in both the tools and the model.
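To make the fairness-monitoring piece concrete, here is a minimal sketch of one common check, demographic parity, run over a batch of model decisions. The column names, sample data, and alerting threshold are illustrative assumptions, not part of any particular toolset:

```python
import pandas as pd

def demographic_parity_gap(df: pd.DataFrame,
                           group_col: str = "segment",     # hypothetical protected-attribute column
                           outcome_col: str = "approved"): # hypothetical binary model decision
    """Return the positive-outcome rate per group and the largest gap between groups."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates, rates.max() - rates.min()

# Illustrative monitoring batch (made-up data)
scored = pd.DataFrame({
    "segment":  ["A", "A", "A", "A", "B", "B", "B"],
    "approved": [1,   1,   1,   0,   0,   1,   0],
})
rates, gap = demographic_parity_gap(scored)
if gap > 0.2:  # arbitrary threshold, chosen only for the sketch
    print(f"Fairness alert: demographic parity gap = {gap:.2f}\n{rates}")
```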

In the meantime, the need to establish trust in those tools is strong enough that new Fiddler users will often set all alert thresholds to “zero”, just so model monitoring alerts trigger easily and frequently, and they can experience all the available notifications for themselves.

That’s just the start, of course. Trust isn’t ‘installed’ all at once. Before it can be maintained, it must be built incrementally and reinforced over time as users work with the tools and every element of the project iterates through experimentation, learning, and updating.

Responsible AI is built incrementally too and is only realized as the model approaches peak maturity, yet it’s key to the continued success of any ML initiative, and of the business it supports.

ML drives business goals

As important as users and tools are to each other, it’s easy to lose sight of the big reason you invested in ML in the first place: optimizing business outcomes. The whole purpose of MLOps is to ensure the model supports exactly that through reliable model monitoring. To do so, it must provide tools, alerts, and insights to more than just the data science team. It must tailor information for the entire spectrum of users and stakeholders who make business decisions based on that information, both when things are going wrong and when things are going right.

That’s why there’s growing interest in using XAI to provide insights that bridge the gap between what a model sees and what humans think it should see. At the same time, we’re realizing that what constitutes useful and actionable insight from XAI depends heavily on the user: on their own areas of interest, their own perspective and priorities, and their own professional lingo.

As the model evolves, so too does the size and functional diversity of the user base, and well-designed tools, particularly XAI, must keep pace to deliver contextually relevant information. Raw KPIs from the MLOps stack won’t be sufficient for the business users upstairs. 

In fact, the most dramatic disconnect is often with the C-suite: between what raw ML metrics tell us and how they translate, or don’t, into the business KPIs executives need. Despite their direct connection to the bottom line, raw model performance metrics and native XAI reports are meaningless to business stakeholders. It’s one thing to tell a data scientist the Population Stability Index (PSI) is high, and entirely another to tell the CFO.

But it’s no less important. 
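For readers who haven’t worked with the metric, PSI compares how a score’s (or feature’s) distribution in production has shifted away from a reference window; values above roughly 0.25 are conventionally read as significant drift. A minimal sketch of the standard calculation (the sample data is illustrative):

```python
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a reference sample and a production sample of the same variable."""
    # Bin edges come from the reference (expected) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Illustrative drift: production scores shifted relative to training scores.
rng = np.random.default_rng(0)
training_scores = rng.normal(0.0, 1.0, 10_000)
production_scores = rng.normal(0.5, 1.0, 10_000)
print(population_stability_index(training_scores, production_scores))
# Rule of thumb: values above ~0.25 are commonly treated as significant drift.
```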

Therein also lies the central challenge of calculating ROI from ML initiatives: how do you infer business KPIs directly from raw model metrics to determine their impact on business outcomes? The solution is to deliver the KPIs each user understands.

Human-centric design

Human-centric design demands empathy for all users, so the same complex alerts and raw ML KPIs provided to data scientists must also be available to users in other key roles, in a format tailored to inform business decisions or otherwise suited to their particular sub-discipline.

The quality of each user’s decision-making is highly dependent on the quality of information at their disposal. Sure, it’s the job of the MLOps tools to draw users’ attention to performance issues, and through XAI to help them understand how the model is performing. But that still leaves humans to interpret what the instrumentation is trying to tell them.

No matter how extensively you automate operations, or how refined the presentation of your XAI interface is, humans still make decisions by interpreting that information through the lens of their own experience, preconceptions, and skill set. They introduce their own cognitive bias and subtle personality differences into the decision-making process.

IT users know what to do when alerts tell them server resources are maxed out, but the monitoring and XAI tools in the MLOps stack aren’t so cut and dried. They suggest more complex, more consequential decisions, serve a broader, cross-functional coalition of users, and are far more susceptible to interpretation errors.  

In image classification, for example, post-hoc explanations like a heat map overlay can help users visualize the regions of an image the model focused on to identify something — let’s say a bird. The heat map is explanatory, but also introduces the risk that we’ll impose our own biases on why the model saw a bird.

So if the heat map shows that the model focused on a region containing the beak, we might assume that to be the identifying feature, rather than adjacent features, boundaries, or contours that may actually have driven the model’s results.
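One common way to produce that kind of overlay is a plain gradient-based saliency map. The sketch below assumes a pretrained torchvision classifier and a local image file (both stand-ins), and it illustrates the caveat above: the map shows where the model looked, not why it decided what it did.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Vanilla gradient saliency: highlight pixels whose change most affects the top class score.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("bird.jpg").convert("RGB")).unsqueeze(0).requires_grad_(True)
scores = model(img)
scores[0, scores.argmax()].backward()       # gradient of the winning class w.r.t. the pixels
saliency = img.grad.abs().max(dim=1)[0][0]  # (224, 224) heat map: max over color channels

# Overlaying `saliency` on the original image (e.g., matplotlib imshow with alpha) gives the
# familiar heat map; interpreting *which* feature drove the prediction is still on the human.
```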

Assumptions can lead to bad decisions, which can have unanticipated side effects that impact the business. Scientists at Fiddler think a lot about the most effective presentation of dashboard information, aiming to minimize ambiguity and maximize clarity. They ask “What is most understandable graphically?” and “What is better presented as text?”, and they consider what can be improved at each point of human-machine interaction, such as “How can we target each alert to only the need-to-know stakeholders?”

So what options do designers have for making tools more human-centric?

To tailor information to business users, Fiddler provides a translation layer that empowers ML teams to draw a direct line between model metrics and business outcomes, giving them rich diagnostics and contextual insights into how ML metrics affect model predictions, which in turn influence business KPIs.
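The details of that layer are Fiddler’s, but the underlying idea can be illustrated with a deliberately simplified sketch: express a change in a raw model metric as a quantity a business stakeholder already tracks. The fraud-detection framing, dollar figures, and function below are hypothetical.

```python
def missed_fraud_cost(recall: float,
                      monthly_fraud_attempts: int,
                      avg_loss_per_fraud: float) -> float:
    """Translate a raw model metric (recall) into a business KPI (expected monthly loss)."""
    missed = (1.0 - recall) * monthly_fraud_attempts
    return missed * avg_loss_per_fraud

# A data scientist sees "recall dropped from 0.92 to 0.88".
# A CFO sees the same event as an expected additional loss per month:
baseline = missed_fraud_cost(0.92, monthly_fraud_attempts=4_000, avg_loss_per_fraud=250.0)
current  = missed_fraud_cost(0.88, monthly_fraud_attempts=4_000, avg_loss_per_fraud=250.0)
print(f"Estimated additional monthly loss: ${current - baseline:,.0f}")
```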

Alerts pose their own challenge: alert fatigue and cognitive overload confront the designers of any monitoring system. One approach is a higher-level design framework that categorizes alerts into bins, such as data problems or model problems, so users can quickly grasp the nature of an alert and direct it only to the appropriate team or individual.

You can also improve the selectivity of recipients by segmenting the ML pipeline into areas of interest that align with defined subsets of stakeholders. In some instances, machine learning itself can be used to classify alerts by their attributes, though this demands attention to pitfalls: such a classifier is effectively a one-size-fits-all approach that risks missing outliers or rare alerts.
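As a simplified illustration of the binning-and-routing idea from the last two paragraphs, the sketch below routes each alert only to the teams that can act on it. The categories, stakeholder mapping, and alert fields are assumptions rather than a real schema.

```python
from dataclasses import dataclass

# Hypothetical alert bins and routing table; a real deployment would derive these
# from the pipeline segments and stakeholder groups discussed above.
ROUTES = {
    "data_quality":  ["data-engineering"],
    "data_drift":    ["data-science"],
    "model_quality": ["data-science", "ml-engineering"],
    "business_kpi":  ["product", "finance"],
}

@dataclass
class Alert:
    category: str   # one of the bins above
    metric: str
    value: float
    threshold: float

def route(alert: Alert) -> list[str]:
    """Send each alert only to the stakeholders who need to see it."""
    return ROUTES.get(alert.category, ["mlops-oncall"])  # fallback owner for uncategorized alerts

print(route(Alert("data_drift", metric="PSI", value=0.31, threshold=0.25)))  # ['data-science']
```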

Ultimately, addressing alert fatigue and cognitive overload is a complex problem that requires a multifaceted approach. It involves understanding the users’ needs and the nature of the alerts, as well as infusing domain knowledge and considering the trade-offs between different solutions.

Summary

There’s no getting around it: users and the decisions they make are on the critical path when things go wrong. That’s reason enough to take a human-centric approach seriously.

But even when things are going right, the new approach to MLOps means the functionality of XAI must extend beyond merely explaining the model's decisions. It must also improve users’ understanding of the model's limitations and suggest ways to use it in a more responsible and ethical manner. The potential for human bias also highlights the importance of training users in the XAI interface — a notable deficit of unmanaged open-source tools. Getting value from XAI tools requires educated interpretation by users and an awareness of the limitations and assumptions behind a particular approach.

In the ML ecosystem, it’s hardly surprising, then, that the solution to challenges arising from the human-machine interface lies in a human-centered approach: one that covers not only the technical aspects of XAI but also the business, social, and ethical implications of user decisions.

Read the tech brief to learn how Fiddler does XAI.