At the AI in Finance Summit, NY, in December 2020, we had a panel discussion on the state of responsible AI with a group of model risk specialists from the financial services and tech industries. Below is a summary of the discussion. You can watch the full panel discussion here.
Panel Discussion: Accelerating AI Risk Mitigation with XAI & Continuous Monitoring
How is AI in Finance changing the traditional practice of model risk management?
Model risk management (MRM) is a well-established practice in banking, but one that is also growing and changing rapidly due to advances in AI. “We run banks with models,” said Agus Sudjianto (EVP & Head of Model Risk, Wells Fargo). Jacob Kosoff (Head of MRM & Validation, Regions Bank) echoed the point, adding that 30% of his team’s models are now machine learning models rather than traditional statistical ones. Innovations from Silicon Valley, such as TensorFlow, PyTorch, and other frameworks built predominantly for deep learning, have made their way to Wall Street, accelerating the adoption of AI in Finance.
The goal of MRM, also called model safety, is to avoid the type of financial and reputational harm that models can cause when they are inevitably wrong. Machine learning models pose new challenges: they are inherently very complex, and even if issues are caught before the model is deployed, changes in the underlying data can completely alter the model’s behavior.
MRM teams have to respond to these new challenges and become thought leaders in building trustworthy AI systems in finance. At banks, “it’s not only upskilling the quants, who have traditionally been using statistical models,” said Sri Krishnamurthy (CEO, QuantUniversity). “They have to think about the whole workflow from development to production and build out different frameworks.” Silicon Valley is approaching these problems from a holistic viewpoint as well. Tulsee Doshi (Product Lead, Fairness & Responsible AI, Google) explained that responsible AI principles covering everything from scientific excellence to fairness, privacy, and security are built into Google’s launch review process, and increasingly need to be applied at every stage of product development.
What are some strategies to implement responsible AI in Finance today?
The panelists shared some approaches they use to institute checks and balances into the model development process. At Google, Doshi said, context is everything: “How a model is deployed in a particular product, who those users are, and how they’re using that model is a really important part of the risk management process.” As an example, Doshi explained that an ML technology like text-to-speech can have some positive applications, particularly for accessibility, but also the potential for real harm. Instead of open sourcing a text-to-speech model that could be used broadly for any use case, “we want to realize where the context makes sense and prioritize those use cases.” Then, the team will design metrics that are appropriate for these use cases.
Banks face high risks and strict regulatory requirements, so it’s crucial to have the right guardrails in place. “In the past, the focus of data scientists was model performance and AutoML...for us, it’s very dangerous to focus on that,” Sudjianto said. At Wells Fargo, “for every 3 model developers, we have 1 independent model evaluator,” and the evaluators report through a different part of the organizational chain to avoid conflicts of interest. After articulating the model’s intended use, what can go wrong, and the appetite for risk, the team evaluates all the potential root causes of a wrong prediction, from the data, to the modeling framework, to training. “That’s why interpretability is so critical,” said Sudjianto.
To implement AI responsibly at a financial institution, the right culture is essential. The MRM team needs to be “willing to challenge authority, willing to challenge executives, willing to say ‘your model is wrong,’” Kosoff said. And from the top down, everyone at the company must understand that “this is not a compliance exercise, this is not a regulatory exercise.” MRM is, in fact, key to protecting value.
As Krishnamurthy explained, sometimes the cultural change also means recognizing that “it’s not all about technology.” Focusing on having the latest, most sophisticated tools for deep learning systems can be dangerous for institutions just starting to move off more traditional statistical models: “You will learn how to use the tool, but you won’t have the conceptual grounding.” Instead, teams might need to take a step back, clearly define their goals for their models, and determine whether they have the required knowledge to use a black box ML system safely.
How do teams combat algorithmic bias?
Banks are accustomed to fighting bias in order to establish fair lending practices—but as financial institutions implement more AI systems across the board, they are confronting new kinds of algorithmic bias. These are the scenarios that keep our panelists up at night, worried about a model’s mistake causing news outlets and government agencies to come knocking.
For example, as Sudjianto noted, there can be marketing models that seem very innocent but actually touch on issues with privacy and discrimination that are heavily regulated; NLP is also a major landmine (“language by nature is very discriminatory”). Kosoff and Krishnamurthy gave a few more examples of potential bias, like fraud detection being more likely to flag transactions in certain zip codes, or minority customers getting a different automated call center experience.
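One simple way to surface the zip-code scenario Kosoff and Krishnamurthy describe is to compare how often a model flags transactions in each segment. The Python sketch below is illustrative only: the function names, the (group, flagged) record format, the toy data, and the 0.8 review threshold are hypothetical assumptions rather than anything the panelists described, and a real fairness review would rely on properly defined groups and several complementary metrics.

```python
from collections import defaultdict


def flag_rates(records):
    """Return the share of transactions flagged as fraud for each group.

    `records` is an iterable of (group, flagged) pairs, where `group` is a
    hypothetical segment label (e.g. a zip-code bucket) and `flagged` is a bool.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_flagged in records:
        totals[group] += 1
        if is_flagged:
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}


def rate_ratio(rates):
    """Ratio of the lowest to the highest group flag rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nothing flagged anywhere, so no disparity to report
    return min(rates.values()) / highest


# Toy, made-up data: 100 transactions per hypothetical zip-code bucket.
records = ([("zip_A", i < 2) for i in range(100)]
           + [("zip_B", i < 6) for i in range(100)])
rates = flag_rates(records)
ratio = rate_ratio(rates)
print(rates, round(ratio, 3))
if ratio < 0.8:  # hypothetical review threshold, for illustration only
    print("Flag-rate disparity exceeds threshold; route to model risk review")
```

A ratio well below 1.0 does not prove the model is biased, but it tells the validation team where to look first, for example at which input features drive the higher flag rate in one segment.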
To combat bias, teams need to consider a wide range of factors before launch, such as the model’s use cases, limitations, data, performance, and fairness metrics. Google uses “model cards” to capture all this information. “It forces you to document and report on what you’re doing, which helps any downstream team that would pick up that model and use it either externally or internally,” Doshi said. But even the best practices prior to launch can’t prevent the risk of some unforeseen change in the production environment. “We don’t know what errors we will see that we didn’t think about or didn’t have the metrics for,” Doshi said.
This is where continuous monitoring comes in. Kosoff shared an example of how critical monitoring has been during the COVID-19 crisis. “For fraud on a transaction for debit cards or credit cards, the most predictive variable is card present or card not present.” But in February and March of 2020, as customers switched to doing most or all of their shopping online, card-not-present transactions surged, and ML systems suddenly began detecting unusually high amounts of fraud.
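A shift like the one Kosoff describes shows up directly in a model’s input distribution, before any fraud labels arrive. As one common way to quantify it, the sketch below computes the population stability index (PSI) for the card-present feature, comparing a training baseline with a production window. The shares are made up for illustration, the function name is hypothetical, and the 0.25 alert threshold is a widely used rule of thumb rather than a standard; teams typically tune it per model.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two categorical distributions given as {category: share}.

    PSI = sum_i (a_i - e_i) * ln(a_i / e_i). `eps` guards against log(0)
    when a category is missing from one of the distributions.
    """
    psi = 0.0
    for category in set(expected) | set(actual):
        e = max(expected.get(category, 0.0), eps)
        a = max(actual.get(category, 0.0), eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Made-up shares for illustration: training period vs. a production window.
baseline = {"card_present": 0.7, "card_not_present": 0.3}
current = {"card_present": 0.3, "card_not_present": 0.7}
psi = population_stability_index(baseline, current)
if psi > 0.25:  # commonly cited rule-of-thumb threshold; tune per model
    print(f"PSI={psi:.3f}: input drift detected, alert model owners")
```

Because PSI only looks at inputs, it can raise an alert as soon as the mix of transactions changes, without waiting for confirmed fraud outcomes.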
What changes can we expect in 3-5 years?
In the next 3-5 years, we are undoubtedly going to see an explosion of increasingly complex modeling techniques—which will, in turn, put more pressure on monitoring and validation. So what changes can we expect from the responsible AI space in the near future?
Doshi noted that with whitepapers coming from the EU and regulatory activity in the US, Singapore, and other countries, “we’re going to see more and more regulation come out around actually putting in proper processes around explainability and interpretability.” There will most likely also be a shift in computer science education, so that students graduate with training in model risk management and explainability.
Kosoff can imagine a future with a kind of “driver’s license” certifying that someone understands the risks well enough to build models. As a step in this direction, Regions Bank is exploring the idea of having all new model developer hires spend their first 6 months embedded on the model risk team. Upon joining their permanent teams, “they’ll be more trained, more qualified, they’ll know more aspects of the bank, and they’ll have a strong understanding of fairness and everything we’ve talked about on model risk and model evaluation.”
Krishnamurthy pointed out that currently very few models are actually making it out of the exploration phase—but in the next few years, “the production story is going to start getting consolidated.” Krishnamurthy also believes that “some of the noise is going to subside”: the initial approach to throw deep learning models at everything will be replaced by a more sober understanding of the limitations. Finally, continuing a trend that began with 2020’s stay-at-home orders, cloud tools for ML will become more prominent.
In Sudjianto’s opinion, testing is still one of the biggest gaps: “People talk about counterfactual testing, robustness testing—it’s still in the academic world...in the real world, it’s not scalable.” Institutions need to train individuals to be the equivalent of reliability and safety engineers for ML, and they also need the tools to operate at speed and scale and detect failures ahead of time. As Sudjianto said, “Monitoring cannot be passive anymore.”
Agus Sudjianto, EVP & Head of Model Risk, Wells Fargo
Jacob Kosoff, Head of MRM & Validation, Regions Bank
Sri Krishnamurthy, CEO, QuantUniversity
Tulsee Doshi, Product Lead, Fairness & Responsible AI, Google
Krishna Gade, Founder & CEO, Fiddler
P.S. We built Fiddler to fill in these tooling gaps and help teams build trust into AI. Teams can easily import their models and data sets to Fiddler and have continuous monitoring and explanations for their models, creating a system of record for ML in production. As the responsible AI space continues to evolve, we’re very excited to share more on this topic. If you’re interested in seeing what Fiddler can do, you can sign up for a free demo here.