Fiddler for Governance, Risk, and Compliance: Comply with AI Regulations

Discover how the Fiddler AI Observability and Security platform provides a comprehensive observability solution for GenAI and machine learning systems to ensure governance, transparency, risk management, and compliance with the EU AI Act. This video explains how to inventory high-risk AI applications, track performance metrics, manage data quality, and conduct root cause analysis for generative and predictive models. Learn how to monitor trends and customize metrics for safety, fairness, and transparency in GenAI and ML applications.

Video transcript

EU AI Act & Fiddler AI Platform Overview

[00:00:00] The EU AI Act mandates that all AI applications and systems be transparent and continuously monitored for risk, safety, fairness, and other metrics. The Fiddler platform offers a unified observability solution for AI and machine learning systems and models: you gain transparency into and monitoring of all performance, can manage risk and data issues, and can collect evidence and logs that show when and why issues arise in high-risk AI applications.

Setting Up High-Risk AI Applications

[00:00:34] Let's look at the Fiddler platform to see how we achieve this. First, it is very important to have an inventory of all of your high-risk AI applications. We can set this up in one of our projects to cover every potential high-risk application, including both traditional predictive machine learning models and generative AI applications such as LLM chatbots.

[00:00:56] This gives us a model card with a description of what the model does, its features and data types, and its outputs and targets. Once we have an inventory of all AI applications and machine learning models, we can create a dashboard that gathers and collects evidence to align with the EU AI Act and the specific metrics it requires.
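The model-card inventory described above can be sketched as a simple data structure. This is a minimal illustration only, assuming a hypothetical schema; the field names here are illustrative and are not Fiddler's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Minimal model-card record for a high-risk AI inventory (hypothetical schema)."""
    name: str
    description: str
    model_type: str                        # e.g. "predictive" or "generative"
    features: dict = field(default_factory=dict)   # feature name -> data type
    outputs: list = field(default_factory=list)
    targets: list = field(default_factory=list)

# Example inventory with one generative and one predictive entry
inventory = [
    ModelCard(
        name="support-chatbot",
        description="Customer-support LLM chatbot",
        model_type="generative",
        outputs=["response"],
    ),
    ModelCard(
        name="credit-scorer",
        description="Loan-approval classifier",
        model_type="predictive",
        features={"income": "float", "age": "int"},
        outputs=["approval_probability"],
        targets=["approved"],
    ),
]

# Filter the inventory, e.g. to list all generative applications
generative_apps = [m.name for m in inventory if m.model_type == "generative"]
```

A real inventory would also carry ownership, risk classification, and versioning metadata; the point is simply that each entry records the model's purpose, inputs, and outputs in one place.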

Creating Dashboards for Compliance

[00:01:21] We can track high-level generative AI application metrics, such as unsafe prompts detected, the number of hallucinations, the percentage of jailbreak or prompt-injection attacks, or even an aggregate custom cyber risk score that you can define and customize in the model definition. Similarly, for predictive machine learning models, we can track bias and fairness metrics such as disparate impact and group benefit.
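Disparate impact, one of the fairness metrics mentioned above, is commonly defined as the ratio of positive-outcome rates between an unprivileged and a privileged group. The sketch below shows that standard formula; it is a generic illustration, not Fiddler's internal implementation.

```python
def positive_rate(outcomes):
    """Fraction of positive (1) outcomes in a group's binary decisions."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(unprivileged, privileged):
    """Ratio of positive-outcome rates between groups.

    Values below ~0.8 are often treated as a potential bias signal
    (the "four-fifths rule" heuristic).
    """
    return positive_rate(unprivileged) / positive_rate(privileged)

# Example: 50% approval rate for one group vs 80% for another
di = disparate_impact([1, 0, 1, 0], [1, 1, 1, 1, 0])
# 0.5 / 0.8 = 0.625, below the 0.8 heuristic threshold
```

Tracking this ratio over time on a dashboard is what lets you catch a model drifting into unfair behavior rather than discovering it in an audit.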

Metrics for Generative & Predictive Models

[00:01:43] We also align monitoring over time with the EU AI Act, so you can understand trends in the accuracy and robustness of both your predictive machine learning models and your generative AI systems as a whole. For root cause analysis and audit evidence collection around these trends and behaviors, we offer comprehensive root cause analysis with a complete log of all events entered into and consumed by the machine learning model or application.

Root Cause Analysis and Data Quality

[00:02:12] You can also track data quality and integrity from a risk management perspective: any violation of an expected data type or data range can trigger an alert within the platform, helping you diagnose issues with your GenAI application or machine learning model. Similarly, safety, bias, and fairness metrics can all be customized and tuned to align with the EU AI Act's stipulations.
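The range-violation checks described above can be sketched in a few lines. This is a simplified, hypothetical validator, assuming each field has an expected [low, high] range; a production system would also validate types and feed violations into an alerting pipeline.

```python
def check_ranges(record, expected_ranges):
    """Return alert messages for any field that is missing or outside its
    expected [lo, hi] range (hypothetical data-quality check)."""
    alerts = []
    for field_name, (lo, hi) in expected_ranges.items():
        value = record.get(field_name)
        if value is None:
            alerts.append(f"{field_name}: missing value")
        elif not (lo <= value <= hi):
            alerts.append(f"{field_name}: {value} outside [{lo}, {hi}]")
    return alerts

expected = {"age": (18, 100), "income": (0, 1_000_000)}

# An incoming record with an out-of-range age should raise one alert
alerts = check_ranges({"age": 150, "income": 50_000}, expected)
```

Running a check like this on every incoming event is what turns raw inference traffic into the data-integrity evidence a risk-management process needs.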

Transparency in Generative AI Applications

[00:02:38] For generative AI applications, we offer transparency and monitoring of all prompts and responses entering the application. We can visualize these in a comprehensive 3D view so you can identify problematic prompts or responses, run root cause analysis, and maintain a full audit trail.

Example of Monitoring and Analysis

[00:02:56] Let's look at an example, so you can understand why and where an issue occurred in an AI application conversation. We can dive into the full thread: the source documentation used to generate the response, the actual prompt and response themselves, and any of the customizable metrics and scores we can configure to provide monitoring transparency into the application's performance and responses.
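The audit record described above, a prompt, its response, the source documents used, and per-metric scores, can be sketched as a simple log entry plus a threshold filter. All names and thresholds here are hypothetical illustrations, not Fiddler's schema.

```python
from dataclasses import dataclass

@dataclass
class PromptLogEntry:
    """One logged prompt/response event with its metric scores (hypothetical schema)."""
    prompt: str
    response: str
    source_documents: list
    scores: dict   # e.g. {"toxicity": 0.01, "hallucination": 0.05}

def flag_problematic(entries, thresholds):
    """Return entries where any metric score exceeds its alert threshold."""
    flagged = []
    for entry in entries:
        if any(entry.scores.get(m, 0.0) > limit for m, limit in thresholds.items()):
            flagged.append(entry)
    return flagged

entries = [
    PromptLogEntry(
        prompt="How do I reset my password?",
        response="Click 'Forgot password' on the login page.",
        source_documents=["help/passwords.md"],
        scores={"toxicity": 0.01, "hallucination": 0.05},
    ),
    PromptLogEntry(
        prompt="Tell me about our refund policy.",
        response="All purchases are refundable for 10 years.",  # unsupported claim
        source_documents=[],
        scores={"toxicity": 0.02, "hallucination": 0.85},
    ),
]

flagged = flag_problematic(entries, {"hallucination": 0.5, "toxicity": 0.5})
```

Because each entry keeps the prompt, response, and sources together, a flagged event can be traced back to exactly the conversation turn and documentation that produced it.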