Root Cause Analysis at Every Step of Your AI Agent Workflow

Discover how Fiddler enables you to drill down from high-level metrics into individual traces and spans for complete visibility into your agentic applications.

In this demo, we show you how to navigate from aggregate KPIs to trace-level analysis, helping you quickly identify and diagnose issues in your AI agents built with LangGraph and instrumented with OpenTelemetry.

What you'll see:

  • Navigate from aggregate metrics to individual trace logs in one click.
  • Examine the complete agentic hierarchy from system prompts to tool calls to final outputs.
  • Track faithfulness, prompt safety, PII detection, and answer relevancy across your application.
  • Perform root cause analysis by inspecting inputs, outputs, and evaluator scores at each step in the chain.
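The agent in this demo is built with LangGraph and instrumented with OpenTelemetry. As a rough sketch of what that instrumentation setup can look like, here is a minimal OpenTelemetry tracing configuration; the collector endpoint and authorization header are placeholders, not Fiddler-specific values.

```python
# Minimal OpenTelemetry tracing setup (sketch). The endpoint and headers below
# are placeholders -- substitute whatever your collector or backend expects.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://collector.example.com/v1/traces",  # placeholder
            headers={"Authorization": "Bearer <token>"},         # placeholder
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo-chatbot")
```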
Video transcript

[00:00:00] Hey everyone. My name's Kevin, and I'm a solutions engineer here at Fiddler. Today I'll be walking through how you can use agentic monitoring to drill into traces and spans from the KPIs and dashboards that you care about.

[00:00:14] Here we have a demo agentic chatbot that we built with LangGraph and instrumented with OpenTelemetry. We can start to see traffic activity across the different LLM span calls and tool span calls, as well as faithfulness score aggregations, faithfulness scores over time, and prompt safety scores across different metrics. These are configurable charts within Fiddler that you can create to monitor your agentic application at a high level.
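To make the LLM-span versus tool-span distinction concrete, here is a rough sketch of how a LangGraph node might emit an LLM span with its input and output recorded as attributes; the attribute names and the call_model helper are illustrative assumptions, not a required schema.

```python
from opentelemetry import trace

tracer = trace.get_tracer("demo-chatbot")

def answer_node(state: dict) -> dict:
    # Each model invocation gets its own span so it shows up as an LLM span
    # in the trace; attribute names here are illustrative.
    with tracer.start_as_current_span("llm.answer") as span:
        span.set_attribute("span.type", "LLM")
        span.set_attribute("llm.input", state["question"])
        response = call_model(state["question"])  # hypothetical model call
        span.set_attribute("llm.output", response)
    return {**state, "answer": response}
```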

[00:00:45] Similarly, we can look at things like PII, which is a very relevant evaluator for a lot of our customers, as well as answer relevancy. If at any point you want to drill into these charts, you can simply click into an area of the chart, for example right here. Faithfulness is a very important metric for a lot of our customers, and we can see the spans that aggregated into this faithfulness score.
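For a sense of what a PII evaluator is looking for, here is a toy stand-in that flags email addresses and US-style phone numbers with regular expressions; Fiddler's built-in evaluator is more robust, this is purely illustrative.

```python
import re

# Toy PII check -- illustrative only, not Fiddler's evaluator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def contains_pii(text: str) -> bool:
    return bool(EMAIL.search(text) or PHONE.search(text))

print(contains_pii("Reach me at jane.doe@example.com"))  # True
```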

[00:01:12] We can see these are all LLM-focused spans, because that's the filter we've set, along with the input and output for each span and its faithfulness score. If any particular span looks interesting, you can click into the trace view and see the entire agentic hierarchy from start to finish.
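The same drill-down can also be reproduced offline if you export span records. The sketch below assumes a hypothetical CSV export with columns span_type, input, output, and faithfulness; the actual column names depend on how your spans are attributed.

```python
import pandas as pd

# Hypothetical span export; the file name and column names are assumptions.
spans = pd.read_csv("span_export.csv")

# Mirror the dashboard filter: LLM spans only, lowest faithfulness first.
llm_spans = spans[spans["span_type"] == "LLM"]
worst = llm_spans.sort_values("faithfulness").head(10)
print(worst[["input", "output", "faithfulness"]])
```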

[00:01:32] Here, for this particular span, we have a system prompt, a user input, an output, and some of the context it used. If we expand the output, we can see that it made a tool call, and by clicking into the next span we can see that tool's input and output and whether it succeeded. We can follow the entire flow, including all the tool calls, all the model calls, and the final output of the chatbot.
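The parent/child structure shown in the trace view comes from nesting spans. Here is a rough sketch of an LLM span that issues a tool call recorded as a child span; the helper functions and attribute names are illustrative assumptions.

```python
from opentelemetry import trace

tracer = trace.get_tracer("demo-chatbot")

def run_with_tool(question: str) -> str:
    # Parent span: the model call that decides to invoke a tool.
    with tracer.start_as_current_span("llm.plan") as llm_span:
        llm_span.set_attribute("llm.input", question)
        tool_args = decide_tool_call(question)      # hypothetical helper
        # Child span: the tool invocation, nested under the LLM span.
        with tracer.start_as_current_span("tool.search") as tool_span:
            tool_span.set_attribute("tool.input", str(tool_args))
            result = run_search_tool(tool_args)     # hypothetical helper
            tool_span.set_attribute("tool.output", result)
            tool_span.set_attribute("tool.success", True)
        return result
```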

[00:02:09] If we click into the final output, we again see the same system prompt with a different output, and at the bottom we can scroll down to the evaluators. Fiddler provides built-in evaluators as well as customizable evaluators that you can apply to each individual LLM span in your agentic monitoring.
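For a sense of what a customizable evaluator could compute, here is a hand-rolled faithfulness proxy that scores how much of an answer is grounded in the retrieved context; this is a toy illustration, not Fiddler's evaluator API.

```python
def faithfulness_score(answer: str, context: str) -> float:
    """Toy faithfulness proxy: fraction of answer tokens found in the context.

    Illustrative only -- production evaluators use far more robust methods.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

print(faithfulness_score("Paris is the capital of France",
                         "France's capital city is Paris"))  # 0.5
```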

[00:02:29] Here we can see the same kinds of metrics around answer relevancy, prompt safety, and other metrics you can configure within Fiddler. You can use these metrics to determine whether this is the kind of output you want from your chatbot, and if not, you can perform root cause analysis across every step in the chain to determine what needs to change.