Fiddler for Coding Agents: Control Every Line Your Coding Agent Writes
See how developers use Fiddler to create custom evaluators for coding agents directly from the terminal, then visualize agent behavior trends on a live dashboard.
Coding agents are shipping code autonomously, but most teams have little visibility into, context for, or control over what those agents are actually doing. In this demo, we show how the Fiddler AI Observability and Security platform extends into the coding agent workflow, giving developers the tools to monitor, evaluate, and govern coding agents.
What you'll see:
- Query coding agent data and diagnose individual traces and spans from the terminal.
- Create a custom evaluator using a prompt-based LLM judge to score any behavior you define.
- Surface trends by backfilling historical agent sessions with the new evaluator.
- Create evaluation reports on the Fiddler dashboard and identify spikes in agent behavior over time.
[00:00:00] If you've been using coding agents for a while, you've probably run into the issue where the model gets stuck on a problem and you've prompted something like, 'same issue, try again,' or 'still not working.' I'm going to show how we use the Fiddler app to track issues like that so we can dive deeper into them, see how often they occur over time, and generally use that information to improve our agentic experience.
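Before reaching for an LLM judge, the "stuck" phrases above can be caught with a simple keyword filter. The sketch below is purely illustrative: the pattern list and the `looks_stuck` helper are our own assumptions, not Fiddler's actual detection rule.

```python
import re

# Hypothetical phrase list signaling a developer telling the agent it is
# stuck; illustrative only, not Fiddler's actual evaluator logic.
STUCK_PATTERNS = [
    r"\bsame issue\b",
    r"\btry again\b",
    r"\bstill not working\b",
    r"\bstill broken\b",
]
STUCK_RE = re.compile("|".join(STUCK_PATTERNS), re.IGNORECASE)

def looks_stuck(message: str) -> bool:
    """Return True if a user message contains a stuck-indicating phrase."""
    return bool(STUCK_RE.search(message))
```

A filter like this is cheap but brittle; the demo instead builds a prompt-based LLM judge that can score any behavior you can describe in natural language.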
[00:00:27] So first, I'm going to pull up the app where we are logging all of our OpenCode, Claude, and other coding agent interactions. 'Please load this app.' Now that that app is in context, I'm going to tell it to create the evaluator: 'please create an evaluator that detects when devs say "same issue, try again" or otherwise indicate the model is stuck.'
[00:01:06] I'm going to tell it to 'use Gemini as the custom judge.' Okay, so the agent created the evaluator and tested it against some sample messages, real interactions between our devs and coding agents. And now it's offering to create the rule to finalize it, and it's going to backfill some data so that we can see how this stacks up over time.
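Conceptually, a prompt-based LLM judge like the one created here boils down to a prompt template plus a parser that maps the judge's answer to a score. The sketch below is a minimal, hedged illustration under our own assumptions; the prompt wording, `build_judge_prompt`, and `parse_verdict` are hypothetical and the actual LLM call (e.g. to Gemini) is omitted.

```python
# Hypothetical judge prompt; the real evaluator's prompt is generated by
# the agent and may differ.
JUDGE_PROMPT = """You are evaluating a message a developer sent to a coding agent.
Answer STUCK if the developer is indicating the agent is stuck on a problem
(e.g. "same issue, try again", "still not working"), otherwise NOT_STUCK.

Message:
{message}

Answer with exactly one word: STUCK or NOT_STUCK."""

def build_judge_prompt(message: str) -> str:
    """Fill the template with the message under evaluation."""
    return JUDGE_PROMPT.format(message=message)

def parse_verdict(llm_output: str) -> bool:
    """Map the judge model's one-word answer to a boolean score."""
    return llm_output.strip().upper().startswith("STUCK")
```

Constraining the judge to a one-word answer keeps parsing trivial and makes the score easy to aggregate when the evaluator is backfilled over historical sessions.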
[00:01:33] And I'm just going to tell it 'when you're ready, go ahead and create a chart and pin it to the dashboard.' That way I won't have to do it through the UI. Cool. So the agent did successfully make that chart for me. I've got it here so I can see some actual interactions where the user was stuck. From here, I can drill down and see the whole chain of events.
[00:02:04] I can see which model was used. I can see the reasoning why the agent thought this was a stuck example, and then we can feed this information back into our agents.md and our own prompting and processes so that we can improve how we code with agents.
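The trend chart on the dashboard is essentially a count of flagged sessions bucketed by time. As a rough sketch of what the backfill surfaces, the hypothetical `daily_stuck_counts` helper below aggregates per-session judge verdicts by day so spikes stand out; the data shape is our own assumption.

```python
from collections import Counter
from datetime import date

def daily_stuck_counts(sessions):
    """Aggregate (day, is_stuck) pairs into a per-day count of stuck events."""
    counts = Counter()
    for day, is_stuck in sessions:
        if is_stuck:
            counts[day] += 1
    return counts

# Illustrative backfill output: two stuck sessions on May 2, one on May 1.
sample = [
    (date(2024, 5, 1), True),
    (date(2024, 5, 1), False),
    (date(2024, 5, 2), True),
    (date(2024, 5, 2), True),
]
```

A spike in a series like this is the signal to drill into the underlying traces, exactly as shown in the demo.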
