Lessons Learned From Building Agentic Systems With Jayeeta Putatunda
In this episode of AI Explained, we are joined by Jayeeta Putatunda, Director of AI Center of Excellence at Fitch Group.
She discusses essential lessons learned from building and deploying AI agent systems, including challenges in moving from concept to production, key evaluation metrics, and the importance of observability and guardrails in ensuring reliable AI systems.
[00:00:06] Krishna Gade: Good morning. Uh, good afternoon everybody. Welcome to another, uh, webinar on AI Explained.
[00:00:13] Krishna Gade: Today's topic is on Lessons Learned from Building Agent Systems. I'm Krishna Gade. I'm the Founder and CEO of Fiddler AI. I'll be your host today.
[00:00:24] Krishna Gade: We have a very special guest today. Uh, Jayeeta Putatunda, Director of the AI Center of Excellence at Fitch Group. Jayeeta is a very accomplished AI leader. She's a rising star in this field.
[00:00:38] Krishna Gade: Her achievements include the AI100 Award in Generative AI, recognition among the Top 25 Visionary Women in FinTech AI, and selection as one of the 33 global advisors for NVIDIA's Enterprise Platform Advisor program. Jayeeta, welcome to the program.
[00:01:00] Jayeeta Putatunda: Thank you very much. Thank you for the kind introduction, and thank you for having me here today.
[00:01:05] Krishna Gade: Awesome.
[00:01:06] Jayeeta Putatunda: Excited to talk a bit about AI.
[00:01:09] Krishna Gade: Absolutely. Jayeeta, you are in the thick of things in the field of AI, in generative AI and now agentic systems. As the world moves into the world of AI agents, what was the biggest reality check when you first moved your proof of concept to a live production agent? If you can share any interesting insights, that would be great.
Reality Checks in Production Deployment
[00:01:35] Jayeeta Putatunda: Yeah, I mean, that's a great point, right? Every day when I wake up, I see there are five different processes and three different frameworks. When we are in this space and in tech, the goal is always to build something that is new.
[00:01:50] Jayeeta Putatunda: That's better, that works the best. But how do you really measure better? How do you really measure whether what you have in place right now needs an upgrade? And if it needs an upgrade, what are the baseline metrics that you're really evaluating it against? So my answer to that would be twofold, right?
[00:02:05] Jayeeta Putatunda: First is what has always been there in the software development lifecycle: the 80/20 rule. And I am guilty of this too, because I am a data scientist by profession and I've been in the field for 10 years. I get excited every time I see a new model: I have to implement it, implement it with a new framework, and see how it works.
[00:02:26] Jayeeta Putatunda: But when you think not only from your developer perspective but more from the business function, how are you really adding value to the business? Is the ROI worth it? You come back to that 80/20 rule again.
[00:02:39] Jayeeta Putatunda: 80% of the focus should be on all those use cases that are, like they say, low-hanging fruit. That doesn't mean they're not important. It means they have a bigger impact, your impact-to-effort ratio is good, and you can take that on rather than spending, I don't know, six months just prototyping. By that time the framework has already become obsolete, and we've been seeing it, right? Some models came up six months ago and now they're extinct from the scene. And we are like, okay, what are we really doing here? So that's my first take.
Handling Model Hallucinations and Accuracy
[00:03:12] Jayeeta Putatunda: And I think the second biggest lesson is that most of the time, and this was true even when we were building predictive machine learning models, but it was easier to handle then because our systems were deterministic and we had a good set of historical data to compare against, comparing apples to apples.
[00:03:31] Jayeeta Putatunda: Now the generative AI systems are giving so much output. How do you really categorize it? How do you really define your metrics? And not just a metric like, oh, it gave me productivity gains. What productivity gains? Is it saving dollars? Is it saving your developers' time? Or is it reducing the cycle of completion?
[00:03:54] Jayeeta Putatunda: So, very specific methods of evaluating why you are really building an agentic AI application. But yeah, I feel like if these two processes are nailed before you begin, the chances of success definitely increase multifold.
[00:04:12] Krishna Gade: Absolutely. So before we dive in deeper, can you give us an overview of the types of agentic systems and products you have built and deployed recently?
[00:04:19] Jayeeta Putatunda: Sure.
[00:04:20] Krishna Gade: What are some of the use cases? What are these systems trying to solve?
Types of Agent Systems and Frameworks
[00:04:23] Jayeeta Putatunda: Yeah, so a couple of things, right? There are two ways of defining what kind of agent solution you would build: first, based on your use cases, and second, based on the agentic AI frameworks that are available. By frameworks I mean, is it going to be a simple ReAct pattern, where it's reasoning and doing some action for you with a one-to-one output? Or is it going to be more of a reflection-based pattern, where you're also auto-optimizing on the spot, refining whatever you are outputting based on some business rules or some evaluation criteria that you already have from the business side?
[00:05:00] Jayeeta Putatunda: The second part is, I think we also have to figure out where the high-ROI agentic pattern is, like what the use cases are. And especially in finance, I still believe we are not at the point where agents can really take on the highest-autonomy use cases, and this is my personal opinion, based on what I'm seeing in the industry overall and the kinds of data that I'm handling.
[00:05:21] Jayeeta Putatunda: We are not ready for that. I don't think we will ever be fully ready, because we are also very highly governed, and it's for the good. I feel like there has to be some layer of credibility, responsibility, and accountability, and the fully autonomous pipeline, I don't think we are going to go there.
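To make the distinction concrete, here is a minimal Python sketch of the two patterns described above: a single-pass ReAct-style step versus a reflection loop that critiques its own draft against business rules. The `call_llm` helper, the prompt wording, and the `tool|input` plan format are illustrative assumptions, not any specific framework's API.

```python
# Minimal sketch: ReAct-style single pass vs. reflection-style self-revision.
# call_llm() is a placeholder for whatever chat/completions client you use.

from typing import Callable

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (hosted API, local model, etc.)."""
    raise NotImplementedError

def react_step(task: str, tools: dict[str, Callable[[str], str]]) -> str:
    """Reason -> act -> observe -> answer; one pass, no self-correction."""
    plan = call_llm(f"Task: {task}\nWhich tool should be used, and with what input? Reply as tool|input.")
    tool_name, tool_input = plan.split("|", 1)            # assumed 'tool|input' format
    observation = tools[tool_name.strip()](tool_input.strip())
    return call_llm(f"Task: {task}\nObservation: {observation}\nFinal answer:")

def reflection_step(task: str, business_rules: str, max_rounds: int = 2) -> str:
    """Generate a draft, then critique it against business rules and revise."""
    draft = call_llm(f"Task: {task}\nDraft an answer.")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Rules:\n{business_rules}\n\nDraft:\n{draft}\n"
            "List any rule violations, or reply PASS."
        )
        if critique.strip() == "PASS":
            break
        draft = call_llm(f"Revise the draft to fix these issues:\n{critique}\n\nDraft:\n{draft}")
    return draft
```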
Use Cases and Business Value
[00:05:43] Jayeeta Putatunda: So how do you find a real balance and find some use cases that really give you a lot of time savings, where your analysts are freeing up their time to do more high-value analysis versus, I don't know, going and reading a 500-page PDF and spending three hours on it?
[00:06:03] Jayeeta Putatunda: Rather than that, if you can build a system, an agentic AI pipeline, where you are processing the data the right way and you have an interface for them to query conversationally and find the exact pieces of information they're looking for, that's much higher value. So I would say, yeah, use case driven.
[00:06:19] Jayeeta Putatunda: Some use cases that definitely come to mind are report generation, having some kind of voice and templatized models ready, and then utilizing some of these agentic AI frameworks to hone in on the optimization side and output something that really aligns with what the business wants.
[00:06:39] Jayeeta Putatunda: Again, at the end of the day, whatever we are building, whatever anybody builds, should align with the business requirements and not be built just for the sake of it.
[00:06:49] Krishna Gade: So you mentioned two different types of agentic systems, right? One is more like a workflow-like system where somewhat of determinism is baked into it.
[00:06:57] Krishna Gade: And then there's a fully autonomous agent that uses reflection and so on. Can you go a little deeper into these two systems, and where you have found one better than the other?
[00:07:09] Jayeeta Putatunda: Absolutely. Again, without going too deep or too technical, or, you know, revealing a lot of the IP.
Workflow Automation vs Autonomous Agents
[00:07:17] Jayeeta Putatunda: I think one of the biggest areas where I feel a process-oriented workflow works is, think about some RPA processes, right? Or maybe software development processes where you were calling three different APIs, trying to gather the data, then doing some processing, and then outputting it in a certain way.
[00:07:37] Jayeeta Putatunda: We have all had those simpler tasks or simpler processes, right? Those processes, I do not think, really need a lot of the variability that genAI models usually add. And it adds a lot of complexity, right?
[00:07:52] Jayeeta Putatunda: Like, why would I want to complicate the clean workflow that I had? But I still want to make it better. I still want to give it the option to call additional tools or additional memory, just in case, so that it remembers prior conversations, or whatever that memory is, right?
[00:08:07] Jayeeta Putatunda: So maybe that's a very good use case where you're following a workflow, but you're also giving a little more flexibility in terms of tool access, access to memory, access to different databases, and then giving the user a better end-to-end experience of working through that workflow. It's a new workflow for the user, but with, I would say, augmented features and capabilities to really make their life better first.
[00:08:34] Jayeeta Putatunda: And the second one, of course, like we said, is a little bit more where you are really trying to give more autonomy to the model, or whatever the agentic system is.
[00:08:44] Jayeeta Putatunda: Say you're creating a batch of three systems and you want them to coordinate with each other and see what the output of the previous agent was. Say it was an evaluation agent and you have a reflection agent: the reflection agent is supposed to take the output of the evaluation agent, go and check it against all the business rules you have inputted, and see, does this make sense?
[00:09:05] Jayeeta Putatunda: Are we abiding by the rules? Did the output really follow all the processes? Score it on that, and then feed it back again to make sure you're auto-optimizing the output. So here you have a little bit more autonomy, but again, that autonomy is driven by the set of rules that came from the business, and you are not just evaluating blindly, if that makes sense.
[00:09:27] Jayeeta Putatunda: Right.
[00:09:27] Krishna Gade: So it seems like in the first case, you are constructing the workflow manually, so the routing is almost deterministic, but you are taking advantage of the model calls and tool calls to augment your existing business process automation. Yep. In the second case, it seems like the agent itself orchestrates the workflow, and it is using reflection and evaluation to self-correct, right?
Self-Reflection and Evaluation in Agent Systems
[00:09:49] Jayeeta Putatunda: Yeah.
[00:09:50] Jayeeta Putatunda: I mean, it's planning a little bit by itself and deciding, at this stage, do I still need to optimize, or am I good to go and just end the process and give the user the output they're waiting for? So that decisioning is still happening on the agent side, and that's why I say maybe it's a little bit more autonomous, but again, everything is checked end to end.
[00:10:08] Jayeeta Putatunda: So it's not really autonomous, it's somewhere in the middle. Somewhere up there.
[00:10:15] Krishna Gade: Yeah. So in the second case, in the autonomous agents, this concept of reflection is quite interesting, right? Because in traditional software you don't have this. Could you elaborate on what reflection is and how it helps autonomous agents?
[00:10:30] Jayeeta Putatunda: Absolutely. So I forget where I read it, I was reading some blog and I can share the link later, about how software is changing, right? How the software development lifecycle as we know it is changing a bit. Because, yeah, like you said, we were not using LLMs prior to this.
[00:10:49] Jayeeta Putatunda: LLMs are more non-deterministic. So what happens when you bring non-determinism into a deterministic workflow pipeline? How many checkpoints do you need? What do you need to measure at each checkpoint? And also, don't over-engineer it, otherwise it's going to be a too-complicated system with five different agents where you really don't need that many.
[00:11:09] Jayeeta Putatunda: Maybe it can just be one trigger agent and the rest of the workflow remains as is, with the additional capability of tools, like I was saying. So when you use something like reflection, it's mostly for the LLM to really critique itself. The underlying concept is still LLM-as-a-judge. But again, being a responsible builder, as I consider myself to be, I really don't want to give all the autonomy or the decision making to my reflection agent itself.
[00:11:41] Jayeeta Putatunda: Because I think there were some studies, and I know there are too many papers going around and I completely missed the headline, but it basically talked about how an LLM is a little bit biased towards output from another LLM. It can pick that out and say, oh, this is better, compared to something very similar but written by a human, maybe in a different way. So how do we take all of that into consideration when we are building that system?
[00:12:10] Jayeeta Putatunda: So we have smaller checkpoints, and we have very specific business guidelines that the LLM is using, from the historical data as well as the current workflow data that the reflection agent is evaluating alongside the output from the previous eval component that we were talking about, right?
[00:12:29] Krishna Gade: Yep.
[00:12:29] Jayeeta Putatunda: So again, I don't think there is a single right way to do it. It's a little bit of trial and error to see what works for the use case you're handling, all with the mindset of really making sure that you are building a responsible system that is not biased towards any particular system and that follows the end-to-end process workflow the best way it can.
[00:12:55] Krishna Gade: So it seems like self-reflection sometimes could be biased, and so a third-party evaluation, a third-party reflection, could be interesting, where it can encode all the business rules and sort of judge the system, right? That sort of brings us down to: how do we test these agentic systems?
[00:13:12] Krishna Gade: Because in traditional software you have deterministic inputs and outputs. You can test for this, you can write all your tests, like test-driven development. But as you said, agentic systems are inherently non-deterministic. Mm-hmm. Have you thought about an approach to testing and validating these systems?
[00:13:30] Jayeeta Putatunda: Yeah, so I think it has to be done in stages. And since we are, again, like I said, highly governed, metrics and observability are a big thing that we do internally with a lot of priority and a lot of focus, and that is something I really appreciate. But how do you really get started to make sure your eval components will be ready?
[00:13:52] Jayeeta Putatunda: You have to pay what I call the data prep tax. I call it a tax because it's, again, a lot of data preparation: making data, like they say, AI ready. It's everywhere, right? Everybody's talking about how you really make your data AI ready. That means using that data to build your systems as well as keeping that data ready to evaluate your systems.
[00:14:14] Jayeeta Putatunda: So there's no magic pill here. I think the data prep tax, as we like to call it, really needs to be the focus for any business, and I think it's even more important for bigger, legacy systems where data is really unstructured, unstructured meaning it's in very different places.
[00:14:35] Jayeeta Putatunda: You need to bring it together, make sure it aligns, and make sure it has some kind of lineage as well as versioning. That is how you really track it. I was speaking to somebody in our industry about this concept of versioning: how do you really version prompts, and how do you really version evals?
[00:14:57] Jayeeta Putatunda: So eval outputs: treat them like, say, an API. When you use an API, you version it, and you make sure that with every upgraded version you have a different set of test cases ready. It's similar for anything that goes into your system: maybe the prompts, the data, the system prompts, the business rules.
[00:15:16] Jayeeta Putatunda: The voice or style prompts, all of that needs to be versioned and then tracked against how it changes, depending on the model, depending on whether you add an external tool or an external reflection step. And that has to be an end-to-end view for you, or the developer, and the business leaders to really see end to end.
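One way to treat prompts and eval sets like versioned APIs, as suggested above: each prompt version carries its own eval cases, so any change ships with fresh results. The schema, the semantic-version strings, and the simple containment check standing in for a real grader are all assumptions for illustration.

```python
# Sketch: version prompts and their eval cases together, like API releases.

from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    version: str                      # e.g. "2.1.0", bumped like an API version
    system_prompt: str
    business_rules: tuple[str, ...]
    eval_cases: tuple[dict, ...]      # each: {"input": ..., "expected": ...}

REGISTRY: dict[str, PromptVersion] = {}

def register(pv: PromptVersion) -> None:
    if pv.version in REGISTRY:
        raise ValueError(f"{pv.version} already registered; bump the version instead")
    REGISTRY[pv.version] = pv

def run_evals(pv: PromptVersion, generate) -> dict:
    """Run this version's own eval cases so every prompt change ships with results."""
    results = []
    for case in pv.eval_cases:
        output = generate(pv.system_prompt, case["input"])
        results.append({
            "input": case["input"],
            "expected": case["expected"],
            "output": output,
            # naive containment check; swap in a real grader or LLM judge
            "passed": case["expected"].lower() in output.lower(),
        })
    passed = sum(r["passed"] for r in results)
    return {"version": pv.version,
            "pass_rate": passed / max(len(results), 1),
            "results": results}
```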
[00:15:39] Krishna Gade: You know, all of us coming from a data science and ML background have seen how to evaluate ML models. You can create confusion matrices, measure precision, recall, accuracy scores, right? Now, as we move to the world of agent systems and agentic AI, how does evaluation change? How do you need to evaluate these agent systems?
[00:15:59] Krishna Gade: Can you give a brief overview of all the types of metrics and things that one needs to measure?
[00:16:04] Jayeeta Putatunda: Yeah, absolutely. So again, I would like to break it down into four or five different components, because I don't think there's one magic evaluation metric that you need to track. There are a couple, right?
[00:16:17] Jayeeta Putatunda: Like I said, traceability. Do you really have enough logging for all your calls, all your tool calls, whether the outputs of the tool calls were really correct? Is there a way you can check the links? Say you're running a deep research agent in one of your steps and it's trying to find links from the web and it suggested links to the user. Is there a way for the system to evaluate whether the links are correct as well as top rated, and not, you know, garbage or low-quality links that came up in the search?
[00:16:49] Jayeeta Putatunda: So that's the traceability and logging side. Second is, how are you really using the models? Because no matter how cheap models get, at the end of the day it starts compounding. When you are building a complicated agentic system, it's not
[00:17:04] Jayeeta Putatunda: about only one model call. It's about multiple model calls in different layers, and sometimes at the same time. Because if you're initiating, say, three different agents, and they're calling two different tools, you are routing that through some models to really get the output of it.
[00:17:20] Jayeeta Putatunda: So how do you track that? Track the token usage and the response time. Were there error rates? How many times did your process fail because of a modeling issue, versus because it failed to generate a response at all? Both are important, right? Otherwise you are not setting up your system to be a really good
[00:17:40] Jayeeta Putatunda: user experience for your customers, or whoever you are opening it up to. And there are tons of others, like how do we do drift detection? All of that still holds, but there are many other components that are now equally important, especially infrastructure maintainability and infrastructure observability as well.
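A small sketch of the per-call bookkeeping mentioned here: token usage, response time, and error rates per model or tool. The wrapper interface, and the assumption that a call returns a dict with a `total_tokens` field, are illustrative rather than any particular SDK's shape.

```python
# Track token usage, latency, and error rates per model/tool call.

import time
from collections import defaultdict

class CallMetrics:
    def __init__(self):
        self.calls = defaultdict(lambda: {"count": 0, "errors": 0,
                                          "tokens": 0, "latency_s": 0.0})

    def record(self, name, fn, *args, **kwargs):
        """Run fn under `name`, timing it and counting tokens and errors."""
        stats = self.calls[name]
        stats["count"] += 1
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)   # e.g. a model or tool call
            # assume the call returns a dict carrying a token count, if any
            stats["tokens"] += result.get("total_tokens", 0) if isinstance(result, dict) else 0
            return result
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            stats["latency_s"] += time.perf_counter() - start

    def summary(self):
        return {name: {**s,
                       "error_rate": s["errors"] / s["count"],
                       "avg_latency_s": s["latency_s"] / s["count"]}
                for name, s in self.calls.items() if s["count"]}
```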
[00:18:01] Krishna Gade: Awesome. And when you diagnose these failures, what do you need to trace through or debug in the system so that you can understand whether things are happening because of tool calls or because of model issues?
[00:18:14] Jayeeta Putatunda: Yeah, I don't think I have the perfect answer to it, it's definitely a work in progress, but what I have seen working is having checkpoints at every point.
[00:18:24] Jayeeta Putatunda: Like I said, if you're building one agent and that agent has five different steps it's supposed to take, then after each stage there has to be some component of logging: okay, this was my input, this was my output, this is what I called, and this was the response.
[00:18:40] Jayeeta Putatunda: Sometimes it might get to be too much information to track, but it's worth the work, at least initially when you're setting up for your use case. As you become mature and you understand a little more of the workflow you're building for, you can tone it down a little. But I think, at the end of the day,
[00:18:58] Jayeeta Putatunda: nobody ever said too much data is harmful; everybody says scarcity of data is harmful. So logging too much is my way to go. I log as much as possible, even sometimes when it's not required. But you never know when you can find some gold mine of an idea, or something you hadn't thought through, in the data you log.
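A minimal version of the "log input, output, what was called, and the response at every step" habit described above: one JSON line per checkpoint, appended to a trace file you can grep or load later. The field names and the JSONL file are assumptions.

```python
# One JSON line per agent step: input, what was called, and the response.

import json
import time

def log_checkpoint(run_id: str, step: str, called: str,
                   step_input, step_output, path: str = "agent_trace.jsonl") -> None:
    record = {
        "run_id": run_id,
        "ts": time.time(),
        "step": step,          # e.g. "plan", "tool_call", "reflection"
        "called": called,      # model name, tool name, API endpoint...
        "input": step_input,
        "output": step_output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, default=str) + "\n")

# Usage (illustrative):
# log_checkpoint("run-001", "tool_call", "ticket_api.search",
#                {"query": "refund"}, {"hits": 3})
```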
[00:19:19] Krishna Gade: Got it. And where does observability fit in here? Because traditionally observability has dealt with more deterministic software, where you are measuring reliability, latency, throughput, server utilization, right? What do you think about observability in the context of agentic AI?
[00:19:39] Jayeeta Putatunda: Yeah, I think that definition still holds true, I will not take that away at all. But there are additional angles to how you would define observability. One thing that I really feel excited about is that, at least in the finance industry as much as I've seen, observability is no longer an afterthought, like it used to be in most of the initial ML spaces that I worked in.
[00:20:05] Jayeeta Putatunda: Where you first build the model, you have some pipeline, and then you start measuring and see what metrics you want to measure. That's not how it really works, and that's not how you should do it. You start with the metrics. You start with the steps of where you are logging. And like you said, all the things you mentioned are still relevant, but there's also: how do you effectively bring in a human in the loop, from multiple angles, to observe a pattern in the output?
[00:20:30] Jayeeta Putatunda: Make sure that pattern makes sense, because if you really want to scale a system, there's no way you can scale it with, say, two or three humans in the loop reviewing everything, say thousands of documents' worth of data you have extracted. But there will be a pattern when you analyze the extracted data. Just as an example, right?
[00:20:49] Jayeeta Putatunda: Are there specific indicators that seem to have failed multiple times for the same type of documents? That's the pattern you're looking for, so that you know exactly whether it's your model that's failing, or maybe the kind of extraction you were doing, just as an example workflow.
[00:21:09] Jayeeta Putatunda: Maybe it's failing. I can also share a learning I had recently: some of the huge financial documents you will see are highly text driven, but they also have infographics and tables, things that look like tables but are actually images. So how do you really bring together the insights from a text-based extraction and from a table as well as an infographic, and make sure they all align?
[00:21:36] Jayeeta Putatunda: And make sure the summary of all that extraction really makes sense and tells the story correctly, and there isn't any specific data getting lost, you're not losing anything in translation. So yeah, at every stage you are bringing in the observability factor, be it the human in the loop finding the pattern, and making sure the end-to-end story, or the solution you're building, really aligns with the expected output.
[00:22:06] Jayeeta Putatunda: And then you build on top of it slowly to really add more color to it. But start simple. Yeah.
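As a concrete illustration of that pattern-finding step, here is a short pandas sketch that groups extraction failures by document type and indicator and surfaces the repeat offenders for human-in-the-loop review. The column names assume a simple review log and are hypothetical.

```python
# Find indicators that repeatedly fail for the same document type,
# instead of reviewing every extracted document by hand.

import pandas as pd

def failure_patterns(review_log: pd.DataFrame, min_failures: int = 5) -> pd.DataFrame:
    """review_log columns (assumed): doc_type, indicator, passed (bool)."""
    failures = (review_log[~review_log["passed"]]
                .groupby(["doc_type", "indicator"])
                .size()
                .reset_index(name="failure_count"))
    # Surface only the repeat offenders worth a human reviewer's time.
    return (failures[failures["failure_count"] >= min_failures]
            .sort_values("failure_count", ascending=False))
```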
[00:22:13] Krishna Gade: Yeah, makes sense. Especially as you described this agentic workflow, which mixes structured and unstructured data, right? Mm-hmm. And when you are in a financial domain, you cannot hallucinate on numbers.
[00:22:26] Krishna Gade: You know, you cannot pad an extra zero onto a mortgage rate or some figure like that.
[00:22:32] Jayeeta Putatunda: Yeah, there's no tolerance there.
[00:22:34] Krishna Gade: How do you deal with that? Because this is actually a big problem a lot of us are facing, right? There's this non-deterministic beast, and you're trying to control it and put a deterministic layer on top of it.
[00:22:46] Krishna Gade: How do we make sure that these hallucinations are monitored and observed?
[00:22:52] Jayeeta Putatunda: Yeah. So that's why I said, right, at least for the financial use cases, I do not think that the current LLMs, the way they are built and the way they produce output, we all know they are all predictive, token based.
[00:23:08] Jayeeta Putatunda: They're not really understanding the quantitative numbers, and that's why we've seen, I'm sure you have seen, those funny LinkedIn posts about whether 9.9 or 9.11 is bigger, and then you see all the wrong answers. That's why it's happening, right? But I think the models are getting better and better
[00:23:24] Jayeeta Putatunda: as we make a little bit of infrastructure change. And this is where the business way of building a solution comes into play. This is your system design. It's not necessary that you apply LLMs to every step. Yes, of course, for things that are highly time consuming, like extraction of the data,
[00:23:45] Jayeeta Putatunda: you do a first pass with the LLMs. But you also have your own predictive models that I'm sure all companies have built, especially if they're a legacy company in the space that was doing this work before the LLMs came onto the scene, right? So why are we moving completely out of that space and not building a hybrid model that takes advantage of what we have been doing for so long?
[00:24:09] Jayeeta Putatunda: Use that as our learning curve, use that as, I would say, training material, or maybe, if I can say it this way, a few-shot learning methodology drawn from what we were doing with our predictive models, and help ground our LLM outputs. Again, there is no one easy way of doing it.
[00:24:29] Jayeeta Putatunda: It's, yeah, a lot of trial and error, figuring out where our systems are failing, which indicators are too complicated for the system to handle, and then bringing in either SMEs or our previous generation of models to help guide us through that. So yeah.
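An illustrative hybrid check in the spirit of grounding LLM outputs with existing predictive models: compare figures the LLM extracted against a baseline estimate and flag anything outside a tolerance for SME review. The field names and the 5% relative tolerance are assumptions, not a prescribed threshold.

```python
# Flag LLM-extracted figures that deviate too far from a predictive baseline.

def ground_numeric_fields(llm_extracted: dict[str, float],
                          baseline: dict[str, float],
                          rel_tolerance: float = 0.05) -> list[dict]:
    flags = []
    for field, llm_value in llm_extracted.items():
        expected = baseline.get(field)
        if expected is None:
            flags.append({"field": field, "reason": "no baseline to check against"})
            continue
        deviation = abs(llm_value - expected) / max(abs(expected), 1e-9)
        if deviation > rel_tolerance:
            flags.append({"field": field, "llm": llm_value,
                          "baseline": expected, "deviation": round(deviation, 3),
                          "reason": "outside tolerance; route to SME review"})
    return flags
```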
[00:24:43] Krishna Gade: In other words, you're saying the classical machine learning models still need to exist, and you layer an agentic skin on top of them and try to ground your generative AI outputs on what the predictive models are suggesting.
[00:24:58] Jayeeta Putatunda: There's a lot of value in the sound predictive models I've seen the financial industry usually work with. There's a lot of work going on, and especially now with the power of LLMs, knowledge graphs have come back into the space, right? They're getting a lot more traction because now it's easier
[00:25:13] Jayeeta Putatunda: to stand them up and maintain them with the help of the LLMs, compared to before. A lot of work is also happening on the causal AI side, which is a domain I'm highly interested in and trying to do as much work in as possible: how do you really ground your non-deterministic outputs from LLMs with the causal analysis that
[00:25:36] Jayeeta Putatunda: I'm sure your econometrics team or your statistical team has been doing prior to this? And then utilize that to gauge your level of correctness. It's not accuracy exactly, but correctness: how correct are you, and where are most of the gaps?
[00:25:56] Krishna Gade: Absolutely. So there's a question from the audience I'd like to take at this point. Mm-hmm. Someone is asking: our agent system works great in dev mode, but we keep hearing production is different. What specific failure modes should we be preparing for that our testing isn't catching?
[00:26:13] Jayeeta Putatunda: Okay, could you repeat that? So it's basically saying...
[00:26:16] Krishna Gade: Basically, it's an agent system: you evaluate it, it works fine on your test cases, but when you get into production and run into all kinds of noisy inputs, it seems to have reliability issues.
[00:26:27] Jayeeta Putatunda: Absolutely. So this happens, right? It happens when we, meaning the engineers and the technologists or the developers, are not working closely enough with the business to really understand their edge cases, what their clients are looking for, what kinds of questions or kinds of datasets might come in that we have not tested in the system.
[00:26:46] Jayeeta Putatunda: That's number one. And I have seen this multiple times at my prior organizations as well, and I am guilty of it too, like I said in the beginning: really starting to build the moment we see a new architecture or a new process, and saying, okay, look, I have 25 tested datasets, or even a hundred.
[00:27:05] Jayeeta Putatunda: It works absolutely amazingly, let's put it into prod. Before that, did you put it into QA and open it up to your beta testers? Did you get a handful of sample people to really push it to its edge and figure out where it's breaking? With all those observability checkpoints that we talked about, that is how you really catch what your product is missing, what maybe your PM hasn't thought about before, or your developers haven't considered in the edge cases, and you really refine that with your beta tests.
[00:27:32] Jayeeta Putatunda: I'm a hundred percent sure, even if you're going down the predictive analytics route, there is always something you miss in production, and this is how you iteratively tune your application, software application or whatever application you're building. So yeah, never release straight into prod first.
[00:27:50] Jayeeta Putatunda: Make sure you're releasing in QA and dev, open it up to beta testers, ask them to test it, and push the application to its max. That's how you really get to know your application better and get your stakeholders' buy-in to really support you in that.
[00:28:09] Krishna Gade: Yeah. So that sort of brings us to a meta point, right?
[00:28:12] Krishna Gade: Where do you see the biggest gap today between current agent system capabilities, as you build on top of the orchestration frameworks out there, and what enterprises actually need for reliable production deployment and maintenance?
[00:28:28] Jayeeta Putatunda: Yeah. So multifold, right?
[00:28:30] Jayeeta Putatunda: Like I said, the frameworks are changing every day. So having one framework or access to one model shouldn't be your moat. That's the word everybody keeps throwing around: the models and systems are not your moat. It's the application you are building based on your business's input, the ROI you're trying to get, the problem you're trying to solve, the metrics you have designed, and the process workflow that you have.
[00:29:02] Jayeeta Putatunda: All that comes together, and this ties back to that great article you shared, Krishna, about compound AI systems. Like I said, it's no longer the model's responsibility alone to get you the right answer, and you cannot just blame it on the LLMs hallucinating. Of course they're hallucinating, because they are built on the
[00:29:21] Jayeeta Putatunda: entire internet, and they don't know the specific requirements of your task if you do not give them the right directions: the config files full of system prompts with very specific, nuanced guidelines, and a structured workflow to follow. A structured workflow meaning a set of agents, if you're building an agent system, each with a scope defined, so that they don't have the tendency to go out of scope.
[00:29:49] Jayeeta Putatunda: One funny thing I actually read recently is that somebody asked an agent, "help me with this," and it started churning out thousands of lines of code, because that's the way the agent was trying to help the user, right? But if you tell it, "help me do 1, 2, 3 by using these tools, 5, 6, 7," that is how you really build
[00:30:09] Jayeeta Putatunda: a more context-aware system, one that you have a better chance of evaluating, handling, and observing, rather than just saying, "hey, help me solve this problem," so it helps however it can.
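A tiny sketch of that "do 1, 2, 3 using tools 5, 6, 7" scoping idea: give the agent an explicit step list and an allow-list of tools, and reject anything outside it rather than letting the agent improvise. The tool names and scope fields are hypothetical.

```python
# Scope an agent with explicit steps and an allow-list of tools.

ALLOWED_TOOLS = {"ticket_search", "summarize", "post_comment"}

AGENT_SCOPE = {
    "steps": ["fetch the ticket thread",
              "summarize the customer issue",
              "draft a response for human approval"],
    "tools": sorted(ALLOWED_TOOLS),
    "out_of_scope": "refuse and ask the user to narrow the request",
}

def invoke_tool(name: str, tools: dict, **kwargs):
    """Only call tools inside the agent's declared scope."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{name}' is outside this agent's scope")
    return tools[name](**kwargs)
```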
[00:30:22] Krishna Gade: Absolutely. Now we have all these agentic frameworks being catered to business users, right?
[00:30:29] Krishna Gade: Where you can go and ask these questions and, you know, shoot yourself in the foot. Yeah. Awesome. So let's take another audience question here. Someone was asking: our agent systems generate tons of logs, like reasoning, API calls, memory access, planning steps. When something fails, how do you trace through all of that to follow the logic chain?
Evaluation and Observability in Agent Systems
[00:30:50] Krishna Gade: How do you do root cause analysis?
[00:30:53] Jayeeta Putatunda: Yeah, absolutely. If you write the systems the right way, they can return things. From some of my reflection agents, I have returned the context that went into the final reflection chain. I've returned the chain of thought, sometimes a chain of draft.
[00:31:10] Jayeeta Putatunda: Chain of draft is nothing but a simplified, lower-token version of chain of thought, so that you're not spending too much on those processes. Then it also tracks exact match points, depending on whether you have specific metrics, like contextual relevancy and all that industry-standard type of stuff.
[00:31:30] Jayeeta Putatunda: And then you can also add custom metrics. So I feel like this is where your creativity should come in as a developer: what do you really need? That's why I said start building slowly, have three or four basic components, see where you're missing things, go back, refine your evaluation criteria, and then make sure you're continuously observing, or, maybe say, continuously monitoring.
[00:31:53] Jayeeta Putatunda: And that's how you really build a good-quality, evolving solution, rather than trying to build it all at once without much thought put into it.
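A sketch of returning the trace alongside the answer and registering custom metrics on top of the industry-standard ones, as described above. The `AgentResult` shape, the decorator-based registry, and the "draft_shrinkage" metric are made-up illustrations, not a specific framework's API.

```python
# Return the reflection context and drafts with the answer, and score them
# with pluggable custom metrics.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentResult:
    answer: str
    context: list[str] = field(default_factory=list)   # what fed the final call
    drafts: list[str] = field(default_factory=list)    # chain-of-draft intermediates

CUSTOM_METRICS: dict[str, Callable[[AgentResult], float]] = {}

def metric(name: str):
    def register(fn):
        CUSTOM_METRICS[name] = fn
        return fn
    return register

@metric("draft_shrinkage")
def draft_shrinkage(result: AgentResult) -> float:
    """Example custom metric: how much the final answer condensed the drafts."""
    draft_tokens = sum(len(d.split()) for d in result.drafts) or 1
    return len(result.answer.split()) / draft_tokens

def score(result: AgentResult) -> dict[str, float]:
    return {name: fn(result) for name, fn in CUSTOM_METRICS.items()}
```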
[00:32:04] Krishna Gade: Yeah, that actually brings up a question. You mentioned custom metrics, right? There is no out-of-the-box way to measure these things, because every application you're building is slightly different in its use case, so you may have to write tests or custom evals to measure those things, right? Mm-hmm. So what would that development process look like? For example, let's say you're trying to build a customer support agent that's making calls to a ticketing platform and summarizing the activity.
[00:32:41] Krishna Gade: Mm-hmm. What is some of the basic stuff a developer needs to do to just get that going in the first place?
[00:32:47] Jayeeta Putatunda: Yeah.
Best Practices for Agent Development
[00:32:48] Jayeeta Putatunda: So yeah, like I said, if you're using an agent framework and you're using all the components, like tool calling and memory, make sure you're logging each of those components. That's baseline.
[00:32:57] Jayeeta Putatunda: That's industry standard; it has nothing to do with your specific use case. When you start building your use case, say for a customer agent like you said, I'm sure you're working with somebody from marketing or sales or whichever department owns it. The
[00:33:16] Jayeeta Putatunda: partnership between the business-side counterparts and the developers has never been this important. Mm-hmm. Now that we are really going into this open sea of non-determinism, there could be so much negative or noisy feedback that, for me as a developer without much business context, I can maybe filter out the junk at an initial view, but what looks like junk to me may not be junk for the business side.
[00:33:46] Jayeeta Putatunda: They could have some internal insights that I'm missing. So working in partnership with the business is your best bet, right from the start. And that's why I feel everybody I have spoken to in the finance industry who is trying to build these systems first talks about how we have to
[00:34:05] Jayeeta Putatunda: get our stakeholders' buy-in. And your stakeholder is your business counterpart. They need to believe in the vision of what you're trying to build and how it'll make their life easier. That's how you bring them in. Describe the entire process, make them understand what the key risks are, and get their help to really define those key metrics.
[00:34:24] Jayeeta Putatunda: There's literally no other way you can define these metrics by yourself as a developer without much business context.
[00:34:32] Krishna Gade: Absolutely. And so the question is, the world has moved in the last five or six years from MLOps to LLMOps to now AgentOps, right?
[00:34:41] Jayeeta Putatunda: Yeah.
[00:34:42] Krishna Gade: What are some of the, what are some of the
[00:34:44] Jayeeta Putatunda: AgentOps okay. Yes.
[00:34:47] Krishna Gade: What are some of the differences or commonalities that you've seen between these worlds? What are some of the learnings you could carry over from your LLMOps systems to AgentOps now? And what are some of the things we need to do differently?
[00:35:03] Jayeeta Putatunda: Yeah, absolutely. I mean, some concepts still have the exact same amount of value. The basic first level is: is your system useful? Is it useful in terms of being accurate, in terms of responding to what the user is asking and not just giving them five different links that
[00:35:21] Jayeeta Putatunda: the question didn't even ask for? So yes, relevancy, accuracy, making sure the system is not breaking down and there's no downtime: those are your baseline metrics that roll over for each of the components you mentioned. Now, on the more agentic side, there's actually a great paper, and I can share the link in the chat, that I was reading last night and found
[00:35:47] Jayeeta Putatunda: really interesting, because it aligns. I'm not sure how to share it in the chat. Maybe I can quickly
[00:35:55] Krishna Gade: yeah.
[00:35:56] Jayeeta Putatunda: Share my,
[00:35:56] Krishna Gade: yeah, there should be a way our team can enable you with that.
[00:36:00] Jayeeta Putatunda: No worries. I can quickly share my screen. Can you see the, can you see like a paper?
[00:36:07] Krishna Gade: Yep.
[00:36:08] Jayeeta Putatunda: Okay, so this is actually a great paper called
[00:36:10] Jayeeta Putatunda: "Why Do Multi-Agent LLMs Fail?" And I was just looking through some of the failure categories it has identified here. You can see the biggest ones are around system design and inter-agent coordination. The hardest part is how you really make sure one agent is working with the other agent the right way, as well as task verification.
[00:36:29] Jayeeta Putatunda: So yes, premature termination, making sure there is no incomplete verification or incorrect verification: these are very new terminologies, new ways of thinking about your system. But there's no way you can really avoid them, because if you're building multi-agent, multi-orchestration processes, these have to be taken care of in one of those chains of logging that you're doing and thinking through.
[00:36:56] Jayeeta Putatunda: So yeah, it's a newer domain in terms of how we really think about these systems, but the baseline still remains the same, right? Is it accurate? Is it helping your end business user? Is it giving you the right kind of answer that you're looking for?
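In the spirit of those failure categories, here is a minimal verification gate that checks an agent's output for completeness and an explicit termination status before it is handed to the next agent or the user. The required fields and status values are assumptions for illustration.

```python
# Gate a hand-off between agents: catch premature termination and
# incomplete outputs before they propagate downstream.

def verify_handoff(output: dict, required_fields: list[str],
                   min_length: int = 1) -> tuple[bool, list[str]]:
    problems = []
    for fld in required_fields:
        value = output.get(fld)
        if value is None or (isinstance(value, str) and len(value.strip()) < min_length):
            problems.append(f"missing or empty field: {fld}")
    if output.get("status") not in {"complete", "needs_review"}:
        problems.append("agent terminated without an explicit completion status")
    return (not problems), problems

# Usage (illustrative):
# ok, issues = verify_handoff(agent_output, ["summary", "citations", "status"])
# if not ok: route back to the producing agent or to a human reviewer.
```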
[00:37:13] Krishna Gade: Absolutely. So that just tells you there's no black or white.
[00:37:17] Krishna Gade: There's actually a big gray zone here, because incompleteness gets you into this big gray zone of how to evaluate these systems. That's amazing, that's really cool. So I guess, from the SDLC-of-agent-development perspective, could you give a simple recipe for a new AI team starting on this journey?
[00:37:41] Krishna Gade: Like maybe five or six steps, just follow this recipe to deploy your first few agentic apps. What would that be?
Key Steps in Agent System Implementation
[00:37:48] Jayeeta Putatunda: Very interesting. Again, that recipe can be made in so many different ways, but if you ask for, say, the top three biggest components:
[00:37:57] Jayeeta Putatunda: The first component is: what is the expected output? What is the user problem you're solving, and what is the expected output? From there, you work back to see what you have in your system so far. Is there a data gap? If there is a data gap, how do you fill it? If there is a process gap, I think now, with all the frameworks, it's easy to fill in the process gap with
[00:38:22] Jayeeta Putatunda: a lot of open source. If you want to build a workflow and chain a couple of different tools and models, you can easily do it with something open source like LangGraph. But that's not the point. When you are putting it all together, do you know exactly where to put your checkpoints, and who your SMEs are?
[00:38:39] Jayeeta Putatunda: Again, all the common themes that I highlighted come back here as your recipe: you need all those people on your team, or to have your back. Otherwise, if you're building in a silo, I can guarantee you the product is never going to take off, and you'll see no adoption, because everybody will be like, hey, did you
[00:39:00] Jayeeta Putatunda: check in with me? This is not what I really wanted. So do your market-fit analysis, like they say, really well. Make sure you're really solving a problem that is a bottleneck, and not something you just want to build, and use the right...
[00:39:14] Krishna Gade: Defining the problem is the most important, right?
[00:39:17] Jayeeta Putatunda: Define the problem and the metrics you'll map against at the end of the day, before you get started, and then you can keep iterating on it and make it more, I would say, defined and colorful.
[00:39:30] Krishna Gade: Right, right. And then include all the stakeholders. That's awesome.
[00:39:34] Krishna Gade: So I guess, you know, let's take a few audience questions. There's one interesting question: in your experience, have you seen a use case that made you go, "this is awful for agents"? They're looking for a framework to think about what makes a good versus bad use case.
[00:39:53] Jayeeta Putatunda: Very interesting. So I actually
[00:39:57] Jayeeta Putatunda: read a recent paper about how human agents and AI agents will come together to solve different kinds of problems. There's this concept of whether it's low risk, or, the theme is, if the variance tolerated in the output is really, really low,
[00:40:26] Jayeeta Putatunda: that's not something you would really want to build as an autonomous system. By that I mean exactly the same point you made: if you are building a financial analyst agent and it analyzes a trend pattern and gives you complete garbage output based on the spreadsheet document it evaluated, do you really want to build an autonomous agent on top of that?
[00:40:48] Jayeeta Putatunda: Maybe the autonomous agent helps with the initial stages of data extraction and putting it into some templated format that will help you, or any other predictive kind of process you have in place, to expedite things. But end to end? Is that a good use case? Maybe we are not there yet.
[00:41:08] Jayeeta Putatunda: Maybe in two months or three months or six months, I don't know, maybe we will get there, but not yet. Mm-hmm. So you really think it through from the risk as well as the fault tolerance for your audience, as the first grounding level for what something should do.
Building Stakeholder Trust in Agent Systems
[00:41:28] Krishna Gade: That's a good point. You know, variance in terms of how much deviation from the correct output you can be happy with. Yeah.
[00:41:35] Jayeeta Putatunda: Yeah. Say it's a Q&A procedure, sure. If there are a couple of changes in a line from my Q&A conversational agent, I really don't mind. But if there is a factual error, or a trend analysis error where it's supposed to show high and it's showing low or no change, that's catastrophic in terms of reputation as well as our entire business line.
[00:41:58] Krishna Gade: Yeah, makes sense. Another audience question here: our leadership asks how we know agents won't randomly break in front of a client. Basically, they're building agents and they can't give leadership accuracy metrics like they used to. How do you convince stakeholders to trust something unpredictable?
[00:42:19] Jayeeta Putatunda: This is, I think, an industry-wide problem. You start by educating them. With everything that I do outside my day-to-day, and even during my day-to-day work, my entire goal is: if I'm talking to somebody from the non-technical side, or even from the technical side but more from the predictive analytics or software development side, how do I
[00:42:40] Jayeeta Putatunda: add a little bit of extra information to any conversation I'm having so that they get some new perspective? They don't have to agree with me, but something like, okay, that's a takeaway for me to go and read up on. And then I share some materials, or I ask them, what do you think about it? So start having these conversations so that your leadership, or your
[00:43:01] Jayeeta Putatunda: other technical or non-technical stakeholders, don't feel like their opinions don't matter and that you are trying to bulldoze and build on top of their systems. That's not how we're going to make it productive and effective. It should be really collaborative: trying to understand where they're coming from, where their fears are, and some of those fears are really legit.
[00:43:23] Jayeeta Putatunda: So you really don't know if you can stop an agent from crashing during a live demo. It happens all the time, but it happens in traditional software as well. I've seen so many software demos where the API calls just fail in the middle, or there's some connection error. So it's part of the tech world.
[00:43:39] Jayeeta Putatunda: It's part of building smaller components and making sure they're tight, evaluated, and gauged the right way. So yeah, start with friendly education, upskilling, and awareness kinds of discussions, and see how that bonding goes. There's really no other way.
[00:44:00] Jayeeta Putatunda: Uh, yeah.
[00:44:02] Krishna Gade: So I guess, finally, to end this conversation: as the world is moving from deterministic software into this agentic, non-deterministic software, mm-hmm, how important are the things we talked about, evaluation, observability, guardrails? How do you think about it from an AI leadership perspective,
[00:44:24] Krishna Gade: Also from a developer perspective?
Future of Agent Systems
[00:44:26] Jayeeta Putatunda: Yeah, I mean, at least with everything that I build, I try to make sure I build it in a way that I can prove it: if somebody wants to run that process again, I can show them some lineage of information, even if there is some change in the output or some change in the extraction process.
[00:44:44] Jayeeta Putatunda: But the key goal should be to show that, hey, look, this process works. The work that I used to do in, say, I don't know, just to give an example, five hours, now I do in one or one and a half, and my productivity boost has been insane. And look, I have three more ideas that have come up with that extra time I have in hand.
[00:45:03] Jayeeta Putatunda: So evaluation is key to doing all of that: making sure that you are building the right way, thinking in a more modular structure, and making sure each module, and here a module means, say, your data pipeline is one module, has its output checked somewhere for output quality, drift detection, and all that stuff.
[00:45:23] Jayeeta Putatunda: Then your agent components, or the individual business components that you have, should be another segmented module. So yeah, build it in a way that you can break it down easily and then see where the gaps are. Don't build a monolithic, legacy-style agent system, end to end, five agents calling three different agents.
[00:45:44] Jayeeta Putatunda: I don't think we really need that. Sometimes people complicate stuff just because they want to build complicated systems. Sometimes I've seen simple agents with two different calls, with a very specific task they want to fulfill, that can do a lot and save a lot of time. So think about the 80/20 rule.
[00:46:01] Jayeeta Putatunda: Find the use cases that you want to prioritize, the ones that solve the most problems, really start there, and make sure you evaluate as you go.
[00:46:09] Krishna Gade: Awesome. Thank you so much. I think, yeah, that sums it up. Thanks for spending the time with us today. We'd love to see that paper that you shared.
[00:46:19] Krishna Gade: Maybe you can share the link with us, did you do that? And then we can share it with our webinar attendees later. And yeah, absolutely, this is an exciting time. And I think the paper is there for everybody to see.
[00:46:33] Jayeeta Putatunda: I think it just went as a post to the panelists, but yeah.
[00:46:37] Krishna Gade: Awesome. Okay, that's it for this week, folks. Thanks for joining us on AI Explained, and we'll come back with another great edition with another great speaker. Until then, see you.
[00:46:50] Jayeeta Putatunda: Thank you.