Building Agents at Scale: Lessons from the Front Lines With Gary Stafford
In this episode of AI Explained, we are joined by Gary Stafford, Principal Solutions Architect at AWS, to talk about building agents with AWS Strands Agents.
He delves into how enterprises choose between AI/ML and agentic approaches, patterns for multi-agent systems, and the role of MCP. Gary also shares real-world use cases and practical guidance on safety, scaling, and delivering enterprise-ready agent systems.
[00:00:00] Wil Pong: Welcome and thank you everyone for joining us on today's AI Explained on building agents at scale, lessons from the front lines featuring AWS Strands. My name is Wil Pong. I'm VP of Product Management here at Fiddler AI, and I'll be your host today.
[00:00:20] Wil Pong: We have a very special guest on today's AI Explained, and that is Gary Stafford, Principal Solutions Architect at AWS.
[00:00:27] Wil Pong: He's a seasoned technology leader, speaker, and author who has applied his deep expertise in AI/ML, data analytics, enterprise architecture, and software development to support a wide range of enterprises and partners across industries. Welcome Gary. I'm really looking forward to this conversation.
[00:00:43] Gary Stafford: Thank you. You just set a high bar.
[00:00:45] Wil Pong: Oh, I'm very excited to dig in with you here. Um, but first and foremost, maybe you could take a minute and, you know, share a little bit of your background and history with us. What should we know about you as we enter into this conversation about agents and AI?
[00:00:57] Gary Stafford: Sure. So I've been at AWS for about six years now, six years in September. I started in the Bay Area with our venture capital team with startups, but have worked across enterprise, greenfield, what we refer to as digital native businesses. So I've spent time here with customers that are all in on the cloud, and then most recently in the ad tech space and the media and entertainment space.
[00:01:20] Wil Pong: That's awesome. So, you know, a lot of, uh, range of different companies there, but I think one thing that we share in common is thinking about that sort of enterprise layer in particular and how we sort of bring AI transformation and adoption to that layer. So I'm curious, you know, just asking you this question, as lots of enterprises are exploring the idea of AI, how should enterprises decide whether or not a particular problem or challenge in front of them calls for an agentic or an AI type approach?
[00:01:50] Wil Pong: What kind of rubric or, or sort of process do you put these teams through to figure out what's right for them?
[00:01:56] Gary Stafford: Sure. I think it's interesting, and somewhat unique with generative AI and agents and agent workflows, that oftentimes that is the reason, right? Their stockholders, their senior executives have told them that they need to develop a product or a service that has genAI, even if it's just for marketing purposes.
[00:02:14] Gary Stafford: That aside, which is a legitimate request from a customer, I think what we try to do is go in and understand the problem first. And I think of all the Amazon concepts that we have, I, I think working backwards is probably my favorite one.
[00:02:27] Gary Stafford: So if we're allowed (sometimes the customer's like, I have to use AI, let's just start there), if we have the opportunity, we really work backwards and understand the problem. Because to that point, I think it was the same with generative AI: oftentimes these are solved problems, whether it's statistical analysis, data analytics, traditional machine learning. There are already solutions out there which are probably quicker and maybe cheaper for the customer.
[00:02:51] Gary Stafford: So I think we really seek to understand that problem before we suggest a solution.
[00:02:56] Wil Pong: So what I'm hearing from you is the impetus can come from one of two places. One is sort of a top-down initiative, almost like a skunk works or a lab: hey, we need to get into AI, get in there fast, we don't know what to do, but we have a great partner in AWS, let's go lean on Gary. That's sort of pattern one. Pattern number two is sort of, hey, we have these processes.
[00:03:16] Wil Pong: They could be customer facing, they could be back office facing. We know we want to bring AI to these processes in order to transform our business. So it's sort of like these two things that are there. I'm curious, on the sort of top-down side or the mandate side, to your point, what do you see teams end up doing in that case?
[00:03:35] Wil Pong: Do they stand up centralized teams? Do they sort of have one team be a guinea pig and try something? What are some patterns you've seen on that side?
[00:03:45] Gary Stafford: Yeah, I think initially, and probably this might be just because at AWS we tend to be, you know, IT oriented, we're talking to, to technical folks.
[00:03:53] Gary Stafford: Oftentimes those conversations, going back even to generative AI in 2023, started with the technical folks. But as AI has expanded, as the importance of AI and of agentic systems has expanded, more often now we're talking to certainly somebody with a level of technical understanding, but oftentimes there's also a product development person in there, a product manager, and a business person. And oftentimes that's because we're trying to solve a larger problem or trying to add a feature to a product, versus just trying to implement a new technology, like a new database engine.
[00:04:26] Gary Stafford: You're really talking about a new feature or a new product, a new technology the company hasn't explored before.
[00:04:33] Wil Pong: Yeah, that's a great point. This isn't just a capability, it's actually now some kind of outcome you're trying to drive. Right?
[00:04:39] Gary Stafford: That's right.
[00:04:39] Wil Pong: It sounds like you have to be more holistic than just saying, okay, I've got this latest way that'll drive down your latency by this much. Just go implement it. Right.
[00:04:47] Gary Stafford: Yeah, absolutely.
[00:04:48] Wil Pong: So switching to that side then, some of the sort of use cases that we talked about, what have you seen as maybe
[00:04:54] Wil Pong: some trends or sort of repeatable things across the folks that you've worked with, in terms of use cases that they'll want to explore using agentic technology?
[00:05:05] Gary Stafford: Sure. So maybe I'll just start, you know, internal facing outward. AWS has always had a strong area of expertise within multiple industries, but in the last few years we've really segmented and focused on each one of those industry segments.
[00:05:21] Gary Stafford: We love to lead with that and look for solutions that are specific to industries. But the reality is, and I don't wanna say the majority of the time, but let's say half the time, the customer's looking to solve a problem that probably transcends that industry. Right. To your point, you said backend:
[00:05:35] Gary Stafford: it's maybe a mundane backend system or workflow that they have, and if they can optimize that, they reduce labor, and they can reallocate that labor to maybe focus on their core business more. So I would say it's a combination of both. Both customer-facing products and features, but also just more backend task automation.
[00:05:57] Wil Pong: Okay. So let's take a half a step back then as we start thinking about solutions there. Um, we've both used the term genAI and agentic a little bit, but I'd love to know from your point of view, what's the difference, uh, especially for folks that are a little bit newer to these technologies. What does it mean to use genAI generally?
[00:06:18] Wil Pong: And then what are agents and how should we think about those capabilities being different?
[00:06:22] Gary Stafford: Sure. I'll refer to Anthropic. I love Anthropic; I think Anthropic's done a great job with agents, obviously defining MCP, and I love to reference their website and their definitions. But when I think of an agent, I think of three things, right? An LLM, the ability to communicate with an LLM.
[00:06:40] Gary Stafford: The ability to use tools, and the ability to reason in an agentic loop, right? So the ability to leverage genAI, to leverage a large language model, which is capable of a generative process, whether that's text or image, and the ability to reason about tools, and use those tools to derive an answer to a question or a query.
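To make those three pieces concrete, here is a minimal sketch, assuming the Strands Agents Python SDK's Agent and tool interface; the word_count tool and the prompts are purely illustrative.

```python
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# The three pieces described above: an LLM behind the Agent, tools it can call,
# and an agentic loop in which it reasons about whether and how to use them.
agent = Agent(
    system_prompt="You are a helpful assistant. Use tools when they help answer.",
    tools=[word_count],
)

# The agent decides on its own whether the question needs the tool, observes
# the tool result, and then produces the final answer.
result = agent("How many words are in the sentence 'agents reason about tools'?")
print(result)
```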
[00:07:03] Wil Pong: I see. So just reflecting that back to you, I think genAI approaches, you know, even as we think about the last couple of years, are a little bit more about that single turn of processing information, right? I'm a user asking a question of a chat bot, or maybe even if I'm a backend process, I take in some input and I give an output.
[00:07:21] Wil Pong: But you're saying one big difference with agents, especially as Anthropic has helped us define them, is the idea that that reasoning isn't just applied to the input from the user. That reasoning is also applied to how to do the job. So reasoning about what plans to go make, or what tools to go use, et cetera, is what separates an agentic use case from sort of a more straightforward, LLM-based use case.
[00:07:48] Gary Stafford: Yeah, I would
[00:07:49] Wil Pong: Does that sound right?
[00:07:50] Gary Stafford: It does. Yeah. I think non-deterministic, right? The way in which it decides to use those tools is non-deterministic; it determines the use based on the input, which usually varies in that case. Or it can be deterministic, and there are frameworks that have more deterministic, DAG-based workflows. But I think either way, where the agent is making the decision on which tools to use, I see that more as an agentic system, where there's less determinism.
[00:08:15] Wil Pong: That makes sense. So let's tie that back to some of the business use cases that you've seen in the verticals that you've looked at, whether it's M&E or elsewhere. What are some particular use cases where this agentic approach, where the software itself is making plans and sort of figuring out what to do, is particularly useful compared to the recent world where we might have had to code all of that logic ourselves?
[00:08:40] Gary Stafford: Yeah, I mean, I think the obvious one, and this transcends AWS, and I think it goes for all vendors in this space, is really code development: code agents, copilots, coding assistants, and being able to develop what started off as, you know, a simple method or a class, and now developing entire applications, developing the testing, developing the documentation.
[00:09:02] Gary Stafford: Or more of a spec-driven development, where you're writing the requirements and it's creating the process and the flow, using agents to develop that.
[00:09:10] Gary Stafford: I think that's the obvious use case, and probably one of the larger use cases that the entire industry has seen. I think beyond that, there's the back office task, the back office automation. I have, you know, one, two, or three humans in the loop with specialized knowledge, and this might not be, oftentimes it's not, their primary job, but they're required to perform some task, and that's usually an obvious area of automation to free them up to do their higher-value tasks. Then I think the third one is more product and product feature, customer facing: being able to add AI or agent capabilities to existing products, or develop new products that have those capabilities, working with customers to do that.
[00:09:54] Wil Pong: That makes sense. And, and let's dig into the sort of back office conversation a little bit more. You know, uh, traditionally you, you might argue that that world you just described is still broadly true whether or not there's agents, right? Uh, I always think about back office automation taking more stuff and putting it in code rather than having humans do rote work.
[00:10:15] Wil Pong: So, you know, how is this different than just using Zapier or using some integrations that sort of automate work generally? Right. Why do agents make the difference here compared to the tools we've had before?
[00:10:27] Gary Stafford: Yeah, and maybe I'll answer that with kind of the approach, and again, everyone has a little bit of a different approach when they approach a problem or talk to a customer. I come from a consulting background, so I take more of a consultative approach. It's understanding, you know, who are the humans in the loop, what are their tasks?
[00:10:43] Gary Stafford: Are they specialized? But I think specifically to answer your question, is there institutional knowledge there that needs to be captured? Um, so they're performing this task because they have particular knowledge that maybe no one else in the company does. Uh, and those are the type of tasks that I think agents can help with.
[00:10:58] Gary Stafford: Right. So it's non-deterministic. The inputs can vary. Although the problem may have boundaries around it, the specific task can vary somewhat depending on the day, the week, the month, the particular scope of the problem. And then those humans in the loop are applying particular knowledge that they have, specific knowledge, to the problem.
[00:11:17] Gary Stafford: And those are usually the types of problems where I think an agentic system can, can help or assist.
[00:11:22] Wil Pong: I see. That's really interesting. Um, do you have an example that maybe you can walk us through, you know, you can feel free to sort of scrub it or sanitize it a little bit for our conversation, but, you know, help us understand, you know, what, what kind of use case where you've seen that institutional knowledge be able to be transferred to an agent and therefore automating work that needed a human before.
[00:11:43] Gary Stafford: Yeah, I'll use one, and it's come up multiple times, so it's not specific to any customer. I'll use the sales process. So whether that's reaching out to a new prospect or responding to an inbound inquiry, or more often, let's say a contract is up for renewal or a customer's term for their contract is coming up. Oftentimes how that's addressed is very much dependent on the sales team, right? They get a Salesforce report, or whatever system they're using, and then it's really up to them who they engage, how they engage, what channels they use to engage. Oftentimes if there's high turnover in a sales team, or folks are moving around in accounts, which happens everywhere,
[00:12:28] Gary Stafford: a lot of knowledge of that customer gets lost. So we see that as an area. We see a lot of use cases where customers are saying, I wanna leverage the knowledge bases that I have, I have different systems that I use to drive sales, I have customer 360 data. How can I use that more intelligently? And, then I'll stop talking: instead of reaching out to 20, 30, 40% of those customers, how can I touch a hundred percent of my customers using automation and agentic workflows, but do that intelligently, leveraging knowledge that we have on those customers or institutional knowledge across the sales team?
[00:13:03] Wil Pong: Yeah, that really resonates with me because you, you can think of it in extremes, right? The a hundred percent human piece. You might have a rocking sales team that has a really great playbook, they have a great regional sales manager or something like that, but they're gonna apply that playbook all the way across, right?
[00:13:17] Wil Pong: MEDDIC, or, hey, at our company we do it like this and this, and we reach out after these many days. And you may actually, you know, lose some of the context, especially if I'm a new rep ramping up. I might do some of the same tasks that the last person before me just did, and now the customer isn't happy either.
[00:13:33] Wil Pong: Instead, they're feeling irritated, right? So there's a side where human is not enough. You need that nuance and history of how that account has gone. And then, to your point, on the other side, if we tried to do this with all code before agents came along, the amount of branching logic that we'd have to write, and how engineers would have to be experts at the sales process, it becomes really brittle, and that's why people don't end up doing it.
[00:13:57] Wil Pong: There's a long tail of automation, of if-this-then-that at three or four or five different levels, that is just really painful to do with traditional software. Now you picture flipping this. Give an agent instructions like you would a new rep, and give it access to your CRM and all this information. Now you can say,
[00:14:14] Wil Pong: look, this is the last time we did a touchpoint with 'em, we haven't done an EBR in a while, and my playbook says to do that every quarter. And by the way, if that's already happened and somebody's reached out in the meantime, so on and so forth. You give it those instructions, and now the agent's doing that job of creating its own branching logic, and an engineer doesn't have to write that out ahead of time and maintain it if anything changes.
[00:14:37] Wil Pong: So yeah, that really resonates with me. Um, in terms of back office work.
[00:14:41] Gary Stafford: And that really transcends every industry. Every industry has that need, no matter what industry they're in.
[00:14:47] Wil Pong: That's right. It could be any enterprise process, right? This could now be HR, how we hire people or go through a performance process. This could be marketing.
[00:14:56] Gary Stafford: Yep.
[00:14:57] Wil Pong: Is that right? Yeah. That's awesome. And you know the complexity there too, right? Upskilling people, looking across the board, hiring and moving on and all that kind of thing.
[00:15:06] Wil Pong: So that really works for me as that sort of second example.
[00:15:10] Wil Pong: Let's flip the script a little bit. Third example you gave us is now we wanna leverage AI to better serve our customers. Not just to make our company more efficient, but actually to improve our offerings.
[00:15:20] Gary Stafford: That's right.
[00:15:21] Wil Pong: Um, what have you seen on that side, you know, as, as you're on the front lines here, have there been any kind of particular patterns of, uh, where customers decide to use agents that make sense?
[00:15:33] Gary Stafford: Um, onto that question, maybe within the realm of agents, I'll start with MCP servers.
[00:15:38] Wil Pong: Sure.
[00:15:39] Gary Stafford: I work with a lot of customers, a lot of teams, that have very mature APIs. So they're B2B or B2B2C, maybe not B2C, but they're working with businesses, or indirectly those businesses are then having consumers consume their services.
[00:15:54] Gary Stafford: And oftentimes it's something as simple as standing up an MCP server, so being able to communicate with that API using natural language, and then having their business partners stand up an agent which can communicate with their MCP server. That can often be a fairly light lift for a company that has a mature engineering team and a mature API; that's not a huge stretch for them to be able to add that capability. So we see
[00:16:19] Wil Pong: Yeah,
[00:16:20] Gary Stafford: that as an easy path. Those are obviously for customers that have that ability.
[00:16:26] Wil Pong: So basically modernizing the way that software can talk to software, right? If you've got agents on either side, just throwing out some vanilla APIs may actually be very difficult. Setting up something like this fosters that collaboration. That makes sense.
[00:16:39] Gary Stafford: It's just another channel to consume their resources, their intellectual property, their products, in addition to an API. And you saw that we, and I know several of the cloud vendors, have released agent marketplaces or MCP server marketplaces to make it easier to consume those services in the cloud.
[00:16:58] Wil Pong: That makes sense. That makes sense. Now, while we're on the topic of standards or protocols, you mentioned MCP. For those that are less initiated to this world, what is MCP? How is that different than sort of the API approach that you talked about? And why do teams need to learn a new skill like MCP in order to make this work in the new world of agents?
[00:17:19] Gary Stafford: Yeah. I'm happy to answer your question, but I also love referring folks back to Anthropic, 'cause I think they've done a great job as kind of the originator of that protocol and a lot of the thinking around it. Their documentation is excellent; I refer folks to that. But I think the Model Context Protocol, along with some of the other standards that are evolving, is important to customers, especially enterprise customers.
[00:17:40] Gary Stafford: You mentioned enterprise customers. I think the challenge is that AI is new, genAI is new, agents are very new. And in the absence of a lot of testing and evaluation and long-term production-type architectures, I think at minimum I'm seeing customers that wanna adopt standards.
[00:18:01] Wil Pong: Mm-hmm.
[00:18:01] Gary Stafford: Making sure they're adopting a framework, let's say an agentic framework, that follows standards, which gives them some guarantee of robustness, of performance, of security, and that they're not using a proprietary system that will quickly evolve. And there's no guarantee, obviously. But I think that's the first question we hear from a lot of customers: you know, what are the protocols, what are the standards that this particular framework uses, that I can rely on, that's an industry standard?
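As a rough illustration of the "stand up an MCP server in front of a mature API" pattern described earlier, here is a minimal sketch using the FastMCP helper from the MCP Python SDK; the order-status tool and the API it would wrap are hypothetical stand-ins.

```python
from mcp.server.fastmcp import FastMCP

# One MCP server that wraps an existing, already-tested API so agents
# (yours or a business partner's) can call it through a standard protocol.
mcp = FastMCP("order-service")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Return the status of an order, delegating to the existing API."""
    # In a real server this would call the mature backend, e.g. via an HTTP client.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    # Runs over stdio by default; the SDK also supports HTTP-based transports.
    mcp.run()
```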
[00:18:32] Wil Pong: Yeah, that makes sense. You know, one thing that we hear a lot from our customers and folks that we talk to is also this idea that, as much as we've talked so far about serving developers and building new ways to go do things (we used that sales use case), we're actually seeing third parties as well really get involved.
[00:18:48] Wil Pong: The Salesforces of the world are also building agentic platforms on their own systems, right? And so on. So what you start seeing is a proliferation of first- and third-party agents hanging out in an enterprise that really need to start working together. Right.
[00:19:04] Wil Pong: So, Gary, tell us a little bit about this sort of multi-agent world and how agents have to talk to agents.
[00:19:09] Wil Pong: I think MCP is a part of that, to your point, but what have you seen when it, it comes to orchestrating? You know, a bunch of these different players, right? If we think about the analogy of humans, this is when sales needs a handoff to marketing or product needs a handoff to customer success. This is what humans might do.
[00:19:27] Wil Pong: How do we help these multi-agent systems perform really well also?
[00:19:31] Gary Stafford: That's right. And I think we think about that. So when we engage with a customer, it's really working backwards from that existing human structure, right? That management hierarchy structure, that specialization structure. So I may be very capable of performing multiple tasks, but there are other specialized tasks where I need to rely on my peers.
[00:19:50] Gary Stafford: And oftentimes that's a good entry point for moving from a single agent, an agent that has multiple tools available. I'm a carpenter; I have a dozen or two dozen tools that I can use and I'm competent in, versus I'm not a plumber or an electrician, so I need to reach out to my peers in order to perform those specialized tasks.
[00:20:08] Gary Stafford: So when we engage a customer, we try to understand, you know, what are the processes? Who is conducting those processes? And is there a logical separation of concerns, from what I'll call a human-in-the-loop hierarchy, that translates into agents?
[00:20:26] Gary Stafford: Does it make sense? 'Cause oftentimes they're very compatible, right? If there's a natural segregation of duties there because of specialization, then that's probably a good sign that you want to move from a single agent to a multi-agent architecture, in my opinion.
[00:20:40] Wil Pong: That's a great analogy, right? Let's say I'm a marketer and I wanna run a campaign against, you know, my most loyal customers.
[00:20:47] Gary Stafford: Yep.
[00:20:48] Wil Pong: what does that mean? It's an orchestration of how much they use the product and maybe how long they've been with my company, what their spend might be, and so on. So for me, as a marketing human, I wanna reach out to all these other humans or departments in my organization to figure it out.
[00:21:03] Gary Stafford: That's right.
[00:21:04] Wil Pong: Say now I am an organization that wants to get into agents and I wanna say, look, I wanna bring that into technology. And I heard Wil and Gary talking. I wanna automate all that. So what should I consider as an organization to sort of take that process and transform it, right? Um, do I have to learn anything or do my teams need to know anything about that sort of agent to agent communication?
[00:21:25] Gary Stafford: Yeah, I mean, there are the obvious ways. We have customers that do it all themselves; they're very competent and technical. There are certainly other customers that are extremely successful and maybe to some degree focus more on their core business and outsource that, with a lot of great partners out there and folks that can build those systems for them.
[00:21:44] Gary Stafford: But I think it really comes down to understanding what the problem is. What are all the tools that you're using as a human today? What are all those different capabilities? What are all the systems? You know, we often look at what are all the data sources that you touch? What are all the systems that you touch?
[00:21:59] Gary Stafford: Those APIs: try and document all those. And through that documentation, you're really kind of drawing out the architecture for that single-agent or multi-agent agentic system, really diagramming out the existing process and then overlaying that with where we can implement automation
[00:22:15] Gary Stafford: that makes sense, and that there's an ROI on for the customer.
[00:22:19] Wil Pong: So I think that's a great strategy. Uh, I'm curious as we kind of get into implementing that then, and sort of like building those systems that matter, are there tools that we should be considering or thinking about that facilitate that sort of agent to agent conversation? Let's say I did build a marketing agent that knew everything about my campaigns and I wanted it to be able to talk to a sales agent that had all that context about the relationship.
[00:22:42] Gary Stafford: Yep.
[00:22:42] Wil Pong: How would I go about and do something like that?
[00:22:45] Gary Stafford: Yeah, I think, you know, I was gonna say six months, but that's probably too far back with agents. But months ago, the answer probably was you would need to build something, right? Maybe you already have existing code or methods or functions to connect to those secondary systems.
[00:23:05] Gary Stafford: It could be as simple as annotating those tools. But I would say today, and I think you touched upon this before, with the explosion of, let's say, MCP servers, a lot of the systems that are very common in enterprises have MCP servers, or they're being developed very quickly. You know, you mentioned Salesforce and some of their systems. So it's probably easier today to plug into an existing MCP server, and do that in a secure way, and give that agent those capabilities without having to write code, or even do something as simple as annotating existing code for the agent. So I think that's quickly evolving, but it's becoming easier and easier to connect to those third party systems as we see vendors developing agentic interfaces into their systems.
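On the consuming side, wiring an agent up to an existing MCP server can look roughly like the sketch below. It assumes the Strands Agents MCP client helper (MCPClient with list_tools_sync) and the MCP SDK's stdio transport; the CRM server command is hypothetical.

```python
from mcp import StdioServerParameters
from mcp.client.stdio import stdio_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Launch (or connect to) a vendor's MCP server and expose its tools to the agent.
crm_server = MCPClient(lambda: stdio_client(
    StdioServerParameters(command="my-crm-mcp-server", args=["--read-only"])
))

with crm_server:
    tools = crm_server.list_tools_sync()   # discover the server's tools
    agent = Agent(tools=tools)             # no custom glue code required
    agent("Which accounts are up for renewal this quarter?")
```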
[00:23:54] Wil Pong: That's really great. That is really the culmination of what we said earlier, right? MCPs are kind of a new way to expose the information and functionality of the product. It's much more than saying, these are our APIs 2.0. This is actually almost like creating an interface for your agent, right? This is how an agent understands how another agent's built, how it can talk to it, how it can work with it, and so on.
[00:24:17] Wil Pong: So that's really great guidance for anyone who's looking into this and saying, look, I've got a multi-agent setup, I've got this whole ecosystem of different actors here, and I wanna make sure they talk to each other to do a particular task. You know, the other sort of standard or protocol that we tend to see a lot is this idea of being able to see all of that activity.
[00:24:38] Wil Pong: So now that we have those agents, let's say we got 'em talking to each other, how do we know that the handoff is successful? What were they saying to each other? And did they call each other unnecessarily, or did they do it well? So we hear a lot about OpenTelemetry, or OTel for short,
[00:24:51] Gary Stafford: sure.
[00:24:51] Wil Pong: the ability to sort of see and monitor what these agents are doing.
[00:24:56] Wil Pong: Tell us a little bit about that standard. You know, have you seen this sort of help enterprises as they're trying to stand up these solutions? Are there other ways to think about observability? How do you see it?
[00:25:07] Gary Stafford: Yeah, I was kinda laughing 'cause I have a customer, I see a lot of customers, that are into open standards and use both first-party AWS and third-party observability platforms. Observability is very important, and having open standards is very important to them. And it's nice to see Strands Agents, which is AWS's open source framework, and many of the other frameworks as well, have adopted the open standards, which I think is great.
[00:25:32] Gary Stafford: And it shows the importance of those open standards. I think what's interesting, and I find myself, and maybe other folks on the webinar today too, realizing this, is that real-time logging is more important. To your point, I built an agent system, single agent or multi-agent, I have multiple tools, and I get to an answer. But how did I get to that answer?
[00:25:52] Gary Stafford: Did it, you know, I would expect it to maybe make two or three calls, but it made 15. Or, I had a funny use case: I was developing a system and I got an answer back, and it was only later that I realized that was actually the LLM answering directly. The agent had not actually used any of the tools to go out to the web or to the knowledge base and find the answer.
[00:26:10] Gary Stafford: It tried to answer the question without the tools, and had I not used the observability tools that were available and looked through the logs, I probably wouldn't have realized that. So I think that real-time logging, especially during the testing and the development phase, is critical. Not only to make sure it's working, but how is it working?
[00:26:28] Gary Stafford: Even though it's non-deterministic to some degree, it's still deterministic in that you would expect it to call certain tools, maybe in a certain order. Or if it's calling too many tools, right? You know, if it's going a minute or two minutes and you need to get that time down, how can you do that?
[00:26:45] Gary Stafford: So I think observability is critically important in agent systems.
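The "did it actually call the tools?" question is exactly what a trace answers. Below is a minimal, framework-agnostic sketch using the OpenTelemetry Python SDK: each tool invocation is recorded as a span, so a run where the LLM answered directly shows up as a trace with no tool spans at all. Frameworks that emit OTel natively give you this without manual wrapping; the tool here is a stub.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to the console for local development; swap the exporter for your
# observability backend in production.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def web_search(query: str) -> str:
    """Stub tool: every call is wrapped in a span, so the trace shows whether,
    how often, and in what order the agent actually used it."""
    with tracer.start_as_current_span("tool.web_search") as span:
        span.set_attribute("tool.query", query)
        return "...search results..."
```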
[00:26:50] Wil Pong: I think you got it. Yeah. Out of the box. A lot of these agents are people pleasers by nature, so to speak.
[00:26:55] Gary Stafford: Yes.
[00:26:56] Wil Pong: And their job oftentimes is to tell you what you want to hear. So that's where these tools that you talked about come in: being able to see what happened, and was this an anomaly or not?
[00:27:06] Wil Pong: And then, you know, going further and actually tuning what you want to have happen. Maybe that's more explicit instructions in your prompt. Maybe that's actually tuning other parameters that will help it sort of stay more on rails or get more creative. There are a lot of things that we can play with to sort of manage agents, almost like we manage our employees, right?
[00:27:24] Wil Pong: Give them coaching and say, look, this, this worked out well, this was a bad example, man. Like, let's do it a different way next time. Uh, and the nice thing is, of course, with agents is you can encode that and make sure that they do it that way going forward, right?
[00:27:38] Gary Stafford: Yes. Yeah. I think
[00:27:39] Wil Pong: Yeah.
[00:27:40] Gary Stafford: it's an evolving field. Not to go off on a tangent, but how you encode that, right? That system prompt, that agent prompt, and the set of rules that you give it. And I think, similar to any other system, right, rules for what happens when there's a failure. There's a degree of complexity, and I think that's still evolving in terms of what direction you give the agent and what context you give the agent in advance of using the tools.
[00:28:06] Wil Pong: So that's a perfect segue, because you're kind of mentioning, in this example, that you're not just a strategist and consultant, you're also a builder, right? And you've gotten hands-on with this stuff. I think a couple examples that I'm aware of are a Perplexity-like search and, you know, a weather kind of function.
[00:28:20] Wil Pong: Tell us a little bit, you know, you can choose either of those examples or maybe walk us through one of them.
[00:28:24] Gary Stafford: Yeah.
[00:28:24] Wil Pong: What have you learned from a builder point of view about, you know, this sort of agent piece, and more importantly, how you kind of do this orchestration? To your point, how do you know what good looks like, compared to in the past when you wrote code and expected it to perform the same way every time?
[00:28:44] Gary Stafford: Can I pick a third example?
[00:28:45] Wil Pong: Yes, please.
[00:28:46] Gary Stafford: So we just did a talk at the LA Summit last week, and it is a recommendation or personalization engine using an agentic approach. That's probably just top of mind for me 'cause I spent quite a bit of time on it. So we get a lot of requests, not for how can I replace my existing personalization or recommendation system, the more traditional machine learning approaches, which work very well, but can I augment that with generative
[00:29:14] Wil Pong: Hmm,
[00:29:14] Gary Stafford: AI, which, you know, today I would say is really more of an agentic approach that incorporates generative AI. So I spent a lot of time building out a system, and to your question, what did I learn? The first thing, and it was part of the talk, is that a majority of that system is not AI. A majority of that system is kind of those mundane things: how does the customer log in? How do I authenticate the customer? How do I load balance the requests? How do I store the customer data? So the first 80% of that system is not agentic. It's really more a feature that I can plug into the system to give it agentic capabilities.
[00:29:51] Gary Stafford: And I think that gets lost, right? There's such hype around agentic AI and generative AI that you would think that is the application, but the reality is that's more of a feature of the application, a smaller part of it, that you need to think about when you're developing, especially a new product. But yeah, going through that process, I think I talked about the one example, right? Getting answers back, and not using the observability, the metrics, the telemetry I was collecting, then realizing, you know, it's not actually calling the tools. In some cases it's using the LLM to try and answer with its limited knowledge, the knowledge that the model had through its training, versus actually using the tools, and how do I instruct it to use the tools?
[00:30:37] Gary Stafford: And I think the one last thing, the one pattern which was interesting, and I haven't seen a lot around it yet: I have two tools, and, you're seeing this a lot, there are always limits, right? There are token limits, there are number-of-call limits on a lot of these third party tools. So I enabled it with two tools.
[00:30:54] Gary Stafford: Now how do I load balance those two tools? I have two external sources of information; they're relatively the same, or maybe I'm using them within certain contexts. How do I direct the agent to use those tools in a particular way without dictating the order? So I think there's a lot of work around that, especially when you have a multi-agent system, multiple tools, multiple MCPs, and there's definitely some coordination that needs to happen through prompting and rules beyond just defining the tools. It's how you use the tools effectively.
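One way to express that kind of coordination without hard-coding an order is to put the usage rules in the system prompt and let the agent apply them. A rough sketch, again assuming a Strands-style Agent/tool interface; both search providers are hypothetical.

```python
from strands import Agent, tool

@tool
def search_provider_a(query: str) -> str:
    """Hypothetical primary news search (fresher results, tighter rate limit)."""
    return "..."

@tool
def search_provider_b(query: str) -> str:
    """Hypothetical secondary news search (higher limit, slightly staler data)."""
    return "..."

# Rules guide the choice between two near-equivalent tools without
# dictating a fixed call order.
TOOL_RULES = """You have two search tools that return similar results.
- Prefer search_provider_a for questions about today's news.
- Use search_provider_b if provider A fails or you have already called it
  three times in this conversation.
- Do not call both tools for the same query unless the first returns nothing."""

agent = Agent(system_prompt=TOOL_RULES, tools=[search_provider_a, search_provider_b])
```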
[00:31:28] Wil Pong: One thing I find so interesting about that is it kind of changes the role of testing, where, if you think about traditional software development, testing is about making sure the happy paths work the way they're supposed to. Like we already kind of know what should happen. And to an extent, software is software.
[00:31:45] Wil Pong: We think that's true of agents too. There are some tasks that they have to complete, but the pathway to that is also deterministic, to your point, in traditional software. So when we write things like unit tests, we say, look, I wrote the code to do X, Y, and Z. Make sure that the outcome happens the same way.
[00:32:01] Wil Pong: And then look for anomalies in my system, right? You know, resources that weren't there, et cetera. Agents are a little different, because you're not trying to pre-describe a path. To your earlier point, how do I tell it the tools that it needs, but not be so explicit about when to use them that it's dependent on me and my instructions in order to win?
[00:32:20] Wil Pong: So I'm curious, when you think about tuning up that system as you're getting ready for primetime or production, what are some tools that you've used to be able to, A, understand how your app is working at any given moment? You know, we talked about observability, but I'm curious if there's anything that you've used to actually improve that process.
[00:32:40] Wil Pong: You know, basically tune up your agent before you release it out to the wild with customers or employees.
[00:32:46] Gary Stafford: Yeah, I'll tell you, I guess, where my thoughts are and what I'm trying to develop. So I think, in addition to traditional testing, right, performance testing, unit testing, integration testing, and again, in that application AI is just a feature, so all of that conventional testing is still important, right?
[00:33:03] Gary Stafford: You have methods which are annotated as tools, but they're still methods which are testable. I think another area of research, which probably needs more thought, is around, to your point, testing.
[00:33:15] Gary Stafford: And even though the route that the agent takes, or agents take, is non-deterministic in terms of the tools that it uses, there's still probably a logical, if not an ordering, then a logical process that it's taking, right?
[00:33:28] Gary Stafford: I know it has to call current time and date. I know it has to create a sandbox and execute a script. I know it has to reach out through this web tool or to this third-party API. And I think within the context of that, you can write, I'll call them non-deterministic tests, right? You may not define the exact order, but you know some of the steps, and that type of testing, I'm sure there's research into it.
[00:33:53] Gary Stafford: I find myself trying to write those type of tests.
[00:33:57] Wil Pong: So what's interesting is,
[00:33:58] Gary Stafford: Kind of regex tests, right? In the conventional
[00:34:00] Wil Pong: yeah.
[00:34:00] Gary Stafford: conventional sense, you're just kind of using a regex to make sure it's calling the right tools. It may have called a tool three times; I don't care, as long as it called it once, because if it didn't call it, that could point to a problem.
[00:34:10] Gary Stafford: If it called a tool I didn't expect, why is it calling that tool? Maybe that's a sign that I need to go back and retool my prompt, or re-engineer my system prompts.
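A sketch of what those "non-deterministic tests" can look like in practice: instead of asserting an exact call sequence, the test only checks that the required tools were called at least once and that nothing unexpected was called. The helper that collects tool-call names from your traces or the agent's result metadata is hypothetical and depends on your framework.

```python
# test_agent_tools.py -- run with pytest
REQUIRED_TOOLS = {"current_time", "web_search"}
ALLOWED_TOOLS = REQUIRED_TOOLS | {"knowledge_base_lookup"}

def run_agent_and_collect_tool_calls(question: str) -> list[str]:
    """Hypothetical helper: run the agent and return the names of the tools it
    invoked, pulled from traces, logs, or the agent's returned metrics."""
    raise NotImplementedError

def test_agent_covers_required_tools():
    calls = run_agent_and_collect_tool_calls("What happened in the markets today?")
    called = set(calls)
    # Order and call counts are left to the agent; only coverage is checked.
    assert REQUIRED_TOOLS <= called, f"missing tool calls: {REQUIRED_TOOLS - called}"
    # A tool we never expected is a prompt-engineering smell worth investigating.
    assert called <= ALLOWED_TOOLS, f"unexpected tool calls: {called - ALLOWED_TOOLS}"
```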
[00:34:20] Wil Pong: You know, you've got me excited on this topic because I think a lot of what you're describing, it almost blurs the lines of how we coach people versus how we write software. Right? A lot of times we, we tell high performers, look, I don't care how you get there. I just need you to go do these things. Right?
[00:34:52] Wil Pong: Um, you know, this is really important in this project to make sure we cover 1, 2, 3, 4 things. Look into it. Let me know what you need to do. Let me know if there are some blockers. But otherwise, have a good day. And then, you know, when we're done with this, give me a report. Right? We often give instructions to
[00:34:52] Wil Pong: Agents in a similar way. Look, you've gotta accomplish these things and you have these tools at your disposal, how you use them. Feel free. You know, we trust you. Go ahead and try it. And that, I think is the, the thing you've gotta see. In other words, um, not just giving those instructions, but starting to understand how your agent interprets those instructions and whether or not you need to put more guidance around them as a second, third, fourth round and kind of go through.
[00:35:18] Gary Stafford: That's right. That's right.
[00:35:20] Wil Pong: So here's what I'm thinking about as you're sharing that, then let's say we're getting to that point, okay, we've, we've augmented our instructions, now we know, okay, in this case, you know, Billy, the agent, you have to go do this instead of that. And we feel like we've got a pretty good instruction set here and pretty good tool calls and things like this.
[00:35:39] Wil Pong: Now we have to think about putting this in production. So tell us a little bit about that, as you're thinking about putting these solutions out there, whether it's the personalization engine you talked about or other things. What are some pitfalls that we might encounter when we're trying to scale this up to many, many users?
[00:35:55] Gary Stafford: Yeah. So a couple of thoughts. One thought is, and we're all guilty of this, with the excitement around AI, generative AI, agents, and all these evolving technologies, we're only focused on the POC or the MVP. In some cases that is the deliverable, right?
[00:36:13] Gary Stafford: Whether we wanna admit it as vendors, sometimes the objective was to create a POC to prove that we could implement AI sometime in the future. But for folks that want to develop a solution and go into production, oftentimes with AI we're not putting enough thought into "will this scale," right?
[00:36:32] Gary Stafford: Am I picking the right model? I'm not saying build out a production architecture for a POC, but I sometimes see a lack of thought in terms of how this will scale. Now you have a successful POC, but you have just as much hard work to try and productionalize it, because there wasn't a lot of thought put into that during the POC.
[00:36:50] Gary Stafford: Uh, if that makes sense.
[00:36:52] Wil Pong: It makes sense. You know, one of the core examples that's coming to mind for me is identity and access. We talked about back office just now, and as a developer, oftentimes what do we do? We say, okay, in order for my agent to work, I'm gonna give it god-mode powers, right? Here's this admin token that can access anything and everything.
[00:37:09] Wil Pong: 'Cause I'm just trying to build this, right? I don't wanna worry about all that. But then we release it to production, and we realize all of the calls, all of the Slack messages that are being sent, or all of the emails that are being forwarded, or documents created and so on, they're all created by Billy the agent.
[00:37:27] Wil Pong: It's not actually by the users that are doing it. So now I have no insight on who does what. Now I've given people powers that they probably shouldn't have in the organization. I'm really stuck.
[00:37:37] Gary Stafford: Yeah.
[00:37:37] Wil Pong: Being able to build agents that can inherit the access of the user requesting things, or can intelligently hand off to different folks in the organization, that's really difficult, you know?
[00:37:47] Wil Pong: And that, that's an example of, of, uh, production issues that may occur. I'm curious if you encountered that or any other kind of similar sort of production things that you don't think about when you're building, but you think about when you deploy and scale.
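A small sketch of the pattern Wil is describing: thread the requesting user's identity and a scoped, short-lived credential through every tool call, instead of baking one admin token into the agent. The Slack-posting tool and the token exchange are hypothetical placeholders for whatever your identity provider and downstream APIs support.

```python
from dataclasses import dataclass

@dataclass
class UserContext:
    user_id: str
    access_token: str  # short-lived, scoped to this user, obtained on their behalf

def exchange_for_user_token(user_id: str) -> str:
    """Hypothetical: swap the caller's identity for a scoped downstream token
    (e.g. an OAuth on-behalf-of flow) rather than reusing an admin key."""
    return f"scoped-token-for-{user_id}"  # placeholder value

def send_slack_message(channel: str, text: str, ctx: UserContext) -> None:
    """Hypothetical tool: posts using the requesting user's credential, so audit
    logs attribute the action to that user rather than to 'the agent'."""
    # e.g. call the messaging API with ctx.access_token here
    ...

# At invocation time, build the context from the authenticated caller and pass
# it into the tools; the agent never holds a credential broader than the user's.
ctx = UserContext(user_id="wil", access_token=exchange_for_user_token("wil"))
```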
[00:38:00] Gary Stafford: Yeah, so maybe two thoughts. One is, certainly not to disagree, I agree with you, it's complex. But I also think, and I see a lot of this, let me put it another way: a lot of that thought has already gone into distributed systems, into microservice-based architectures, right? I have multiple microservices.
[00:38:17] Gary Stafford: Each one of those has a specific role, just like an agent, and they have different permissions, right? So I think oftentimes we're guilty of ignoring existing architectures because we think AI is totally new. But there are a lot of people arguing that a lot of those patterns are just repeating themselves. So I think looking at how we secure distributed systems, how we secure microservice architectures, or clusters that talk to other clusters, multiple clusters. We have a lot of customers that do that, right? They're talking across accounts, across clusters. So I don't think that thinking is brand new.
[00:38:53] Gary Stafford: I think we can leverage a lot of that existing learning that teams already have on how to secure those systems, how to make sure they're redundant, that they can tolerate failure, all those types of things. The other thought, and I think you and I talked about this ahead of time, I don't wanna make this all about AWS, and I'm sure there are competitors out there and other products that are evolving.
[00:39:16] Gary Stafford: I think, you know, we recently released AgentCore, and I think what's really unique about it is that it's looking beyond just AI, generative AI, and the agents, to how do I deploy those and manage those at scale. And I think that's obviously an area of interest for a lot of vendors: moving beyond just having an agent framework.
[00:39:35] Gary Stafford: But to your point, how do I deploy that? How do I scale that? How do I think of things like memory, authentication, or authorization that you mentioned? Securing things: how do I ensure that if I'm gonna reach out to a third party, I do that securely and use standard authentication or authorization methods?
[00:39:51] Gary Stafford: So I think AgentCore, and again, I'm sure there are similar products coming out, they seem to be fairly first to market with that. I think that's a really interesting area, right? Frameworks that allow you to productionalize agents and agent systems will, I think, be a growing area.
[00:40:10] Wil Pong: Yeah, I think that's really smart because to your point, there is a common infrastructure to agents that we don't see for other types of software. Like you said, you know, real-time memory is really, really important. How do we sort of spin that up and make it work well? Uh, similarly, how do we think about GPU support?
[00:40:25] Wil Pong: I don't normally use GPUs when it comes to writing software, right? So even spinning up a new cluster or load balancing has multiple dimensions. Now, it's not just sort of how quickly can I spin up another EC2 instance, for example?
[00:40:38] Gary Stafford: Sure.
[00:40:38] Wil Pong: So I'm totally with you. Having some of those tools that allow us to productionalize the software itself is really important.
[00:40:46] Gary Stafford: I
[00:40:46] Wil Pong: So.
[00:40:46] Gary Stafford: think it's an evolution, right? As agents have come into being and are evolving very quickly, I think we've seen this with other technologies, and it'll follow that same path, to be able to productionalize these more easily.
[00:41:01] Wil Pong: Yeah, I think that's really smart. So here's where, you know, I'm processing what you're saying, and it makes me start thinking about this tension or dichotomy that I hear, which is, to your point, agents have a bit of a superhero cape these days, and we say anyone can build agents, right? Hey, you're the HR person, you should build the agent, not an engineer.
[00:41:20] Wil Pong: Like, you know the business process. And we've kind of seen this rinsed and repeated with tools that, you know, I won't name specific companies, but folks that say, hey, you could build no-code B2B software, or you could build no-code integrations and things like this, and try to abstract that technology so that the business logic can sort of take center stage.
[00:41:39] Wil Pong: I'm starting to see this pattern with agents too. Yet there's a tension that's created because of what you just shared: there are very much computer science and software engineering techniques needed in order to make great agents. So how do you sort of address that tension as you're working with some of these great organizations, as they're trying to get this stuff out to the enterprise?
[00:41:58] Wil Pong: They want non-technical users to define the agents, but you need this sort of software skill in order to build ones that will serve the enterprise well. Right. So how do you address that for them?
[00:42:09] Gary Stafford: Maybe a comment first. I think we're all guilty of that, right? AI, machine learning, even data analytics, it's a very complex subject, and I think you don't want to overcomplicate it if that's not necessary to solve a problem.
[00:42:20] Gary Stafford: We're also all guilty of over-simplifying it, right? To your point, we simplify to a point where we hide the complexity, and then the complexity becomes apparent later, which often slows down POCs moving into production, right?
[00:42:34] Gary Stafford: The customer didn't realize it, because the POC was easy, right? To your point, they clicked a few buttons, they were able to launch something and test it, but now they actually want to scale it up, and now you need to understand a little bit more about the underlying technology. So I think we're all guilty of that. I think the entire industry needs to have a better balance of explaining complexity, and not under-explaining it,
[00:42:58] Wil Pong: That's right.
[00:42:58] Gary Stafford: to a point.
[00:43:00] Wil Pong: I think it connects to your earlier point, which is, you know, agents and genAI are an incredible capability. They're a new thing that we've never seen before, and they're transforming the way software is built, but that doesn't mean they replace the whole process. Building great solutions still involves everything we've known before.
[00:43:17] Wil Pong: This is additive rather than, you know, strictly a rip-and-replace, at least as it is today. So you've really gotta use both pieces together in order to build something awesome. Um, so
[00:43:29] Gary Stafford: well,
[00:43:30] Wil Pong: yeah, go ahead, please.
[00:43:30] Gary Stafford: I don't know if I fully answered your question though, in terms of the low-code, no-code, or getting people
[00:43:34] Wil Pong: Yeah.
[00:43:35] Gary Stafford: involved. I think, where those people, and those people are probably more critical than folks like myself or you that are technical in nature, they're the specialists, right?
[00:43:43] Gary Stafford: We're trying to represent them, not replace 'em. I mean, maybe replace 'em, but not necessarily; we're augmenting them. And in order to do that, it's more important to understand what their specialized knowledge is, right? That may or may not be technical. They understand the company, they understand the products, they understand how they would solve the problem, and we want to imbue the agent system with that knowledge.
[00:44:03] Gary Stafford: So I think they're critical to the process. It doesn't matter how many technical people you have; if you don't have the domain expertise, the specialized expertise, you can't build that system and replicate that domain knowledge within the system.
[00:44:17] Wil Pong: Yeah, I strongly agree. That was a point you made earlier: that's the point of all this, right,
[00:44:21] Gary Stafford: right.
[00:44:22] Wil Pong: to transfer a lot of that special sauce to the agent, and also to keep the agent accountable and honest. We're still gonna need that human in the loop that allows us to make sure it works well.
[00:44:33] Wil Pong: So I totally agree with you. It's a holistic thing. Rather than saying either technology can replace the people or people could just go make it, and technology is just agnostic and we don't care. This sort of coming together is, is really critical and, and I would really agree with that. So if we zoom out a little bit, so far our conversation has been about how do we identify problems that agents can help with?
[00:44:57] Wil Pong: You know, what are the capabilities themselves? When we talk about agents, what can they do that previous kinds of software tools couldn't do? And now we've talked about the idea of not that we've built a solution, trying to get it out there and getting it to scale. So let's imagine, you know, we're all through these three pieces.
[00:45:12] Wil Pong: Now, sort of the final part of our conversation is about: now that we're at scale, you know, we've done the right things, we've scaled well, we have the right permissions in place, all that kind of thing. And great, now we have thousands or even millions of users coming to our customer service solutions or our marketing solutions or whatever it is.
[00:45:30] Wil Pong: How do you protect your tools as they go forward? You know, one thing that we saw from the poll from our uh, participants here is that, you know, safety, compliance, risk, this is a big part of their story. So how do you think about making sure these systems are safe, especially because they have that degree of non-determinism, we don't always quite know what they're doing.
[00:45:51] Wil Pong: So how have you advised different teams or companies in this regard?
[00:45:56] Gary Stafford: Yeah, so I think there are two parts to that answer.
[00:45:59] Gary Stafford: The first part is not a cop-out, but I think, again, to some degree a lot of what you're building is traditional software.
[00:46:06] Gary Stafford: To the point that we have systems in place, right? Mature development organizations have mature practices in place: DevOps practices, deployment practices, SDLC processes where they're able to efficiently and effectively test these things. I think, to your point, the non-deterministic part of this is relatively new to a lot of us, but I think it also follows those same practices, having pipelines, being able to test. A good example that comes up, and we see it a lot, is customers, and I don't wanna say customers, I feel like I'm sounding negative.
[00:46:39] Gary Stafford: is that we all tend to want to use the latest model, right? But if I change models, even if it's a minor version, my responses can be very different. The end result from the model underlying the agent, especially if it's a reasoning model, can be very different, which can totally change my system and break it.
[00:46:59] Gary Stafford: So I think being careful about testing, and also testing in advance, right? What's the next model we might want to evolve to? Testing that in advance and already building out those prompts. And then having a way to replace not only the model, but maybe also those system prompts, which then speaks to having better configuration management.
[00:47:20] Gary Stafford: You know, think of those prompts as configuration. Think of the model configuration and how easily those things can be compartmentalized, modularized, and replaced in a system as you roll it out to production. I think that's something we don't think about enough.
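To make the prompts-as-configuration point concrete, here is a minimal Python sketch. The config keys, model ids, and prompt text are all hypothetical placeholders; the idea is only that each model gets its own tuned prompt and parameters, kept outside the application code so they can be versioned and swapped together.

```python
# Minimal sketch: treat the model id, its parameters, and its system prompt as
# one versioned configuration unit rather than hard-coded strings in agent code.
# All ids and prompt text below are hypothetical placeholders.
AGENT_CONFIGS = {
    "model-a-v1": {
        "model_id": "provider.model-a-v1",
        "temperature": 0.2,
        "system_prompt": "You are a support assistant. Answer from the knowledge base only.",
    },
    "model-b-v2": {
        "model_id": "provider.model-b-v2",
        "temperature": 0.3,
        # A different model usually needs a differently tuned prompt.
        "system_prompt": "You are a support assistant. Cite the policy section you used.",
    },
}

def build_agent_settings(config_name: str) -> dict:
    """Return the model-plus-prompt bundle for the named configuration."""
    return AGENT_CONFIGS[config_name]

# Swapping models is then a configuration change (the model id AND its prompt
# together), which can be tested and rolled out like any other config change.
settings = build_agent_settings("model-b-v2")
print(settings["model_id"], settings["temperature"])
```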
[00:47:36] Gary Stafford: I hear that complaint a lot: we switched models and this stopped working. But to some degree you wouldn't do that with traditional software. You wouldn't change out a database engine, or more specifically a package version of a particular framework you're using, without thorough testing. But oftentimes I think with AI, with the enthusiasm around it, maybe
[00:47:55] Gary Stafford: we're jumping to the latest model a little too quickly without thoroughly testing it. Yep.
[00:48:00] Wil Pong: That's a great point, because there are different levels to what you talked about. You mentioned packages, and I think the developers on the call will understand this. If you're using some OSS package, you don't just jump to the 3.0 release; you probably do a new build with it,
[00:48:15] Wil Pong: test it, maybe put it behind a flag, and try it out to make sure it's good before you go live. But to your point, we're a little bit quick to just switch over when we build agents. So one thing I'm hearing about this configuration piece: it's true for apps, when we think about overall commits or overall versions, but we also have to think about it infrastructurally too,
[00:48:39] Wil Pong: 'cause these are underlying components, right? They kind of sit in the middle. So when we think about infrastructure as code, maybe that's one thing we need to think about: which framework are we using? Was there a new version of it? Is that something we have to safeguard?
[00:48:52] Wil Pong: So to your point, even though the processes are familiar to teams that build software, the way we go about them or adapt them to the agentic world can be different. So you need new tools to be able to bring that to bear, that's for sure.
[00:49:05] Gary Stafford: I think especially with agents. Just in doing a talk last week, I think in the three weeks leading up to giving the talk and building the deck, I bet the three frameworks we referenced changed at least three to four times,
[00:49:16] Wil Pong: Yeah.
[00:49:16] Gary Stafford: which is abnormal, but not for these quickly evolving technologies.
[00:49:21] Gary Stafford: Right? None of the frameworks are at 1.0. They're evolving quickly, they're adding features, and oftentimes those are breaking changes. So whether it's something as simple as a framework or something as major as changing a model, or a family of models, it's about using the same testing rigor,
[00:49:37] Gary Stafford: to the first point, that you would apply to any other software product you were building.
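One way to read that testing point is to put model and framework swaps through the same regression gates as any other dependency bump. Below is a rough pytest-style sketch; `invoke_agent` is a stub standing in for a real agent stack, and the model ids and golden cases are placeholders.

```python
# Rough sketch of regression-testing a model swap with pytest.
# invoke_agent() is a stub; in practice it would call the real agent
# configured with the given model id. Model ids and cases are placeholders.
import pytest

CANDIDATE_MODELS = ["current-model-v1", "candidate-model-v2"]

GOLDEN_CASES = [
    ("What is the refund window?", "30 days"),
    ("Which regions do we ship to?", "US and Canada"),
]

def invoke_agent(model_id: str, prompt: str) -> str:
    # Placeholder integration point for the real agent call.
    return f"[{model_id}] We ship to the US and Canada; refunds within 30 days."

@pytest.mark.parametrize("model_id", CANDIDATE_MODELS)
@pytest.mark.parametrize("prompt,must_contain", GOLDEN_CASES)
def test_model_swap_keeps_key_facts(model_id, prompt, must_contain):
    # The same golden set runs against the current model and the candidate,
    # so a swap that changes key behavior fails the pipeline, not production.
    assert must_contain in invoke_agent(model_id, prompt)
```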
[00:49:42] Wil Pong: Do you have any advice or best practices for how to rethink those processes in an agentic-first world? Is there anything that needs to change or evolve as a result of how the software works?
[00:49:55] Gary Stafford: Maybe it goes back to the point about oversimplifying a complex process, understanding the implications of changing a model. A great example is: hey, I have this prompt, I tested five models, this one worked best. The reality is, for each one of those five models, that prompt should have been slightly different and optimized for that particular model provider or that family of models.
[00:50:18] Wil Pong: hmm.
[00:50:19] Gary Stafford: I think sometimes there's maybe a little bit of naivety around what can change, right? I have a system prompt, it works great, I'm not gonna change it. Well, the answer is that the prompt needs to continue to evolve as the system evolves, as you add more tools, as the framework evolves, as the model changes. Those things are not static, right? They're just as dynamic as other parts of the agentic system. I don't know if that answered your question, but
[00:50:48] Wil Pong: I think it does. If I were to reflect it back to you, what I'm hearing is: we've learned a lot about how to rapidly iterate software and do it in a good way. Adding these additional tools that can measure non-deterministic systems, and feeding that into the process of regression testing, or testing different builds in CI/CD,
[00:51:08] Wil Pong: that's how we have to think about it. That gives us new signals into existing processes so we can continue building good software. And I truly agree with you. Yeah.
[00:51:16] Gary Stafford: I think those tools are still evolving, right? That natural arc as agents come into being, from POCs and MVPs into production. I mean, there are some out there, and some companies are doing great things, but to your point, non-deterministic testing is still a growing area of research and development for companies to develop those tools, just like you saw with
[00:51:38] Gary Stafford: AI, around testing and evaluating
[00:51:42] Wil Pong: Yeah.
[00:51:42] Gary Stafford: machine learning models. Testing and evaluating generative AI models is very different, to some degree.
[00:51:49] Wil Pong: I'll echo you from earlier. I'm not about to make this an infomercial for my company either, but for anyone who knows what Fiddler AI is, we deeply believe in what you're talking about: both that it's early in terms of building the tools that will help people build great agentic software, and that this is a core need.
[00:52:06] Wil Pong: And to that point, I think the other trend I would mention, and maybe pick your brain on a little bit, is the idea of preventative medicine. Because these tools and systems are non-deterministic, on any given output we don't quite know what it'll be. So it's very good to be able to observe and rapidly iterate, but that's usually downstream from the actual action that's happened, right?
[00:52:29] Wil Pong: So how do you actually stop the bad outcome from happening in the first place? For example, prompt injection or jailbreaks, or on our side, making sure that your agent doesn't leak PII or something like that. Are there any considerations you've thought of there, or any suggestions you would have for the audience when it comes to protecting against those outcomes in real time?
[00:52:55] Gary Stafford: Yeah, I would maybe say, and I see Karen's hitting us up for questions here, I
[00:52:59] Wil Pong: Yeah.
[00:53:00] Gary Stafford: think the great starting point, and these are existing and have been around for a while, certainly not just specific to AWS, is guardrails, and most of the major platforms have access to guardrails.
[00:53:11] Gary Stafford: There are some great third-party guardrails and obviously more mature products out there from cloud vendors and other vendors. But using guardrails effectively is a great first step. I see a lot of customers where that's kind of an afterthought, but building it in right from the start, even if it's just for toxicity and PII, and then building those guardrails up as you need them, is a good starting point.
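As a concrete sketch of building that in from the start, here is roughly what screening input with an Amazon Bedrock guardrail looks like via boto3. The guardrail id and version are placeholders, and the exact request and response field names are recalled from memory, so treat this as illustrative rather than a reference.

```python
# Sketch: screen user input with a pre-configured Bedrock guardrail before the
# agent ever sees it. Guardrail id/version are placeholders; check the boto3
# docs for the exact ApplyGuardrail request/response shapes.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def passes_guardrail(user_text: str) -> bool:
    """Return False if the guardrail intervenes (e.g., on PII or toxicity)."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier="YOUR_GUARDRAIL_ID",   # placeholder
        guardrailVersion="1",                      # placeholder
        source="INPUT",
        content=[{"text": {"text": user_text}}],
    )
    return response.get("action") != "GUARDRAIL_INTERVENED"

# Typical use before invoking the agent:
# if not passes_guardrail(user_text):
#     return "Sorry, I can't help with that request."
```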
[00:53:34] Wil Pong: In other words, testing is super important as part of the flow of how we build software,
[00:53:38] Gary Stafford: Yep.
[00:53:38] Wil Pong: but also having some kind of runtime protection is equally important. You kinda have to balance those two.
[00:53:44] Gary Stafford: Sure.
[00:53:45] Wil Pong: Totally agree. All right, well, with a couple minutes left here, Gary, let's see if we can answer a question or two for the attendees.
[00:53:54] Wil Pong: I think there's been a lot of good stuff that's come through. So why don't we start from the top here. Um, how do you decide when it makes sense to add an orchestrator instead of just extending a single agent?
[00:54:08] Gary Stafford: I'll tell you how I start. I tend to always start with a single agent, unless it's very clear there's a separation of concerns. I tend to start, and this is probably how we tend to develop software too, with a single agent, adding tools up to a point, and then I might go back and refactor and say, you know what,
[00:54:25] Gary Stafford: it makes more sense to split these tools off; this really should be a separate agent. There's a clear separation of concerns here, there should be a hyperplane here, so let's separate these tools and move them into another agent. But I think I tend to work the way I tend to develop, right?
[00:54:41] Gary Stafford: I just start writing code and then at some point I split it into classes or modules, and I do the same with agents. I don't know if that's the best approach; that's just a personal pattern I follow. With customers, I think we're more organized, right? We try to understand, again, who are the humans in the loop? Are there specialized tasks? And if there are, we tend to start there, developing multiple agents or some type of multi-agent architecture, depending on how they're used to operating.
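For what that "one agent, a handful of tools" starting point might look like in code, here is a small sketch using the Strands Agents Python SDK. The tool bodies are stubs, and the import paths and call forms reflect my reading of the SDK, so double-check against the current docs.

```python
# Sketch: start with a single agent and a few tools; only split into
# specialist agents once a clear separation of concerns emerges.
# Tool bodies are stubs; SDK usage is approximate.
from strands import Agent, tool

@tool
def lookup_order(order_id: str) -> str:
    """Look up an order's shipping status by id (stubbed for the example)."""
    return f"Order {order_id}: shipped, arriving Friday."

@tool
def refund_policy() -> str:
    """Return the refund policy text (stubbed for the example)."""
    return "Refunds are accepted within 30 days of delivery."

support_agent = Agent(
    system_prompt="You are a customer-support assistant. Use the tools to answer.",
    tools=[lookup_order, refund_policy],
)

# Later, if order tracking and policy questions diverge enough, these tools can
# be refactored into two specialist agents behind an orchestrator.
support_agent("Where is order 1234, and can I still return it?")
```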
[00:55:11] Wil Pong: That flows perfectly into the next couple of questions, because I think people are really wondering about this multi-agent approach. So the next question I've got here is: when you've tried connecting multiple agents together, what has been the hardest part technically? Is it coordination, latency, or something else?
[00:55:29] Gary Stafford: I think it's coordination. So what worked well: I had a single agent with probably too many tools, and logically I should have separated them. It was working, and now that I've separated it, I'm having trouble with the orchestration between the two specialized agents, or between the orchestrator agent and the two specialized ones. I've also tried swarms, and I know there are a couple of frameworks that have that concept of no overall orchestrator, where the agents are kind of self-organizing. I've tried to implement that sometimes and it's probably not been the best use case. So I'm still trying to understand where a swarm architecture makes sense over a more hierarchical orchestrator-and-specialist architecture.
[00:56:05] Gary Stafford: Sure.
[00:56:07] Wil Pong: Yeah, there's a trade-off there, right? For every benefit you get from a multi-agent framework, let's say in answer fidelity or speed or expertise, you've now introduced new edges to your graph. Now you have to worry about every handoff: was the handoff correct? And now you're testing those connections as well as the agent behavior itself.
[00:56:28] Wil Pong: So it's definitely a consideration: do I need that separation of concerns? To your point, I think that's the single most important thing, to nail the strategy; then the technology of connecting them is fairly straightforward.
[00:56:41] Gary Stafford: Maybe I'll just add, without going into too much depth, because it's probably a longer conversation: maybe a different model makes better sense for this task versus that task. So maybe a more specialized model can address this particular task. That's maybe not the best reason to have two agents, but if I can split those functions off into a separate agent, and the underlying model under that agent is something different with very specific performance characteristics that can solve that problem, or it's cheaper or faster, or that part isn't as complex as other parts, maybe I can get away with using a smaller or faster model. So thinking about the underlying models, maybe using multiple models, and using the agent as a point of separation between them.
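A short sketch of that model-per-agent idea, again with Strands-style agents: each specialist gets the cheapest or most capable model its task needs. The model ids are placeholders, and passing a model id directly to `Agent(model=...)` is my assumption about the SDK.

```python
# Sketch: use the agent boundary as the seam for model choice.
# Model ids are placeholders; Agent(model=...) usage is an assumption.
from strands import Agent

ticket_summarizer = Agent(
    model="provider.small-fast-model",       # cheap, low-latency model for a simple task
    system_prompt="Summarize the support ticket in two sentences.",
)

refund_adjudicator = Agent(
    model="provider.large-reasoning-model",  # stronger model for policy reasoning
    system_prompt="Decide whether the refund request complies with policy and cite the rule.",
)

# An orchestrator (or plain application code) can then route each sub-task to
# whichever specialist, and therefore whichever model, fits it best.
```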
[00:57:28] Wil Pong: I think that's great advice. Semantically, to a lot of folks I talk to, I like to say ChatGPT is an English major and Claude is a STEM major; think about it that way. That's important. And then also, do you really need Claude's PhD, or do you need a really smart bachelor's to be able to solve this problem?
[00:57:44] Gary Stafford: A very small reasoning model, or sometimes two models.
[00:57:46] Wil Pong: Exactly. Are we using 4o nano versus, you know, the full-fat o1? That's what's critical. So listen, Gary, it's been a pleasure getting to know you and talking with you about this frontline of agents. It's been so exciting to be able to discuss the front lines.
[00:58:02] Wil Pong: But again, we just want to thank you for your participation, and we'll be in touch with you really soon.
[00:58:06] Gary Stafford: Yeah, Wil. Thank you, Karen. Thank you. Appreciate the time today.
[00:58:09] Wil Pong: All right. Have a great day.
[00:58:10] Gary Stafford: Thank you.