Generative AI Meets Responsible AI: LLMOps - Operationalizing Large Language Models


While generative AI offers huge upside for enterprises, many blockers remain before it can be adopted across a broad range of industries. LLMOps is the new ML workflow that aims to accelerate adoption and productize generative AI.

Watch this panel session on LLMOps - Operationalizing Large Language Models to learn:

  • How LLMOps iterates on MLOps to optimize for large language models
  • The key pieces of a generative AI workflow
  • How ML teams can approach leveraging LLMs in their applications

Moderator: Krishna Gade, CEO and Co-founder, Fiddler AI

Panelists: 

  • Amit Prakash, CTO and Co-founder, ThoughtSpot 
  • Diego Oppenheimer, Partner and CEO in Residence, Factory
  • Roie Schwaber-Cohen, Staff Developer Advocate, Pinecone
Video transcript

Mary Reagan (00:07):

Krishna will be moderating. So I'm gonna turn it over to Krishna to introduce our panelists. Take it away, Krishna.

Krishna Gade (00:15):

Hey everyone, welcome back. Thanks Mary. I'm super excited about this panel. I was very much looking forward to this all day. Let me welcome my panelists one by one. We have Amit Prakash who's the CTO and Co-founder of ThoughtSpot. Are we gonna let the panelists in?

Amit Prakash (00:44):

Yeah, I'm here.

Krishna Gade (00:45):

Oh, you're here? Okay, awesome. So, welcome Amit. Amit has been the co-founder and CTO of ThoughtSpot for the past 10 years. ThoughtSpot is an AI-powered analytics platform that allows anyone to ask data questions in a Google-like search interface. Prior to that, Amit was leading a machine learning team responsible for predicting clickthrough rates for ads at Google. And prior to that he was part of the founding team that built Bing at Microsoft. Our next panelist is Diego Oppenheimer, Partner and CEO in Residence at Factory. A lot of us in the MLOps community know Diego as the Founder and CEO of Algorithmia.

(01:30):

Previously he was the Executive Vice President of DataRobot. Diego is an entrepreneur, product developer, and investor with an extensive background in all things data. Currently, he's a Managing Partner at Factory Venture Fund, which specializes in AI investments, and also an interim Head of Product at two LLM startups. Diego is active in the AI and ML communities as a founding member and strategic advisor for the AI Infrastructure Alliance and the MLOps community, and works with leaders to define ML industry standards and practices. He holds a bachelor's degree in information systems and a master's degree in business intelligence and data analytics from Carnegie Mellon University. Welcome, Diego.

(02:13):

And then finally we've got Roie Schwaber-Cohen. He's a Staff Developer Advocate at Pinecone. Roie is a full-stack software engineer specializing in AI and data-intensive applications. And for those who don't know, Pinecone is a leading vector database startup. We are very excited to welcome all three panelists. Awesome. Thank you so much. So let me start with Diego. You wrote an insightful blog recently titled "DevTools for Language Models — Predicting the Future." Could you walk us through the key insights?

Diego Oppenheimer (02:58):

Sure. So I co-wrote this with David Hershey, so I wanna give him credit for that. The general idea was — and this is actually similar to an exercise I went through before starting Algorithmia — I looked at what the DevOps process was for applications, and then I thought about, okay, what happens when all this deterministic code becomes probabilistic? What breaks, what pieces need to be solved, what changes, and what stays the same? Except this time I started from the MLOps stack. And I'll start with the controversial point, which is the naming: LLMOps, FMOps — everybody wants to call it something new.

(03:43):

Look, it's operations for machine learning, right? There are different components, but at the end of the day what we're doing is trying to operationalize machine learning at scale. We're trying to do these things in production, and things need to be done for that, right? Which is different from running notebooks on your laptop. So I will put a line in the sand and say it's all the same, but there are different components and it's evolving, right? So I think that's important. And the really cool and exciting piece about this whole thing: if we think about the traditional MLOps process — I'm oversimplifying it — you had data preparation, you trained your models, you had deployment and inference, you had monitoring, and across the process you had experimentation management.

(04:31):

So you have a bunch of these components, and we all worked on different parts of that — Algorithmia was a piece of it. You looked at the stack and thought about how you were going to evolve it. And a lot of this rested on the premise that to get anything done with machine learning, I needed a ton of data. I needed to be able to build models and train them and understand how to do that. Then I had to deploy those models, and then I had to integrate those models into applications. And the interesting thing is, if you think about how hard each step was, it was like hard, hard, hard, harder, hard, hard.

(05:10):

There was no easy piece to this, right? It was just a complicated, massive thing. You know the favorite quote of the day around failure rates — who gets it done and who doesn't. And I know you and I, Krishna, definitely put it at the top of all our marketing material, right? How hard this is to do. It's true — we weren't lying. This is very difficult. Then the bowling ball comes in: these foundation models, especially the large language models. Building and training them is super hard — like magic, almost black boxes to a certain degree — but using them is super easy. Oh my God, anybody can, right? Basic language goes in, stuff comes out. The access point to these models is amazing.

(05:55):

The true democratization aspect just happened, and they feel magical. And let's be clear, they don't feel magical just to the layperson. They feel magical to ML people too — at least that's my experience working with these LLMs. So the access point became: anybody can go build. And you have this explosion of people building things. The best way I can equate this to the past is raising venture capital 20 years ago — it'd be like, well, I need a couple million dollars to go buy servers and build a web app.

(06:38):

And now it's like, you can do this with $3 on AWS and get your website running. The entry point is so low, right? To build something. And I think that's what just happened with AI. That's really the core of what the article came out to be: we got to this "holy shit" moment — sorry, I didn't mean to swear, but that's literally the quote — where anybody can access this, and the world is seeing the power of these large language models. And then, okay, great: now that you have this giant population of people approaching these large language models, let's go reevaluate the MLOps stack and ask, what are the pieces that are still important, right?

(07:24):

And what are the pieces that change or need to be reinvented, right? So this is where you start getting pieces like the data piece, which was always important — but now, unless you're building your own FM, it's not about quantity of data, it's about sniping the data: getting the perfect, manicured sample sets. And there's a whole revolution around that. Experimentation management, for the most part, stays almost the same. Training is a whole different category now — everything from the hardware, to how you handle the data, to how you start the models, to how you dissect them, that's all changed.

(08:08):

And so there's new tooling that needs to come out for that. Inference — most of these models don't run on just anything, right? Actually, that was more true last week than it is this week; this week you can start running LLaMA on your laptop and all that stuff. But there's a whole complication around inference and how you deploy this at scale. And then monitoring is really interesting, because now you interact with these models via prompts. How do you track those prompts, how do you see the outputs, how do you see the effect these models are having? And then you have the vector databases that are playing this huge, huge new role.

(08:46):

Not saying they didn't have an important role before, but they now have a much more important role. I'll let Roie speak to that — I'm not gonna steal Roie's magic on this; he can talk all day about why we need these vector databases. So the article, at the core of it, was really about this: we see the power, right? And anybody who's worked in NLP for the last decade — I know I had these conversations with a bunch of PhDs — is asking, well, what do I do now? There's obviously still more to advance, but everything changed at a pace that's so impressive, and the tooling is changing with us. So anyway, that was a very long answer, but that's essentially the core of the article: thinking about the new style of tooling, with the framing of, okay, let's grab what we had — what changes in this new world? What are the new components? Which components do you throw out? What needs to change? What stays the same?

Krishna Gade (09:42):

Awesome. Thank you so much for the detailed answer. Maybe let's step back a little bit. Amit, at ThoughtSpot you have this new search experience that uses LLMs like GPT-3. Can you tell us more about it? You're on mute, Amit.

Amit Prakash (10:04):

Sorry. Yeah, so it's both new and not new, in the sense that at ThoughtSpot, for the last 10 years, we've been trying to crack the code on how to make it easier and easier for people who know little about data to ask data questions. If you draw the Venn diagram of people who have the business context and the real-world context of what this data means and how it can be applied strategically, versus the people who know how to operate the tools that can pull an insight from data, the intersection is very thin. And so as a result, people are always stuck in this world where the producers of data insights feel like they're doing a thankless, repetitive job, where every once in a while somebody comes and says, give me this slice of data this way, quick, and so on and so forth.

(11:01):

And they don't see a lot of career progression and value in it. And the people who need the data feel like they can't do their jobs, because if every follow-up question takes a week of sitting in a queue, how do they get to their five whys, right? So that's the mission we've been on. And in 2012, when we were getting started, we had a 10-person team, and we said: when Apple, with a hundred-person team, is building Siri and its accuracy is maybe somewhere in the high eighties, there's no way we can build a system so accurate that a business user will be able to sit in front of it and make decisions, if we make this probabilistic. So we went and built sort of a DSL that allows business users to get as close as possible to natural language without losing the determinism and predictability of how they get the answers.

(12:02):

And on top of that, for the last four or five years, we've been building an NLP layer that can take your natural language intent, bridge the knowledge gap, and get you there. We made a bunch of progress, but what we realized was that to solve this problem, you need a couple of things. One, you need a lot of knowledge about the real world, right? When someone asks, what's the longest movie made in the last 10 years — "longest" could mean duration, which is the most likely answer here, or it could mean the length of the movie's name, or any number of other things, right? So to answer that question, you really need to know a lot about the world: movies have duration, and that's what people mean when they talk about length, and things like that.

(12:51):

And the other thing you need is a lot of institutional knowledge. The example I'll give is an airline company we're working with, where they would ask questions like, what's A0 for DFW? A0, for them, stands for average arrival delay, and DFW is Dallas Fort Worth Airport. Because it's A0, it means DFW is the arrival airport; if it were D0, it would mean departure delay, so DFW would be the departure airport. This kind of institutional knowledge is not available in the public domain. So we've been trying to solve all of this, and what we found was that if you put together everything we built and everything LLMs bring, a kind of magical picture appears, where we are very good at capturing the institutional knowledge and injecting it into the flow.

(13:40):

And LLMs are very good at capturing the world knowledge and the nuance of natural language. The combination of the two is the product we just launched, called ThoughtSpot Sage. Essentially, you can ask questions in natural language, and that translates to our intermediate representation — the thing we built in the first place. That allows a business user to understand how the question was interpreted and manipulate it if they need to, and from there get to a data visualization in the traditional BI sense, but one that's very interactive: you can drill down, sort, filter, whatever you want to do. That's been the most exciting thing we've been working on for the last three months. We've also done a bunch of other things as part of this launch: generating narratives around insights, and enriching the metadata so that you have synonyms and things like that automatically generated, because there's a lot of real-world knowledge and an ability to produce beautifully crafted sentences in these LLMs. All of that has allowed us to push the product much further.

Krishna Gade (14:45):

Awesome. Maybe a follow-up question: are you actually fine-tuning the LLMs with all of your data? Or at this stage, are you mostly prompt engineering? Could you describe the differences between those two?

Amit Prakash (14:57):

Yeah, definitely. So the larger LLMs have this interesting emergent property that they can learn in context, and that's what everybody's calling prompt engineering: you can inject some knowledge that the model doesn't have into the context, in the prompt, and then expect the model to do some reasoning on top of that new knowledge and give you something. So this is one place where you can inject institutional knowledge — like, when somebody talks about revenue in this company, what they mean is the opportunity ACV column in the Salesforce dataset where the opportunity is closed, that kind of thing, right? And then fine-tuning is this idea that you, or somebody else, trained this massive model over a massive body of training data that had nothing to do with the problem you're trying to solve, but it helped the model learn the underlying representations, which are super useful for solving your problem.

(15:59):

So now you take that model, feed your training data into it, and let the weights get adjusted — or maybe add a few layers on top — so that it becomes a network specialized for your data. That dramatically cuts down the amount of training data you need, because the underlying representations have already been learned; it dramatically cuts down the cost and gives you a much smarter model. As far as our use is concerned, yes, we are using a combination of both of these things. But at the moment, I think prompt engineering is probably more promising than fine-tuning.
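To make the distinction concrete, here is a minimal sketch of the in-context approach Amit describes: institutional definitions are injected into the prompt at query time, and no weights change. It assumes an OpenAI-style chat API (the pre-1.0 openai Python client is shown, and its interface has since changed); the glossary and helper function are illustrative, not ThoughtSpot's actual pipeline.

```python
# Sketch: "prompt engineering" as in-context injection of institutional
# knowledge. The glossary content below is hypothetical.
import openai

INSTITUTIONAL_GLOSSARY = """
- "revenue" means the Opportunity.ACV column in the Salesforce dataset,
  filtered to opportunities with stage = 'Closed Won'.
- "A0" means average arrival delay; "D0" means average departure delay.
"""

def answer_with_context(question: str) -> str:
    # The model reasons over the injected definitions at inference time;
    # nothing is fine-tuned.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Translate the user's question into SQL. "
                        "Use these company-specific definitions:\n"
                        + INSTITUTIONAL_GLOSSARY},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

print(answer_with_context("What was revenue by region last quarter?"))
```

Fine-tuning, by contrast, would bake this knowledge into the weights themselves by continuing training on labeled examples.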

Krishna Gade (16:38):

Awesome. Yeah, so coming back to fine-tuning: one of the things you'd probably do when you're fine-tuning is generate embeddings, and then you need a database to store those embeddings. So Roie, maybe you could tell us about Pinecone — what even is a vector database? Maybe just start with that introduction and then, you know,

Roie Schwaber-Cohen (16:58):

Yeah, the basics are pretty straightforward, right? We know what databases are, and we know what vectors are — and vector databases store vectors, right? So where traditional databases store rows of scalar values, vector databases store, and are optimized to query, vectors, basically using similarity metrics to give you back the best results. Now in the context of LLMs, I think the most relevant thing to think about is how we ground the LLM in some context that's relevant, right? Diego and Amit both mentioned this: LLMs are sort of detached, so to speak, from knowledge that is specific to a particular problem.

(17:52):

They're very general. They know a lot about the world and a lot about language, but when it comes to actually solving problems specific to a particular domain, they're not tuned to do those things. And so what we're finding — and this is a pattern we're seeing a lot — is that users typically bring their corpus of knowledge and create embeddings for that knowledge, right? So now we have this textual data in a semantic space, in the form of embeddings. Then we can take interactions with the LLM itself, embed those as well, and produce a combination of the query the user has with the knowledge the application has indexed. That, again, grounds the LLM in something more reliable and trustworthy. And when we're talking about responsible AI, it's very easy to see where the dangers are — where we let LLMs hallucinate answers to the questions we ask them. Using vector databases in that context is one of the ways people can mitigate that.
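For readers who want the mechanics, here is a toy illustration of the similarity query a vector database is optimized for: ranking stored vectors by cosine similarity to a query vector. A production system like Pinecone does this at scale with approximate nearest-neighbor indexes; the three-dimensional vectors below are made up purely for illustration.

```python
# Toy similarity search: rank stored vectors by cosine similarity to a
# query vector. Real embeddings have hundreds or thousands of dimensions.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

store = {
    "doc-1": np.array([0.9, 0.1, 0.0]),
    "doc-2": np.array([0.1, 0.8, 0.3]),
    "doc-3": np.array([0.2, 0.2, 0.9]),
}
query = np.array([0.85, 0.2, 0.05])

ranked = sorted(store, key=lambda k: cosine_sim(store[k], query), reverse=True)
print(ranked)  # most similar stored vector first: ['doc-1', ...]
```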

Krishna Gade (19:12):

Got it. So let's say I'm building a vertical app on top of OpenAI — where would I use a vector database? Am I actually taking my corpus, creating embeddings, and putting them in Pinecone in that case? Why would I need this Pinecone database if I could just get the response back from OpenAI? Maybe you could elaborate on that?

Roie Schwaber-Cohen (19:41):

Yeah, I'll give you an example, right? Let's imagine that we want to build a chatbot that can answer questions about our documentation. So we have a product, we have documentation for that product, and we want the bot to answer specific questions about that documentation. Now, obviously, if you just asked OpenAI about that documentation, it wouldn't know anything about it, right? So we need to access that information and combine the thing the user asks with the documentation itself — the parts of it that are relevant. So what we typically do is, like you said: we take the documentation, create the embeddings, and save those in the vector database. Then we take the user's prompt, create an embedding for that, and match that prompt against the data in our database. In that way, we can get the documents that are most relevant to producing a coherent answer. We inject them into OpenAI as part of the context that goes into producing an answer, and we literally ask OpenAI — or any other provider, for that matter: given this context, please answer the question that was asked. If that makes sense.
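The flow Roie describes is what's now commonly called retrieval-augmented generation. A minimal sketch might look like the following, using the 2023-era pinecone-client and pre-1.0 openai libraries (both client APIs have since changed); the index name, chunks, and chunking strategy are illustrative assumptions, not Pinecone's recommended setup.

```python
# Sketch of retrieval-augmented question answering over documentation.
import openai
import pinecone

pinecone.init(api_key="...", environment="...")
index = pinecone.Index("docs")  # assumed to exist, created with dimension 1536

def embed(text: str) -> list[float]:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

# 1) Index the documentation once, chunk by chunk.
for i, chunk in enumerate(["Install with pip ...", "To rotate API keys ..."]):
    index.upsert(vectors=[(f"chunk-{i}", embed(chunk), {"text": chunk})])

# 2) At question time: embed the query, fetch the closest chunks,
#    and hand them to the LLM as grounding context.
def answer(question: str) -> str:
    hits = index.query(vector=embed(question), top_k=3, include_metadata=True)
    context = "\n".join(h["metadata"]["text"] for h in hits["matches"])
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the context below.\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("How do I rotate my API keys?"))
```

The "answer only from the context" instruction is the grounding step that helps mitigate the hallucination risk mentioned above.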

Krishna Gade (20:56):

Got it. So you're using OpenAI as a translation tool between these embeddings, which aren't understandable to humans, and a human-readable language that people can see and understand.

Roie Schwaber-Cohen (21:06):

Exactly. It's sort of like a natural language interface right on top of an application.

Krishna Gade (21:12):

Awesome. Great. So maybe going back to Diego: building with LLM APIs is fundamentally different, as you just pointed out, from the last wave of ML, right? How do you see this landscape changing? For example, people are now building applications through prompt engineering, and we just heard from Amit that they're using a combination of prompt engineering and fine-tuning. Where would you start first? When do you have to do the fine-tuning, and when do you have to do the full shebang — maybe training your own foundation model?

Diego Oppenheimer (21:52):

Yeah, no, that's a great question. And I can talk a little more freely about it because we announced — what was it, on Monday? — one of the companies I've been working on, Numbers Station, just came out of stealth, and we work a lot in the data transformation space using LLMs. So I think you're gonna get a set of applications where you can use an API, fine-tune a model, pass in knowledge — the fine-tuning plus the embedding database — and build off of those kinds of transformations. And it really is about the UX — not the pixels, but the experience of the application working with these things — where you're gonna get the effect, right?

(22:33):

And so I think this is more the category of things like Amit described, where in the experience they've built you can now actually do this kind of question answering, and it's super powerful because it's right there in the application. In other cases, you're gonna have to think: I can't use the API-based approach, because it could be a data privacy situation, it could be a cost situation, it could be a latency problem. And all three are different categories of how to think about where you're looking. I'll give you a quick example from the Numbers Station case: we're doing data wrangling, which means we're pushing transformations over hundreds of millions of rows.

(23:18):

And doing that over API calls is suicide, right? You just can't do that — it's never gonna work, from cost and all those angles. So we've come up with a way of starting from more generic LLMs and then distilling small LLMs out of them for very, very specific tasks, where you can get to something in the hundreds of millions of parameters versus the billions of parameters. And once you have something that's hundreds of millions of parameters, you can push it into the data warehouse and actually run it directly in the data warehouse. So you can see how you're going through a process — that's the extreme of complexity, to your point — where you're starting from larger language models and distilling down to smaller ones.

(24:08):

The syntax gets weird here — they're still hundreds of millions of parameters — but you're doing that for cost, you're doing it for privacy, and you're doing it for latency efficiency. And that'll be one category of use cases; it's not a blanket statement that you need to do that. And then there's the combo, where you start with these large language models as — think of them as routers, very generalized routers into tasks — and then they can route into smaller tasks, and you have the combo, right, where you're fine-tuning and building very specific applications off of the offshoots of those models, versus using the large, generalized language models with your embeddings and your fine-tuning.

(24:57):

So I think the vast majority of the use cases you've seen hit the news — the 400 different YC companies that have come out in this space — are really in the "use the API in a creative way" category. And there's a ton of value there, just a ton of value, but it is using somebody else's model.

(25:21):

Models are gonna be more and more commoditized, and that's a good thing, right? We are very comfortable buying compute, buying network, buying storage. There's really no reason why we won't be comfortable buying AI as a fourth element and building our stacks on top of that. So hopefully that explains it — like I said, it's a little bit of both worlds, and the complexity varies.
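As a rough sketch of the distillation pattern Diego outlines — a large "teacher" model labels task-specific examples offline, then a small student model is fine-tuned on those labels so inference can run cheaply at warehouse scale — something like the following is one generic recipe, using Hugging Face transformers. This is not Numbers Station's actual pipeline; the task, data, and choice of t5-small as the student are all illustrative.

```python
# Generic teacher-student distillation sketch: fine-tune a small seq2seq
# model on examples labeled offline by a large LLM. Data here is made up.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
student = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
student.train()

# Teacher-labeled pairs: (raw input, normalized output). In practice the
# left side comes from your tables and the right side from the teacher LLM.
pairs = [
    ("normalize date: 03/17/23", "2023-03-17"),
    ("normalize date: March 17, 2023", "2023-03-17"),
]

opt = torch.optim.AdamW(student.parameters(), lr=5e-5)
for epoch in range(3):
    for src, tgt in pairs:
        x = tok(src, return_tensors="pt")
        y = tok(tgt, return_tensors="pt").input_ids
        loss = student(**x, labels=y).loss  # standard seq2seq cross-entropy
        loss.backward()
        opt.step()
        opt.zero_grad()

# The distilled student can now batch-score millions of rows offline,
# cheaply and without sending data to an external API.
```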

Krishna Gade (25:46):

Yeah. So it's like AI as a service — some of it could be offered on the cloud behind an API, and some of it you could build in-house and fine-tune. So Amit, one of the popular applications for LLMs is question answering. OpenAI just launched a bunch of plug-ins where you can ask stock questions and a whole bunch of other questions, and it's able to answer them. In the wheelhouse of analytics, how do you see the world of analytics changing with LLMs? What's the future there?

Amit Prakash (26:19):

I think if you look at how people are using data today: maybe a decade ago everybody got convinced that data is this new superpower and every company needs to be data-driven. So people got very good at producing and collecting data, but if you look at how they're benefiting from it, there are very few shining stories. Most people — all they know to do with data is jam it into a dashboard, go there every morning or every week, and look at: are we doing okay or not? And if we're not doing okay, start a fire drill, right? The true power of data comes when people who have a mental model of how a process should be running are able to investigate and find either things that are working really well and scale them up, or things that are not working well, find the opportunities, and squash the problems.

(27:26):

And that requires an active dialogue with data. If you look at the broader industry landscape, there are very few places where people are able to do that. This is the thing we've been trying to bring to the industry: you don't have to be a data person, you don't have to be an analyst. You just have to be a curious person who has a large responsibility for your business. And you should be data-driven — not just looking at a KPI, but asking questions like: why is this number going this way and not that way? What are the factors that could move it up? Or, if everybody else is producing $1.50 for every dollar of input, where are the places where it's only $1.10, and how can I push that? So that's where I feel LLMs may be that inflection point that makes it so much easier — that everyone finally feels comfortable asking those questions and gets trained to think in those ways, applying the scientific method to business, as opposed to just doing what everybody else has been doing and how they were trained when they started the job.

Diego Oppenheimer (28:43):

If I can add just two seconds there, 'cause I 100% agree with Amit. I think one of the most exciting parts about this — and this used to be the holy grail of BI, maybe it still is — is pushing as much as possible to the subject matter expert, the person who understands the business. But there's this gap in understanding of, how do I query the data? Like, Amit — I know what the numbers mean, I know what the definitions mean, I know what the business rules are, but I don't know how to ask for it. And suddenly this world has opened that ability up, right? The subject matter experts of the business can truly query the data in the way they think about the business. At least when I worked in BI, many, many years ago, that was the holy grail — that's what we were going for, to a certain degree.

Krishna Gade (29:38):

Awesome. Yeah.

(29:39):

Maybe let me take an audience question right now. Roie, would fine-tuning yield better results than using GPT-4 as a black box with embeddings for domain knowledge? It's a question from Ram.

Roie Schwaber-Cohen (29:54):

Would fine-tuning yield better results than using GPT-4 as a black box with embeddings for domain knowledge? I think it really depends on your ability to fine-tune — what you're starting off with, the model you're starting off with. I think it would be easier to use embeddings than to fine-tune, because fine-tuning also requires that you have a really well-labeled dataset, that you know what you're doing, and that you have the compute resources, et cetera. Whereas leveraging embeddings in the way I've described is pretty straightforward, and you have just as much control over the dataset you're feeding into the context of GPT-4. That said, the biggest problem with LLMs in general is that they're still non-deterministic, and it's still hard to predict exactly how they're going to behave. So I would caution that either way, you're not guaranteed 100% deterministic behavior or the results you're expecting. At the end of the day, it's a cost-benefit analysis, and I would say it's easier and cheaper to approach these things using embeddings rather than fine-tuning — because, again, fine-tuning requires a lot of knowledge, know-how, expertise, and compute.

Krishna Gade (31:26):

So in this case, you're creating embeddings from your raw data, and you're using, say, OpenAI or some other large language model to create those embeddings.

Roie Schwaber-Cohen (31:36):

And you can use pretty much any embedding model you'd like, as long as it produces embeddings with the same dimensions as the ones you're going to use down the road. And again, it really depends on your particular situation, because there are ways to completely decouple the layer where you use GPT-4 — interacting with it and feeding it context — from the way you create that context to begin with. You don't necessarily have to use the same mechanism for both.
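As a concrete illustration of that decoupling (a sketch under assumptions, not official Pinecone guidance): you might embed with an open-source sentence-transformers model for retrieval while generating answers with GPT-4. The only hard constraint Roie names is dimensionality — the index must match the encoder.

```python
# Sketch: a retrieval encoder decoupled from the generation model.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dim vectors
vec = encoder.encode("How do I rotate my API keys?")
print(vec.shape)  # (384,) -- so the vector index must be created with dim=384
# Generation can still happen with GPT-4 or any other provider; only the
# retrieval side depends on this encoder's dimensionality.
```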

Krishna Gade (32:14):

Right, right. So maybe switching gears. Diego, we've been in this MLOps world where observability of models is so important. How do you think about observability in the context of LLMs, or LLMOps? What needs to be observed? Is observability important? What's your take?

Diego Oppenheimer (32:34):

Yeah, so I think observability is just as important — and I didn't get paid to say this — just as important, if not more important, than it ever was before, right? At the core of it, it comes down to the human interaction and what systems we're building. The best way I can describe it is: we just enabled a whole world that's used to deterministic workflows — interacting with programs that always behave the same way, that always return the same results — and we just opened up the probabilistic world to them. And a lot of people don't understand what that means, or what 98% accuracy is and what that looks like.

(33:20):

So there's going to be a big disillusionment in terms of, what can I trust here? You go online and people love talking about how these things get everything wrong. It's like, yeah — welcome to statistics, right? This is how this works. And so I think observability is going to be a key element in designing workflows so that they can feel deterministic. I'll pick on something Amit's probably working on — sorry, my bad: the business answer always needs to be the same, right? If you ask a question and it gets answered three different ways, that's a big problem, right?

(34:06):

The trust just goes way, way down. So understanding what answers are coming back, how the questions are actually being formed, and which answer gets surfaced — I consider all of that inside the observability world, because what I'm trying to figure out is: how do I make this workflow feel deterministic, even though it's probabilistic by nature? That's where I see a lot of the energy in the monitoring space. And that's apart from, obviously, the companies that are building and training these large language models — they need to do the monitoring and all the traditional things you do with machine learning models. But I think there's a whole new layer on top, which is: I need to make this workflow feel deterministic, even though it's probabilistic by nature. There's a lot of clever design that needs to go into that, and observability is a piece of it.

Krishna Gade (34:54):

Yeah, makes a lot of sense. Yeah — go ahead, Amit.

Amit Prakash (34:57):

Yeah, I'll just jump in and quickly say that I think what you need here is not very different from what you and I worked on — my screen just went blank, I don't know if you can still see me.

Krishna Gade (35:11):

We can still hear you, yeah — and see you.

Amit Prakash (35:13):

What you and I worked on at Bing, for example, to monitor the relevance of the Bing search engine — and I'm sure you did something similar at Twitter, or what I did at Google, right? We've been building these probabilistic systems that do non-deterministic things for more than a decade. And even though the engine is not the same, the nature of the output is not that different. So I think a lot of those mechanisms — both offline benchmarking and online engagement score tracking — are probably where this is going to go.
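A bare-bones version of the offline benchmarking Amit refers to might look like the sketch below: keep a golden set of questions with expected outputs and track the match rate on every model or prompt change. The golden set and the substring scoring rule are purely illustrative; real relevance benchmarks use graded human or model judgments.

```python
# Illustrative offline benchmark for an LLM-backed answer function.
# `answer_fn` is any callable mapping a question string to an answer string
# (e.g. the RAG sketch earlier). Golden data here is hypothetical.
from typing import Callable

GOLDEN_SET = [
    ("What does A0 stand for?", "average arrival delay"),
    ("What was revenue by region last quarter?", "SELECT"),
]

def benchmark(answer_fn: Callable[[str], str]) -> float:
    """Fraction of golden questions whose answer contains the expected text."""
    hits = sum(expected.lower() in answer_fn(q).lower()
               for q, expected in GOLDEN_SET)
    return hits / len(GOLDEN_SET)

# Run on every prompt or model change and alert on regressions; pair this
# with online engagement tracking for the full picture Amit describes.
```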

Krishna Gade (35:50):

Awesome. So maybe a general question to the panel: we touched on this trust issue, right? When I speak to large enterprises — banks, healthcare companies, insurance companies — they get really excited when they see OpenAI and ChatGPT and all these applications. But then there are all these concerns. We just talked about observability; there are questions around facts — are you producing facts or not, right? — which is especially important in a business analytics case where you have to get the right answer. There are aspects around data privacy, hosting versus self-hosting versus using proprietary models, and aspects around explainability, which is probably still unsolved. What's your general advice for enterprises thinking about LLMs? What are some of the applications where they can try LLMs today, and where do they need to proceed cautiously? Maybe we'll start with Amit, since you answered last.

Amit Prakash (36:51):

So I think the no-brainer cases are where there's a process with a draft and a final output, right? In any workflow where somebody produces a draft in a labor-intensive way, and then somebody else reviews the output, the draft portion can be replaced. That's why you're seeing so much excitement around the paralegal industry, right? Or marketing copy production and things like that. So that's the easy no-brainer. Then I think the ability of LLMs to generate code is also pretty powerful, and people will realize its potential in many different domains, not just the one I'm working in, which is generating SQL — so you can essentially take intent and drive action from there. And this is where, depending on how large a domain you're going after, you have to be cautious: if you're going after a much larger domain, the chances of making a mistake are much higher than in a well-contained domain. And then I think these things are essentially inching towards having reasoning and understanding. So it'll get into every part of business sooner or later, but you have to be cautious, and you have to know what you're doing before you do that.

Krishna Gade (38:19):

Anyone want to follow up? Diego? Roie?

Diego Oppenheimer (38:22):

Yeah, I mean, one of the founders I'm working with right now had a great quote about it: everybody needs to just drop everything they're doing right now and start working with these models — playing around with ChatGPT and Copilot. The productivity you get is 10x. The best way I can describe it, for myself at least, is that I feel like I'm Neo in The Matrix — the productivity I've gotten, the amount of knowledge I can consume, the learning. It is clear that this is the future. I don't have any doubt in my mind about how we'll work with these workflows.

(39:04):

And yeah, there are things you need to understand — the hallucinations part, where guardrails need to be built around these systems, how you're going to implement this — that's all true. But drop everything you're doing and start working with this, because it's amazing. That's also how you're gonna learn: what are the boundaries? How do I work with them? What can it do, what can't it do? I have these large language models running constantly now in pretty much every single part of my workflow, every day. And it's amazing. So I'm very bullish.

Krishna Gade (39:48):

Awesome. Yeah.

Roie Schwaber-Cohen (39:49):

I think there's a lot of promise here, obviously, and I think Amit hit it right on the head. There is going to be reasoning, right? There is going to be deeper understanding, the capability to explain what's happening. Take, for example, what came out today — the integration with Wolfram Alpha. This whole movement started with the 2017 paper "Attention Is All You Need"; I think the motto now should be "context is all you need." That just means we regard these LLMs as, like I said, this natural language interface that sits on top of other capabilities — capabilities that provide the large language model with that extra layer of resources, reasoning, and deeper understanding, which can then be communicated back to users in a natural way.

(40:48):

And that, I think, is the most exciting thing: we're now discovering that we don't need this clunky way — buttons and old UI — to interact with these very complex systems. We can just do it in a very natural way and get back really, really good results. So I'm also very bullish on this, but I think there's a lot of work that still needs to go in. The path forward is pretty clear, though.

Krishna Gade (41:16):

Awesome. Well, with that, thank you so much to all our panelists for attending this session. It seems like we're embarking on this exciting journey of AI, LLMs, and generative AI, and it's also important that we do this right and have all the guardrails in place. Thank you so much, Amit, Diego, and Roie — thank you for coming to the panel, and thanks to the audience for all their questions.

Roie Schwaber-Cohen (41:43):

Thanks for having us.