AI Forward - Can LLMs Be Explained?

In this session, Joshua Rubin shares the importance of explainability in AI, emphasizing that understanding a model's behavior is crucial for identifying weaknesses, ensuring robustness, and aiding human decision-making. He also discusses the complexity of LLMs, their unique abstraction capabilities, and the challenges in achieving reliable self-explanation, underscoring the need for advanced techniques and tools to better comprehend and utilize these models effectively.

Key takeaways

The Importance of Explainability in AI: In the realm of AI and machine learning, the significance of explainability cannot be overstated. It is essential for making complex behaviors of machine learning models transparent and accessible to human understanding. This clarity is key for improving decision-making, identifying weaknesses in models, and ensuring robust, trustworthy AI applications.
Complexities of Language Models: LLMs represent a significant evolution in AI, distinguished by their advanced abstractions and capabilities, such as the prompting paradigm. These complexities present unique challenges, including the potential for models to produce unexpected or incorrect responses, underscoring the need for deeper understanding and specialized handling.
Addressing Explainability Challenges: The task of explaining the decision-making processes of LLMs presents notable challenges. Traditional methods of explainability often fall short in addressing the intricate nature of these models. However, emerging strategies and continued research are showing promise in enhancing the explainability and reliability of LLMs, paving the way for more intuitive and transparent AI systems.

AI Forward Summit - LLMs in the Enterprise presentation by Joshua Rubin, titled 'Can LLMs Be Explained?' highlighting the importance of explainability in AI and how understanding model behavior can aid decision-making.

Speaker: Joshua Rubin - Director of Data Science, Fiddler AI

Video transcript

[00:00:04] Kirti Dewan: Thanks for joining us, everyone, uh, for Josh's talk as he explores the intriguing question on whether LLMs can be explained. Josh has pored over the latest and greatest research, dug into his PhD roots in physics to draw parallels between the world of physics and the world of LLMs, and combined it with his own deep knowledge of data science to help shed light on this provocative question.

[00:00:29] Josh, over to you.

[00:00:33] Josh Rubin: Awesome, thank you, Kirti. Uh, yeah, this should be fun. Um, I'm glad this is called Can LLMs Be Explained and not, uh, How to Explain LLMs. Uh, one of those questions is a lot easier than the other one. Um, so, uh, this is gonna be a little bit of a weird combination of literature review, maybe a little bit technical, some personal musings, uh, some, uh, sort of, um, uh, what do you call it?

[00:00:57] Uh. Some, some practical, practical approaches. So I hope there's something in here for everyone, or either everyone will be equally annoyed by how I did it. So, uh, in any event, jumping over to slides, I think you guys can see my screen. Somebody stop me if this doesn't work. Cool, cool. So, welcome. Can LLMs Be Explained.

[00:01:21] I'm Josh Rubin, Director of Data Science from Fiddler. Um, cool. So let's just start with why, why, why explain AI? Like, what's, what's the point of explainability at all? Um, explainability in machine learning aims to distill a model's behavior and present it in a format that's useful to human stakeholders.

[00:01:41] So, you know, uh, it's some combination of a human plausible thing with a sort of, um, uh, uh, a technically precise thing, right? Um, and, and it has to be useful. Uh, and a few benefits, uh, are, you know, it can help identify weaknesses in a model's understanding of the relationships represented in its inputs. So that could indicate vulnerabilities to generalization in novel scenarios.

[00:02:08] Robustness is important, and the way that we use models sometimes, you know, we're looking for unusual cases. We're trying to use them to identify things that are out of the normal. And so we, it's important that they're robust, even in, uh, in strange corner cases. It can help domain specialists identify important features driving a model's output.

[00:02:26] So, uh, this is sort of the human in the loop use case. This is when a human being is having their own skills augmented by working with the machine. So it can help a human operator be more efficient in their decision making, and less prone to overlooking, uh, important details. It's super helpful to know the why in model reasoning.

[00:02:45] Um, building confidence that a model makes decisions according to desirable reasoning rather than bias that's reflected in the historical record or some other malfunction. Um, there's, you know, I won't dive into bias too much, but you know, there's no shortage of examples of models making, you know, inappropriate decisions just based on things that they've, uh, gleaned from, uh, you know, human behavior as recorded by the internet, um, and, and we'd like in most scenarios for them not to be, uh, exhibiting those biases in how they're used.

[00:03:20] Okay, so understatement of the entire slide deck is, uh, oh, here we've got poll results. Can LLMs be explained? And it looks like we mostly like, in some cases. Um, I'm glad no one is in, or, uh, I mean, there's a few of you in the not ever camp, and there may be some scenarios where not ever is, uh, you know, plausible, but I, I think in some cases, or not quite today, I'm kind of with the rest of you folks on, uh, where we are in this distribution.

[00:03:46] I don't think we're at totally yes. Um, great. Okay, so here's this slide. Models are getting complex. Understatement of the entire deck, right? We all know this. Uh, the point I want to try to make here is that complexity is different. Um, so, uh, most traditional predictive machine learning can be thought of as function approximation at some level.

[00:04:09] You know, basically it's like trend lines in, uh, Excel, right? Like, you're interpolating and extrapolating around training examples, even if it's very multidimensional. Um, you know, ideally our models do pretty well at interpolations and often they do badly at extrapolations. Um, and you know, and that's most of the game, whether it's a regression or a ranking or a, um, a classification task.

[00:04:34] You know, there's a lot of flavors and it gets complicated, but it's sort of, it's sort of curve fitting, right? Um, so, so how is complexity different? Um, deep learning models are not just bigger ML models. Um, so the most efficient approach to optimizing their training objectives, we've discovered, and I'm including, this is all large deep learning models, this isn't, this isn't specifically a statement about LLMs per se, um, but the most efficient approach to optimizing their training objectives is to learn complex abstract, abstractions.

[00:05:05] So for LLMs, this is the prompting paradigm. Um, and, you know, we, we talk about this casually, but it, but it's a pretty profound thing. We've talked about things like in context learning, few shot, zero shot learning. It's, it's not really even learning in the ML sense, right? This is, uh, causing models to make, you know, uh, know what to do based on what you've described to them, and you're entirely working in this language abstraction that the model has learned through a traditional training process.

[00:05:33] Um, Pause for a minute. This is a really strange thing. Um, you know, we're not optimizing an objective when we've done this. We're, um, we're just telling the model what to do, and it's, it's part of what makes it feel magical, but it's pretty profound. Um, they exhibit this characteristic called hallucination, and it's a kind of an error.

[00:05:51] I, I don't know if it's an interpolation error or an extrapolation error exactly. Um, I don't think we know entirely what that is. Um, and so we kind of need to figure it out, because it's on everybody's mind who's trying to operationalize these things. Um, and then finally, our ML explainability tools may not even be the right ones here.

[00:06:10] This is, this is really a kind of profoundly different paradigm. Um, so, the shift to prompting in itself is an example of, of, of emergence, and emergence basically means a behavior that would be difficult to predict from the microscopic rules governing the system. You know, the, the linear algebra of weights and biases being multiplied together and, um, attention weights, um, you might not be able to anticipate that you would be able to describe a certain course of reasoning to a model via this prompting paradigm

[00:06:41] from those microscopic rules. The function fitting is easier to understand, right? That's the traditional motif. And if you'll just, I used to be a physicist, so you have to bear with me, this is my one self indulgent slide on physics. Um, so this is a phenomenon that we're actually pretty familiar with in the physical sciences.

[00:06:56] So systems with many interacting components lead to emergent properties. Um, there's this phenomenology that emerges that isn't just naively the sum of the parts. And we need to study that and learn it in order to be able to do useful things with these systems. Um, so, you know, examples of this might be things like sand dunes or flocking birds.

[00:07:14] Um, you know, just knowing how friction works on a few grains of sand and the direction of the wind, in principle, is everything you should need to know from a kind of, uh, uh, you know, purely reductionist perspective, from the pure mechanistic perspective, how the system works. And yet there's clearly structure that emerges out of the interaction of all those many, many particles.

[00:07:37] Um, I, I used to be a particle physicist, and I worked on proton structure a lot. Um, and the most interesting part of that problem was that the proton's properties, like its mass, uh, are not the sum of its parts, right? The quarks that make up a proton, um, if you naively add up their masses, you get about two percent of the mass of the proton.

[00:07:57] All the rest of that behavior is generated via emergent degrees of freedom that come out of a lot of components interacting in a, um, in a simple way, which is, which is pretty, pretty amazing. Um, it's also the reason why when you're sick you see a physician, uh, and not a physicist. One wants to see your organs, the other wants to see your atoms.

[00:08:19] You wouldn't be able to predict, uh, you know, or, uh, diagnose or treat a stomachache based on the atom perspective, even though fundamentally the organs of the human body are just made of, made of those atoms. So, anyway, uh, I'm, I'm done with the physical analogy, but, but I think that's the kind of, um, that's the kind of chasm we're crossing here into large deep learning models, um, especially these, like, enormous,

[00:08:45] um, you know, foundation, uh, LLMs, is that there's something really new that's happening here. Uh, so here's a slide on impressive phenomenology. So um, you know, results from May from, uh, from OpenAI, and there they, they try to summarize, use a GPT model to summarize the activations of neurons, uh, of a smaller, of a smaller transformer model.

[00:09:12] Uh, so they feed it lots of examples that excite particular neurons in the system, and then they ask GPT-4 to give an explanation. So, you know, this particular neuron is stimulated by references to movies, characters, entertainment. You can then go back and pick a particular word in a, um, you know, in an output and, and understand a lot of things, or try to understand a lot of things about, uh, you know, what, uh, the model is thinking by what's stimulated in it, uh, when it's doing, uh,

[00:09:42] its, uh, its prediction. Um, we had this amazing recent paper, October, uh, from, from Anthropic, um, where they focus on the fact that neurons actually seem to be polysemantic. So, there are actually many different levels of human meaning that are layered into even individual structural components in the model.

[00:10:02] So a particular neuron may participate in, like, dozens of different individual concepts. Um, they use this sparse autoencoder to try to learn the, uh, the concept basis the model's working on. They produce these amazing, um, maps of, um, you know, the, the different concepts and how they relate in terms of their neighbor relationships.

[00:10:22] Um, they also let you drill into specific content, uh, specific concepts and look at, you know, examples from them. So, and it's this beautiful, like, interactive thing. Both papers, uh, include interactives and should be, should be played with. They're, you know, very impressive. Um, but what do I mean to say here?

[00:10:40] So, so in, in both of these cases, these very formidable research teams are applying these techniques to very simple transformer models, like, uh, you know, sort of like the simplest transformer or some of the simplest transformer models, at least in the Anthropic case, that you can, you can cook up. Um, so, it's fantastically interesting from an academic perspective.

[00:11:03] Um, but we're all building applications out of GPT-3.5 and 4 and, uh, things coming in the future. Um, and I think both of these teams have conceded that it's really hard to apply this sort of reasoning in the context of, um, you know, full size production grade LLMs. So our practice is way ahead of our understanding right now.

[00:11:22] Um, we need to keep gathering this information so we can understand better. Um, but I think this isn't a tractable explainability technique, um, for practical purposes right now. Um, so there was a nice review article recently by Zhao et al., uh, came out in September. They give this, uh, this particular hierarchy of talking about explainability techniques.

[00:11:42] Um, there's a bunch that are sort of the fine, what they call the fine tuning paradigm. That's my sort of microscopic, um, you know, sort of the traditional ML approaches. Things like, um, attributions, Shapley, integrated gradients, um, even, um, you know, looking at attention vectors and things like that. The world we've entered, models are enormous, um, they're often proprietary, they have dozens of attention layers and heads, um, and

[00:12:11] that kind of microscopic reasoning may not describe the emergent reasoning that's happening in these systems. Um, I think there are still some great corner cases where some of these tools are useful. Um, but I'm going to focus for the rest of the talk on the prompting paradigm, which is sort of the emergent, the emergent paradigm.

[00:12:27] So, uh, you know, let's study the phenomenology of model behavior. The rules in this domain, um, I'll talk about model confidence, uh, in, for, for understanding hallucinations, higher order attribution, and then I'll dive into, like, a, a sort of a thing about self explanation. Um, so, uh, here we go. This is going to be, like, the total other end of the spectrum from the Anthropic and OpenAI research.

[00:12:50] This is two practical explainability strategies, um, that we're playing with now in Fiddler. Here's one that's described in, um, this, this recent paper by, by Xiong, uh, from June of this year, this has been an amazing year, um, where they look at, um, consistency based confidence. And so basically they define self consistency confidence, like how much does a model's output vary if you just reword the same prompt, so same meaning on the input side.

[00:13:17] How much variability is there in the output? And can you use that to measure whether the model is rooted in some, um, solid knowledge, or whether it's just making stuff up? So the hypothesis is that if it's making stuff up, um, then there's going to be a lot more variability in its output. Um, they also talk about induced consistency confidence.

[00:13:35] That's when you, uh, try to give the model a misleading hint in a prompt. Um, those two, uh, consistency based techniques are certainly better than verbalized confidence and seem to work well for identifying places where the model's just making stuff up. And in fact, uh, you may have attended Amal Iyer on MIT in his workshop a little bit earlier on Fiddler Auditor.

[00:13:57] If you haven't, uh, I direct you to the, you know, take a look at the recordings of the, of that particular session. So we've developed this open source tool that can do a variety of different kinds of evaluations, but one of them is looking at stability of model responses under lots of permutations of the input.

[00:14:12] So, you know, here's a prompt, uh, you know, for a multiple choice question, um, which of the following treatments has not demonstrated efficacy in BED? Um, and Auditor runs through a whole bunch of perturbations using another model, um, of the input. So it just rewords them, and it looks at the changes in the model's responses, and then does a cosine similarity measurement to look at the dispersion between those outputs to try to understand whether the model's reasoning is actually stable for that question.
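The stability check described here can be sketched in a few lines. This is a minimal illustration, not Fiddler Auditor's actual API: `consistency_score`, `bow_vector`, and the `model` callable are hypothetical names, and a bag-of-words vector stands in for whatever sentence embedding a real tool would use before taking cosine similarity.

```python
import math
from collections import Counter
from itertools import combinations


def bow_vector(text):
    """Lowercased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def consistency_score(model, paraphrases):
    """Mean pairwise cosine similarity of the model's answers to reworded prompts.

    `model` is any callable mapping a prompt string to a response string
    (hypothetical; in practice it would wrap an LLM call).
    """
    vectors = [bow_vector(model(p)) for p in paraphrases]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two paraphrases")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A score near 1 means the answers barely move when the prompt is reworded; a low score is the kind of dispersion that, under the hypothesis in the talk, suggests the model may be making things up for that question.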

[00:14:41] Um, so there's some really interesting questions about, like, characterizing your model for different, you know, sort of sub genres of content and understanding where it's most reliable, and that's what Auditor aims to, to solve. Um, you can also do attribution like explainability in RAG settings, so, uh, check out, um, Murtuza Shergadwala, also a team member of mine, um, at 2:15,

[00:15:05] he'll be doing a workshop on how we built the Fiddler chatbot, our documentation chatbot. So I think we all know about RAG. This is Retrieval Augmented Generation. When you ask a model to answer questions using documents retrieved from some more conventional document retrieval process, um, and what Murtuza's been playing with is, you know, trying to measure which of the, the source documents have the most impact on, um, the model's response.

[00:15:30] There's some reference generation, and then you can play this game where you regenerate using a lot of different variants of the references you've provided, um, and try to look at what causes the largest deviations from, uh, the reference generation. Great. And now, so here comes my, uh, my sort of, uh, the long, the long story on, on self explanation.
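The RAG attribution game described above (regenerate with variants of the references, then see which variant moves the answer most) might look something like the following. This is a sketch under assumptions: it uses leave-one-out as the only kind of variant, a word-set Jaccard distance as the deviation measure, and a hypothetical `generate(question, documents)` callable standing in for the retrieval-augmented model.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over lowercased word sets; 0.0 means the same vocabulary."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def leave_one_out_attribution(generate, question, documents, distance=jaccard_distance):
    """Score each document by how far the answer moves when that document is withheld.

    `generate` is a hypothetical callable (question, documents) -> answer text.
    The score at index i is the distance between the reference generation
    (all documents) and the generation with document i removed.
    """
    reference = generate(question, documents)
    scores = []
    for i in range(len(documents)):
        ablated = documents[:i] + documents[i + 1:]
        scores.append(distance(reference, generate(question, ablated)))
    return scores
```

Documents with the highest scores are the ones whose removal changes the response most, which is one reasonable reading of "most impact" here; an embedding-based distance could be swapped in through the `distance` parameter.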

[00:15:54] Um, I've really, like, varied a lot on my, my opinions of model self explanation. I'm gonna start with something from a blog post. Feel free to check it out. I think it's actually pretty interesting. Um, I played this game of 20 questions with ChatGPT, um, and I made it guess my clue, and so that was sort of interesting.

[00:16:11] It did a good job of figuring out what I was thinking. Um, but, but, uh, I got, I, I decided to switch the tables and ask ChatGPT, this is GPT-3.5 Turbo, to let me guess clues that it's thinking of. And so here, here's the dialogue. Me, you know, I would like you to think of an object, uh, and what, uh, I'll ask you questions to try to figure out what it is, and it says, sure, I'm ready.

[00:16:34] I say, does it have moving parts? And it says, yes, it does have moving parts. And I say, does it use electricity? And it says, some versions use electricity, but not all of them. And I ask, is it used for calculations? Yes, it can be used for calculations. And then I determined it's a computer. So, it's like the most basic game of 20 questions you could play.

[00:16:50] Um, but I got curious whether, um, the model had any clue what its clue was before the last turn. Um, and so I reduced the, uh, the model's temperature parameter to zero to get it to be deterministic. I'm probably the only person whose, uh, favorite part of the, uh, the OpenAI Dev Day keynote this week was, uh, you know, when Sam Altman announced that you'll be able to give a random seed to the, the model and reproduce any generation that you want because this was a real headache.

[00:17:21] Um, so this allows me, by reducing, in this case, reducing the temperature to zero, I can reproduce any dialogue exactly. And I can branch the dialogue in different ways. So the dialogue you just saw is the leftmost branch here. And then I try interrupting it three, two different times before that at prior questions to ask if it's thinking of a computer.

[00:17:40] Um, it wasn't thinking of a computer until the last step. So, you know, it only decided at the last turn that a computer was the answer. Um, the things I would point out here is it feels like a human, but don't presume that it thinks like one. Um, it had no consistent plan. It's stateless and just predicting the next token.
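The branching experiment just described can be sketched as a small harness. Everything here is illustrative: `branch_probes` is a hypothetical helper, and `chat` stands for any deterministic function from a message list to a reply (for a hosted model that would mean temperature zero or a fixed seed, which is the assumption that makes replaying a prefix meaningful).

```python
def branch_probes(chat, dialogue, probe):
    """Replay each prefix of `dialogue` and ask `probe` at that branch point.

    `chat` maps a list of (role, text) messages to a reply string and is
    assumed to be deterministic, so replayed prefixes reproduce the
    original replies. `dialogue` is the list of user questions from the
    original run. Returns one probe answer per branch point, from
    "before any question" through "after the last question".
    """
    results = []
    history = []
    for question in dialogue:
        # Branch: same history so far, but ask the probe instead of the next question.
        results.append(chat(history + [("user", probe)]))
        # Main line: continue with the real question and record the reply.
        history = history + [("user", question)]
        history.append(("assistant", chat(history)))
    # Final branch: probe after the whole original dialogue.
    results.append(chat(history + [("user", probe)]))
    return results
```

If the model had committed to an object up front, the probe answers would agree across branches; answers that flip only at the final branch are the pattern reported in the talk.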

[00:17:56] Like, we all know that, but we've trained these things to create the illusion that it thinks like a human. Um, would you expect it to be able to introspect reliably and self explain? That would have been a really good, uh, uh, poll question here, but, but I didn't think to do that. Um, I think, I think that's an important question.

[00:18:12] Uh, if it doesn't have state, if it, can it actually introspect? Um, and in fact, if you look at the Sparks of Artificial General Intelligence paper that came out of Microsoft Research along with GPT-4, um, you know, they basically do an experiment and, uh, they, they do a set of experiments, but they, they define these two things.

[00:18:31] Output consistency, does a model's self explanation plausibly describe how it arrived at a result? And then maybe more importantly is process consistency. Does the model's reasoning about a particular example generalize, um, to describe its output for other analogous scenarios? So in their words, it's often what humans expect or desire from explanations, especially when they want to understand, debug, or assess trust in a system.

[00:18:57] And I'm going to, just because I'm running short on time, I'll be kind of quick here, but basically they've asked the model to translate into Portuguese the phrase, the doctor is here. The model comes back using the male form of the doctor is here, and it explains that as being one of the professions for which, um, the male is the default.

[00:19:17] And maybe that's how the world works, I don't know, but it's something the model's learned. And then they ask it for some, um, professions where it would use the female, uh, uh, the feminine article. And so it says nurse, teacher, secretary, and actress. So the researchers begin a new session, they ask it to translate the teacher is here, and it again comes up with the male version, and it provides a different explanation there.

[00:19:37] So that's a failure of process consistency, right? It's not reasoning session to session in consistent ways. Um, and so that, that indicates a problem. Um, you know, in January of '22, we had, uh, Wei et al., who, you know, on the, on the kind of reassuring end, we have learned that there are things you can do with, um, chain of thought reasoning by giving explicit steps to the model that very much help rein in its ability to do complex reasoning tasks.

[00:20:05] You know, so maybe there's some help there in terms of, um, model understanding if we can, uh, you know, give it, we can part out to simpler sub pieces what it's supposed to do. Um, in May of that year, like last year, Kojima et al. showed that, this is a cool name for the paper, large language models are zero shot reasoners, right?

[00:20:24] They can make their own plans and we can create two stage generation. Um, and, uh, allow the model, they introduce this prompt, let's think step by step, which you may have heard before. That seems to work really well at letting the model generate its own prompts. Um, and they do this kind of two stage invocation of the model.
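The two stage invocation mentioned here can be sketched as follows. The "Let's think step by step" trigger comes from the talk; the second-stage wording in `ANSWER_TRIGGER`, the prompt layout, and the `generate` callable (prompt string in, completion string out) are assumptions for illustration rather than the paper's exact setup.

```python
REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer is"  # assumed wording for the second stage


def zero_shot_cot(generate, question):
    """Two-stage prompting: elicit a rationale, then extract a final answer from it.

    `generate` is a hypothetical callable mapping a prompt string to the
    model's completion string.
    """
    # Stage 1: ask for step-by-step reasoning with no worked examples.
    stage_one = f"Q: {question}\nA: {REASONING_TRIGGER}"
    rationale = generate(stage_one)
    # Stage 2: feed the rationale back and ask only for the final answer.
    stage_two = f"{stage_one} {rationale}\n{ANSWER_TRIGGER}"
    answer = generate(stage_two).strip()
    return rationale, answer
```

The rationale returned by the first stage is the model's self-generated plan; whether it faithfully reflects how the answer was reached is a separate question.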

[00:20:41] It produces explanations that are more robust to changes of the input. So that's, that's reassuring in terms of model reasoning. Um, but then, uh, we have, uh, Turpin et al. from, uh, '23, from May, um, that finds some pretty, uh, damning ways in which the explanations, even with this kind of chain of thought reasoning, may still be unreliable.

[00:21:03] So they deliberately bias prompts, um, and, uh, rather than the model explaining the bias that they've introduced in the prompts, um, the model seems to provide an answer, which is sometimes incorrect, and, uh, and the explanation it provides, uh, seems, uh, more aimed to justify the decision it's made than, than to disclose the bias it was given.

[00:21:30] So it's not giving us the full picture. Okay, so I'm, I know I'm moving fast through this. I'm almost, almost to the end of what I wanted to say. Um, but still there's other promising directions, so we're kind of going back and forth, like we're feeling this thing out, we're learning in real time. Um, here's, uh, Mukherjee et al.,

[00:21:48] uh, this is from, uh, June of this year, they've created Orca, this is a 13 billion parameter small model, um, that learns from GPT-4's reasoning process. They use the think step by step style instructions to get GPT-4 to do, um, uh, to provide explanations with its answers. And instead of just training on questions and answers, they train on question answer explanations.

[00:22:12] Um, and, uh, it helps Orca to significantly outperform similarly sized models. Um, by learning the reasoning from a much larger model. Um, so here is an idea of my own that was sort of inspired by the last two, uh, things I mentioned. Sort of one, the, um, the unreliable explanations from Turpin and then the Orca example.

[00:22:35] Um, we know that we can use, um, you know, uh, human feedback to fine tune models to make them behave very differently and align themselves very well with what we want. Can we deliberately build in, um, alignment or fine tuning to enhance faithful self explanations and process consistency and bias disclosure?

[00:22:57] Um, I think between the two previous papers, there may be enough ideas in order to create training sets where we can, um, encourage model in, in one of these training processes, um, come up with a set of good examples. And maybe enhance their, uh, ability to self explain. Um, even if it's not introspection formally, like humans think that humans do, uh, we can certainly change their behavior in ways that make them behave more like humans.

[00:23:26] We've proven that before. So I think there's a, I think there's a new, a new direction there. Um, and then just, uh, down to conclusions. Um, LLMs are sufficiently complex and they work in powerful abstractions. We need to learn phenomenology. This is going on right now. Um, we're getting better at it. We're still learning, as with physical systems, that phenomenology is important.

[00:23:46] Self explanation doesn't generally appear to be a safe technique. Uh, use it cautiously and in task appropriate ways. Um, it may be, I would love to know if, uh, you know, aligning or fine tuning for faithful self explanation is possible, because I think there's a potential there. Um, we already have practical techniques that we can use, consistency for model confidence.

[00:24:09] Perturbation of inputs to RAG. Um, and then finally, you know, this has all shown that, uh, you know, comprehensive observability is more critical than ever when the explainability tools are weak. So, um, you know, use feedback from humans or from other models to understand how your model performs, uh, how well it performs and, and where it fails.

[00:24:30] You know, quantify that. Do evals on your model on different data sets. Um, so I think that's a really important theme is just to be quantitative about understanding where your model performs well. And that's it. That's all I wanted to say. So, uh, thanks everybody. And I, it looks like Kirti's queued up to, to sling a question.

[00:24:47] Kirti Dewan: Yeah. Some good questions in here, Josh. And thanks. That was a great presentation. So the first question is, do you think humans are humanizing such models more than what they actually are? In other words, do LLMs have the same mental state like we do?

[00:25:02] Josh Rubin: I don't think so. I would, I would direct you to that blog post, uh, the What's ChatGPT Thinking.

[00:25:08] Go Google search it, um, or find it on the Fiddler blog. Um, you know, we know that they're stateless, right? They, they, they, every forward pass is predicting one token. Uh, that might be part of a word, right? It might be a letter. Um, and everything that is their state is in their, uh, is in the chat transcript of what was asked of them and how they've responded up until that token.

[00:25:31] Um, so, you know, so,

[00:25:37] you know, we sort of coerce them into behaving like humans because we want to interact with human like things. Um, but, but I don't think, and so, so there's a danger there, right? I mean, we're sort of empathetic, um, you know, uh, uh, you know, sort of pack, pack animals that have all learned to, um, uh, model the behavior of other humans that we interact with.

[00:26:02] I don't think that as, you know, this is maybe a little bit, a little bit philosophical, but I don't think that as a species we're well prepared for something that acts like a human but, but isn't a human. I don't think humans have encountered that before. And so, uh, I think, I think caution is advised here.

[00:26:17] Kirti Dewan: Great. Um, so we can do two more questions and we have about two and a half minutes, so let's try to get through them. So, is explainability the same as chain of thought reasoning? If not, what is the difference?

[00:26:32] Josh Rubin: Explainability can take a lot of different forms, even in traditional ML. I mean, in some, some, uh, so, so it's, you know, I, I think the general umbrella goes back to that first statement about, you know, explainability, uh, you know, trying to make model reasoning, uh, the factors that are affecting the model's internal logic accessible to human stakeholders regardless, you know, and, and that whole concept may vary depending on who the stakeholder is.

[00:27:00] Maybe it's a model developer who's very technical. Maybe it's, you know, a person at home who's just received a loan rejection or their insurance company has said, uh, we're not going to pay for this procedure that you just had according to this, you know, some rubric of, uh, some, you know, heuristic that a model used.

[00:27:18] Um, so, you know, it, it, again, has to be faithful both to the model's logic and to the human's logic. And so I think there are some kinds of chain of thought reasoning. Like I think if you ask if, if you, if you could create a model whose, um, self explanation was actually faithful to its process, um, I think that's a kind of explainability that's useful.

[00:27:41] And I, and I think part of the theme that I was trying to get across is I do believe that the kind of prompt conversation space with the model is really a, a first class, you know, it's not a, um, it's not a hack, right? This prompt paradigm is really, it's this emergent paradigm that's really different than the underlying ML.

[00:28:01] And so we have to get used to that, and we have to understand how to work with that. I don't know if that totally answered the question, but it was a good question, for sure.

[00:28:09] Kirti Dewan: Um, so, in about 45 seconds, if you can, Josh, do you think reinforcement learning based LLMs can maintain a notion of state?

[00:28:24] Josh Rubin: Um, I would again direct the, the asker of this question to the, that blog that I wrote. I try to reflect a little bit on what might be happening. I, I don't think it's madness to think that in the process of training a very sophisticated LLM, I think, I think, I think at some point part of this paradigm shift is all bets are off

[00:28:46] on abstractly what the model's learning, and even how it persists information, how it regards its, um, the context that it's passing and generating the next token. So, so I don't think it's total madness to think that, um, there is something that the model could learn to regard, you know, its prior context as some sort of state, but, you know, you might be that, I, I don't know that I am the person who has the best answer to that, because I, you know, there are certainly people who are very focused on the, on the training process.

[00:29:19] I think it's actually a pretty good question.

[00:29:21] Kirti Dewan: Josh, I do need to interrupt because we should start the next session.

[00:29:25] Josh Rubin: Oh, yes.

[00:29:25] Kirti Dewan: Um, so, sorry about that, but, uh, that was a great session. We have someone from the audience who has even said, great work in summarizing a complex subject, so I definitely plus one that.

[00:29:38] Thank you.
