Season 1 | Episode 6

Managing the Risks of Generative AI with Kathy Baxter

On this episode, we’re joined by Kathy Baxter, Principal Architect of Responsible AI & Tech at Salesforce.

Generative AI has become widely popular with organizations finding ways to drive innovation and business growth. The adoption of generative AI, however, remains low due to ethical implications and unintended consequences that negatively impact the organization and its consumers.

Baxter will discuss ethical AI practices organizations can follow to minimize potential harms and maximize the social benefits of AI.

About the guest

Kathy develops research-informed best practice to educate Salesforce employees, customers, and the industry on the development of responsible AI. She is a member of Singapore’s Advisory Council on the Ethical Use of AI and Data, Visiting AI Fellow at NIST, and on the Board of EqualAI. Prior to Salesforce, she worked at Google, eBay, and Oracle in User Experience Research. She received her MS in Engineering Psychology and BS in Applied Psychology from the Georgia Institute of Technology.

Transcript

Joshua Rubin: Welcome and thank you for joining us today on, uh, AI Explained, on Managing the Risks of Generative AI. Uh, I'm Josh Rubin, Principal AI Scientist at Fiddler AI. Uh, I'll be your host today. I've been working in the responsible AI space for about, about five years now, building tools and algorithms to help companies instrument, uh, their, their AI applications.

Joshua Rubin: We have a very special guest today on AI Explained, and that's Kathy Baxter, Principal Architect of, uh, Ethical AI at Salesforce.

Joshua Rubin: Uh, welcome, Kathy. Uh, would you like to give a little self introduction?

Kathy Baxter: Hi, thank you so much for having me today. Uh, yeah, I'm Principal Architect of, uh, Responsible AI and Tech at, uh, Salesforce. Uh, been at Salesforce since 2015. Uh, our team is part of the larger Office of Ethical and Humane Use. Uh, and in addition to my work with Salesforce, I am also a visiting AI fellow with NIST.

Kathy Baxter: Uh, I'm on the board of EqualAI, a nonprofit, uh, and also, uh, on the advisory, uh, council for Singapore's, uh, ethical use of AI and data in their nonprofit AI Verify Foundation. So I like to do lots of things with governments and nonprofits outside of my, uh, work with Salesforce.

Joshua Rubin: Very, very cool. Um, so, I don't know, I kind of thought to open this up, we might just talk about what, like, uh, ethical AI or responsible AI means.

Joshua Rubin: Maybe, I don't know if you want to talk a little bit about some of the harms that you think about, some of the risks that organizations deal with, just at kind of a general level.

Kathy Baxter: Yeah, absolutely. The, I think of generative AI, the broad availability of ChatGPT really brought onto the screen, uh, or, uh, into everybody's consciousness, um, the, the potential risks of AI.

Kathy Baxter: These are not risks that are unique, they're not brand new, they've existed in the world of predictive AI, but it's kind of amplified with generative AI. Uh, and so one of the concerns is accuracy, particularly in a B2B scenario. You know, Salesforce, like Fiddler, we're B2B. We have enterprises that demand that whatever it is that we create and offer, it's got to be accurate.

Kathy Baxter: Uh, nobody wants, uh, their, their web engine to hallucinate answers. Um, but when it's in a B2B scenario, it's particularly concerning. So you have to make sure that the AI is giving you accurate answers, that they're safe, uh, that they've been assessed for bias and toxicity. It can never be bias free or toxic free, but you've gotta do, uh, a lot of work to make it as safe as possible.

Kathy Baxter: And it needs to be reliable, robust to security, uh, and, uh, to security violations, um, and sustainable. Um, we are in massive, massive, uh, climate change, and although there's a lot of potential for AI to potentially help us solve some problems, um, we are, we are burning through a lot of carbon and a lot of water every time we train a model or generate, uh, new content.

Kathy Baxter: So, to be responsible, we have to make sure that our, the AI that we are building or implementing is accurate, safe, uh, transparent, empowering of humans, not automating everybody out of jobs, and, and sustainable.

Joshua Rubin: Yeah, that's great. I, you know, I, we, we try to maintain a sort of, uh, a list of, uh, incidents of one kind or another that you might hear about in the news, uh, that, uh, you know, of various organizations having problems with responsible AI, you know, from various things like lack of observability or whatever, just carelessness or, uh, you know, insufficient controls around their models.

Joshua Rubin: You know, I, I don't know that we have any examples that we, we, the environmental impact I think is a really interesting dimension that I think, uh, is becoming increasingly important, obviously, with data centers full of generative AI models. Um, but, you know, I think we think a lot about, you know, like, obviously there's, you know, things that companies are concerned about: reputational risk, you know, actual harm to individuals, um, you know, revenue left on the table because they didn't realize that, um, the world changed in such a way that caused a model to wander into some region where it was making inefficient predictions, where, you know, somewhere where it might be extrapolating, or somewhere where its training data was, was insufficient, or maybe there's just some concept drift in the world we weren't prepared for.

Joshua Rubin: There's just lots of different categories, and I think in some ways, you know, those are maybe the things that companies care about, but that doesn't even bring in the kind of ethics question of, you know, are these tools serving all of their human stakeholders in the most appropriate ways? Um, yeah, so, so I don't know, I think one, one thing that you're, uh, you know, one of your main focuses is, you know, how do we create this, like, you know, sort of, uh, conversation between companies and government agencies and civil society, you know, in such a way to get all the stakeholders to a table.

Joshua Rubin: Do you want to talk a little bit about some of your efforts with, um, you know, uh, trying to make policy, trying to come up with best practices around this and why that's important.

Kathy Baxter: Absolutely. Um, I'm, I'm incredibly proud of the, the work that I've been fortunate to do with NIST on the AI risk management framework.

Kathy Baxter: Uh, I can't stress enough the importance of public-private collaboration. Uh, there's, there's been a lot of criticism over time of the government not understanding this technology, but we're a long way from, you know, testimony where, where somebody asks, how does a social media company monetize, uh, monetize youth, uh, or monetize, um, their, their views. Um, and so, we have to think about how do we engage as partners.

Kathy Baxter: It's impossible for any government agency to be able to know how each company is building this technology because it's kept secret. Um, it is not something that a lot of companies are going to make publicly available. And so trying to keep up with recommendations, guidelines, much less regulations, as this technology is emerging so quickly, uh, it's, it's virtually impossible.

Kathy Baxter: So you need that public-private collaboration to be able to talk about how we can ensure that this technology really is beneficial for everyone, how we can create guidance. Because right now, we don't quite know what, uh, what it means to be safe. In the worlds of medicine or food, there are thresholds for how much of a certain heavy metal or chemical is allowed before it becomes toxic to a human. But we don't know how much bias or how much toxicity could be in a training dataset or a model before it becomes too problematic and more work needs to be done to fix that. So we really need to be able to come together.

Kathy Baxter: One of the things that I'm excited about is to see the, the U.S. AI Safety Institute, um, and the work that, uh, that they will be doing. They're going to be bringing together private industry, academia, nonprofits, as well as government, uh, to have different working groups to collaborate on these different questions.

Kathy Baxter: And so that's going to be really exciting to see.

Joshua Rubin: Yeah. Um, you know, I, I think, uh, you know, from a kind of the technology point of view, I, I, you know, one, one challenge about this stuff from my perspective is that, um, you know, when you are dealing with a kind of predictive modeling, discriminative modeling world, um, it's a little easier to measure things.

Joshua Rubin: You tend to have, uh, labels that are closely associated with, you know, some sort of ground truth information. Um, or you can, you can generate those labels. You can go ask humans to, um, assess the performance of the model in a way where it makes it a little easier to measure, um, you know, that the model is behaving in a way that it was intended to behave, you know, and it's, and it doesn't, uh, choose to bias in ways that aren't helpful to its objective.

Joshua Rubin: Um, I think one thing that we're seeing from the kind of technology perspective is this kind of convergence around, uh, you know, evaluations and suites of evaluation tools on generative models. So, you know, uh, collecting, you know, you know, these things like, uh, like TruthfulQA or collecting these, you know, you go on to like Hugging Face and there's these like leaderboards now of, you know, various, um, kinds of things.

Joshua Rubin: You know, uh, carefully curated eval datasets with objectives for generative models, um, that at least give you some sort of benchmark for comparing one model against another. I don't know if, you know, it's, it's, we know yet, kind of, from that, like that, like your F, FDA analogy, um, you know, where the threshold is of, of harm, right?

Joshua Rubin: Um, you know, and, and for that matter, like, you know, I mean, a lot of these models, I mean, a function of machine learning is, is in some abstract sense to bias, right? It's to, to have, it's to be opinionated. Um, I mean, not, not in the, in the, in the negative connotation, but, um, you know, clearly there's a point where it's behaving in a way that is not appropriate for, um, you know, by, by human standards, and it's clear that the models need to, uh, reflect that.

Joshua Rubin: So, so I think there are good efforts. The point was, I think there's good, good efforts in terms of building eval datasets and test benches, um, for, uh, at least starting to measure these things and being quantitative. Um, you know, I want to, to go back to what you were talking about. I mean, one of the things that I, you know, you hear about the EU or the, you know, the, um, uh, the White House sort of producing guidance on, um, you know, uh, how we're going to, uh, make sure that we've got a close handle on this rapidly evolving world.

Joshua Rubin: And, you know, I, I have to admit that sometimes I kind of roll my eyes, right? It's sort of, uh, it seems like part of what makes tech fun, what makes AI fun, is that you don't know who's going to invent something surprising and new and dramatically different tomorrow morning, right? Um, so, so I guess I'd kind of like to hear from you about, sort of, how you think about developing best practices that, you know, are sort of, um, sufficiently specific, you know, that, uh, that, that have enough teeth but aren't overly rigid, um, that they actually can apply for future use cases that we haven't thought of yet.

Kathy Baxter: Yeah, that's, that's a great question. One of the things that I really like about the, the NIST AI RMF, and, and I'll be the first to admit that I'm biased because, uh, you know, I helped, helped contribute to it, but one of the things that I really like about it is that it is about process. It doesn't draw lines and say anything on this side is bad, anything on this side is good.

Kathy Baxter: It's all about the process and how you should be building this soup to nuts, um, from concept all the way to launch and then post launch coming back again. And that really is critical because you can't, if you happen to have good, harmless outcomes from bad process, that's nothing but sheer luck. You can have good, amazing process, and you may still have harmful outcomes, but it's a lot less likely.

Kathy Baxter: And so having a, a process that everyone follows, and then being transparent about it is really important for trust. You need to know that, uh, how are you collecting data that trains your model? How are you evaluating that model? What are the safeguards that you have in place to look for things like bias and toxicity?

Kathy Baxter: How do you, how are you testing for data exfiltration? How are you testing for data leakage and then, and then blocking it? All of those are really tough questions. There is no solid right answer, no "everybody must do X," but at least by asking everybody to, uh, you know, do certain processes and then communicate what you're doing, communicate what you're finding, it will drastically increase trust.

Kathy Baxter: And then high tide, high tide raises all boats by, by being able to learn from each other. We all get better in our practice. We, we shouldn't treat responsible AI as, uh, as a market differentiator. We, all of us should want the products that we use, regardless of whether it's a company that we work at or not. If it's a product we use, we should always want it to be safe and responsible for everyone.

Joshua Rubin: Yeah, so, to kind of follow on there, you know, one thing, so, uh, when we were chatting the other day, we, uh, one topic that came up that I thought was interesting was about, you know, organizations and how they're structured to, you know, start to adapt to kind of, um, policies and best practices, some of these things that you've described, you know, in terms of having, following these robust procedures. And, you know, I was, you know, one thing that I've seen at times is that, um, you know, you'll have a part of an organization that's responsible for, uh, sort of, uh, governance compliance, right?

Joshua Rubin: And responsible AI becomes part of that and it becomes a sort of adversarial relationship with model developers and, uh, you know, product developers who, uh, sometimes feel like, you know, there's some other team that's foisting a bunch of extra steps, uh, that may be, uh, totally separate from, you know, their OKRs or their, you know, sort of immediate objectives.

Joshua Rubin: Um, do you have any feelings about how, you know, organizations can start to, you know, internalize some of these rules in a little bit more of an, um, integrated fashion?

Kathy Baxter: Yeah, it, uh, we really need to have the concept of an AI safety culture. I recently, uh, published a, a brief, um, blog post about this.

Kathy Baxter: Uh, in regulated industries, you're more likely to see a safety culture. Um, Patrick Hudson is an internationally known, um, uh, safety expert and he published this safety ladder or maturity model for organizations, um, which is very similar to the ethical AI maturity model that I had published a few years back, but he identified five aspects of a safety culture that I believe are also very relevant to an ethical AI culture, um, with leadership being the first and foremost, uh, the most important aspect.

Kathy Baxter: But there also needs to be a, a level of accountability. And so throughout an entire organization, you may have a dozen, hundreds of people that are each responsible for different elements of, uh, safety of a system.

Kathy Baxter: It could be building an airplane, it could be drug testing and manufacturing, or it could be building a large language model. And you end up with this, when it comes to AI in particular, you end up with this responsibility gap. When something goes wrong, who is to blame? There are probably many people that could have some part in that system.

Kathy Baxter: And so you end up with not being able to hold anyone accountable when something happens. In the airline industry, uh, Dr. Missy Cummings, she's a former fighter jet pilot, she's worked with, uh, the Department of Transportation, she's talked about the need for, uh, a chief AI test pilot, like they have in, um, the airline industry.

Kathy Baxter: All of these people are working together to make sure that, uh, a plane is safe and has, has, all of this work has been done, but at the end of the day, there's one person that takes that plane out for a flight and then signs to attest that this plane is safe, flightworthy. And so she advocates that particularly for AI systems with safety implications like self driving cars, that there should be a chief AI test pilot.

Kathy Baxter: So one person that, that does that final test and, and, um, signs off to attest that this model, this system, this AI app is, um, is trustworthy. It's, it's safe for use. So having that end to end safety culture and accountability is incredibly important.

Joshua Rubin: That's interesting. I mean, one, one thing, so we work a fair amount with financial services companies, uh, large banks, um, you know, and, you know, in those heavily regulated industries, there, there is already some guidance, right?

Joshua Rubin: You have things like, um, like SR 11-7 from, uh, the Fed, which, you know, I think in a lot of scenarios feels burdensome in those organizations, um, you know, but at the same time, uh, you know, to kind of go back a little bit to, uh, you know, providing kind of the right level of guidance, you know, kind of lays down a framework that's, uh, I'm going to say specific, but in a general sense. It asks questions like, you know, for this particular model application, first of all, like, you know, what are the, uh, you know, what are the, what are the risks involved in this model?

Joshua Rubin: Is this doing something totally kind of trivial, like, uh, categorizing emails, or is this doing something that, uh, with all sorts of complicated dynamics involving, uh, you know, that can, that can, you know, physically harm humans or, or at least, um, affect their financial futures in ways that are interwoven with, um, you know, uh, demographics in, in, in awkward ways.

Joshua Rubin: So, you know, it asks, you know, the model developers to put forth, um, you know, procedures for identifying, uh, you know, what are some changes in the world that could affect your model, right? Like what, how, how could the, how can the, the, the economic climate change, how might that affect, uh, the performance of your model?

Joshua Rubin: How are you going to measure that? Right? Like, so there's sort of like general categories of, you know, here are some things you should think about. Um, define a procedure by which you're going to track, you know, change in the world, change in model's performance, um, you know, uh, shifts in the way that it's cutting across demographics in ways that might, you know, uh, amplify, uh, pre existing bias in the, in the, in the record.
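One common way to put a number on "change in the world" is the population stability index (PSI), which compares how a feature or model score is distributed in a baseline window against a recent window. This is a swapped-in illustration, not something SR 11-7 or the speakers prescribe. The function names, bin edges, and the alert level mentioned afterwards are all assumptions chosen for the sketch.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Fraction of values falling in each bin defined by sorted edges."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        # Values outside the edges are ignored in this sketch.
        if edges[0] <= v <= edges[-1]:
            # The last bin is closed on the right so v == edges[-1] counts.
            counts[min(bisect_right(edges, v) - 1, n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    # A small floor keeps the logarithm defined for empty bins.
    return [max(c / total, 1e-6) for c in counts]


def psi(baseline, current, edges):
    """Population stability index between a baseline and a current sample."""
    b = bin_shares(baseline, edges)
    c = bin_shares(current, edges)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical model scores from two time windows.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
current_scores = [0.2, 0.2] + [0.6] * 8
drift = psi(baseline_scores, current_scores, edges=[0.0, 0.5, 1.0])
```

A team might agree in advance on a PSI level that triggers a review (values around 0.1 to 0.25 are often quoted as rules of thumb), which fits the idea of defining a procedure first and then showing it was followed.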

Joshua Rubin: Um, do you think differently about, um, AI tools that are designed to be used within organizations versus those that are sort of consumer facing?

Kathy Baxter: With, uh, in enterprise or within a, within a company, I think you can have a lot more transparency, um, a lot more understanding about what's actually happening. I think a lot of, uh, uh, companies are very concerned about showing too much to consumers, either because it's difficult to understand.

Kathy Baxter: There's the whole difference between explainability and interpretability. I can tell you what's happening under the hood for this AI model, but it means absolutely nothing, versus interpretability of being able to understand, um, you know, if you are of a certain gender or race, you're less likely to be recommended for this job.

Kathy Baxter: You don't have to understand what's happening under the hood, but understanding the relationships, the cause and effect, uh, kind of analysis is, is very important. Um, so how do you communicate what's working, what's happening, um, uh, the consequences? Uh, that can also have real legal implications as well. So companies, um, uh, are much more likely to have behind the scenes a lot more model quality assessment and monitoring that, that's just for their eyes only.

Kathy Baxter: If you're in a regulated industry, that probably is also going to be visible to auditors. So having, having these kinds of analyses available to provide to regulators at any point in time or to an auditor that comes in and being able to click a button.

Kathy Baxter: We've seen a lot with, um, uh, model cards, or, uh, uh, IBM calls them data sheets, uh, being able to provide a document that lists out the training data, what tests you've done, known biases, intended uses, unintended uses, and, uh, um, uh, those types of things. There's, you can find a lot of those on Hugging Face, um, uh, and, uh, those can be very helpful for those that are particularly, uh, people in procurement that are thinking about purchasing a product, uh, being able to ask for those model cards and understand what's, what's happening here?

Kathy Baxter: Is there, is there any known bias? What, what checks have you done? And if you don't get sufficient answers, then come back and, and ask for that before completing a purchase.
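The model card contents listed above (training data, tests performed, known biases, intended and unintended uses) can be sketched as a small data structure with a check a procurement reviewer might run. The class, field names, and example values are hypothetical and do not follow any particular vendor's or Hugging Face's schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelCard:
    """Hypothetical minimal model card; field names are illustrative only."""
    model_name: str
    training_data: list = field(default_factory=list)
    tests_performed: list = field(default_factory=list)
    known_biases: list = field(default_factory=list)
    intended_uses: list = field(default_factory=list)
    unintended_uses: list = field(default_factory=list)


def missing_sections(card):
    """Names of sections the vendor left empty, in declaration order."""
    return [f.name for f in fields(card) if not getattr(card, f.name)]


# Example: a card that documents data and intended uses but no bias checks.
card = ModelCard(
    model_name="example-classifier",
    training_data=["support tickets, 2019 to 2022"],
    intended_uses=["routing customer emails"],
)
gaps = missing_sections(card)
```

Here `gaps` lists the sections to go back and ask about before completing a purchase.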

Joshua Rubin: So, so, you know, one, one thing that kind of just sparked in my mind is, uh, you know, there's kind of a difference between what the laws say and sort of guidelines and best practices.

Joshua Rubin: Do you imagine a world where, you know, there is, uh, sort of, the government is responsible for an auditing function, or do you see that more as like a sort of, um, you know, uh, internal best practice, a sort of, uh, you know, some internal process that, you know, organizations have to follow or will want to follow for kind of, you know, maybe for, for lack of a better, um, descriptor, kind of legal defensibility. Like, I think in some of the, you know, I think, I think in a lot of ways, um, there's been this kind of historical lack of guidance. Uh, but companies know that if they do something egregious, they can get sued for it, right?

Joshua Rubin: Like, it's not just reputational harm, which is huge, but there's also legal repercussions if you can't demonstrate that you had some set of best practices, uh, defined and followed those in a, in a historical sense. In conversations I've had with lawyers, that's been kind of the, the, the gist of, of, of their recommendation: the laws are fuzzy, uh, the best thing you can do is define a plan and be able to show that you followed it over time.

Joshua Rubin: So I guess I'm kind of curious, like, where you come down in terms of, like, the balance between a, you know, centralized government org versus this kind of, um, I don't know, uh, internal, uh, uh, due diligence process that's just considered to be a best practice.

Kathy Baxter: Well there are a number of departments in the government that have made it clear that new laws are not needed to ensure that AI is being used fairly.

Kathy Baxter: So for example, the EEOC, Equal Employment Opportunity Commission, they've, they've said that whether it's a human or an AI, if you make a biased hiring decision against a protected class, we're coming after you. Um, so I, I think there are a number of departments where there are areas that are already regulated and, and, um, uh, the FDA is one of them.

Kathy Baxter: There are, uh, a number of AIs that are considered medical devices. It's not a device in a hardware sense, it's, it's still an, an algorithm, but it's regulated in the same way. So I think we, we already see some sector-specific, um, uh, enforcement of AI where, where we haven't necessarily needed new laws, we could just apply what already exists.

Kathy Baxter: Um, I do expect that additional ones are going to be. And so that sector specific approach is going to be really important because you've got those experts in transportation, food, medicine, um, uh, employment. They really know, um, these areas, these domains, and so it would be much easier for them to be able to develop regulations about what, what is fair.

Kathy Baxter: I think in other, broader circumstances, it can be more difficult. Uh, doesn't mean it's impossible. Um, but being able to understand, again, um, what is necessary from a transparency standpoint? How can we know if what you are doing is safe and fair? We talk a lot internally about detecting signals of veracity with generative AI.

Kathy Baxter: In some cases, when content is being created, it's pretty clear as to whether or not something is a right or wrong answer. So in customer service, when somebody asks, my router isn't working, how do I fix it? There is a very specific set of steps that you may need to take. And so the AI can generate that correct summarization or not.

Kathy Baxter: And so there are ways that you might be able to cite sources. Where's the knowledge article that you pulled this, this information from, if you're doing RAG, if you're grounding the model in a data set? In other cases, it's incredibly subjective. In our Marketing Cloud product, subject line generation, there could be dozens of different ways that an AI could write a subject line for a marketing campaign that could all be right, technically, um, but some might be better than others.

Kathy Baxter: Some might be closer to your company's voice and tone. Some might be better at generating a sense of FOMO, um, uh, than others. And then it's just really, uh, a subjective assessment, uh, and so how do you give signals then to that end user to help them choose which one of a dozen different subject lines that have been generated is the best one for them to, to try?

Kathy Baxter: Or maybe they want to do A/B testing. What are the top three that I want to, that I want to do a test on? Uh, and that really takes, again, domain expertise, um, that, in order to come up with rules of thumb, guidelines as to this is, this is good, this is safe, this is not.

Joshua Rubin: Um, yeah, that's really interesting. That, uh, that, uh, what did you call it? Signals of veracity, I think, is a really, a really great phrase. Um, I hadn't heard that before. Um, you know, coming at this from, like, the, uh, you know, sort of tooling and instrumentation perspective, one thing that we talk a lot about on the, from the Fiddler side is, um, how do you, like, uh, is, is, uh, it's trying to gather signals of one kind or another.

Joshua Rubin: We sometimes call them, like, feedback controls. Um, you know, and this kind of gets back to this question of, like, how generative AI is a little bit different than some of the, you know, um, predictive or discriminative models we've dealt with in the past. You know, and the lack, the lack of the obvious label, right?

Joshua Rubin: Like, to, to your point, um, there may be a thousand different right answers that are right in different shades of right, um, and, uh, you know, being able to quantify that helps you control, um, the behavior of your model. And you'd certainly want to know if, uh, you know, there are some topics for which your generative model comes up with really great answers and other topics for which it comes up with mediocre answers, right?

Joshua Rubin: Because that speaks to model performance. Um, it speaks to, you know, something that could be improved, ways in which the model might be underserving, uh, you know, certain users, certain stakeholders. Um, you know, and of course, you know, just to bring it back to kind of the fairness and bias thing. It's hard to know the ways in which, um, you know, the, there will be interplays between things like, you know, uh, uh, you know, controlled attributes, things like, uh, you know, race, gender, demographic bias, or demographics, and, you know, things like topics, right?

Joshua Rubin: Uh, we sometimes, I'm going to wander off into a little tangent, uh, but, you know, one thing that we were talking about a couple of months ago that I think would make a really fun sort of, you know, research topic is just exploring how, um, different uses of dialect, you know, different, different dialects of subpopulations, you know, even within American English, uh, could lead to different model outcomes.

Joshua Rubin: Like I said, you know, if I had all the time in the world, I would love to spend some time researching that because we know, I mean, you can see from all this work that's coming out now on sort of, um, you know, adversarial attacks on generative AI, how sensitive the model can be to very subtle differences in wording or terminology.

Joshua Rubin: Um, so to zoom out again, you know, having tooling that is, uh, capable of, uh, providing feedback in whatever way is appropriate for the application, you know, whether that's some sort of a human labeled thing, whether that's a thumbs up or a thumbs down on a chat bot that a human can click. Um, that may be other models scoring, you know, the model in question for, uh, you know, uh, how effective it is at meeting its task.

Joshua Rubin: Um, having a framework that kind of brings all of those things in together and tools that allow you to, to gather those things together, um, and aggregate them in a way that lets you actually kind of root cause and isolate problems is, I think, for us, seems like a really key, um, uh, set of design objectives for tooling.
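As a rough illustration of gathering feedback signals and aggregating them so that weak spots can be isolated, the sketch below groups thumbs-up and thumbs-down events by topic and flags topics with low approval. The event format, topic labels, and the 0.5 threshold are assumptions made for the example; this is not a description of Fiddler's tooling.

```python
from collections import defaultdict


def approval_by_topic(events):
    """Map each topic to its share of thumbs-up feedback.

    `events` is an iterable of (topic, thumbs_up) pairs, where thumbs_up
    is True for a thumbs up and False for a thumbs down.
    """
    totals = defaultdict(lambda: [0, 0])  # topic -> [thumbs_up_count, total]
    for topic, thumbs_up in events:
        totals[topic][1] += 1
        if thumbs_up:
            totals[topic][0] += 1
    return {topic: up / n for topic, (up, n) in totals.items()}


def weak_topics(events, threshold=0.5):
    """Topics whose approval rate falls below the threshold, sorted by name."""
    rates = approval_by_topic(events)
    return sorted(t for t, rate in rates.items() if rate < threshold)


# Hypothetical chatbot feedback log.
feedback = [
    ("billing", True),
    ("billing", True),
    ("router", False),
    ("router", True),
    ("router", False),
]
flagged = weak_topics(feedback)
```

The same grouping could be keyed on language, dialect, or user segment instead of topic to look for groups of users the model may be underserving.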

Kathy Baxter: Yeah, I mean, you've touched on a bunch of issues there. I was really fortunate to work for years with Greg Bennett, who is a linguist, and we've talked about these issues many, many times. Very early on, it was making sure that chatbots could understand people that spoke non-standard grammars. Um, I'm originally from the South.

Kathy Baxter: When I first moved to California, I had a really thick Southern accent. Um, my, my grammar was not the same as other people in California. Um, and, uh, if I were engaging with a chatbot, uh, I would have to code switch. Um, uh, African American Vernacular English, AAVE, um, there's been a lot of reports and articles that show that people have to change how they speak in order for their, uh, smart home assistant to be able to understand them.

Kathy Baxter: So this is a, this is a huge issue. We talk about this a lot in terms of getting training datasets in different languages. So at Salesforce, we have a trust layer that, when our, uh, end user, could be a customer support agent, sales rep, marketing campaign creator, they submit a prompt and then it goes through a number of checks and detectors.

Kathy Baxter: Checks for PII, masks it, strips it out, does a toxicity check, um, goes into the, the LLM, uh, comes back out and does a number of other checks. For toxicity, toxicity is incredibly contextual. It is, it varies by language, by region, by dialect. Humans are immensely horrible at coming up with new racial slurs and insulting and, and, so, when you say that your model works in Spanish, it's not good enough to just get a training dataset of toxic words and phrases of people that speak Spanish in the U.S.

Kathy Baxter: You need to get a training dataset from Spain and Argentina and Mexico, and it's incredibly expensive and time consuming, but if you don't do that, then you could have a chatbot that is completely generative AI, so it's not just giving, it's not just rule based, it's not menu based, it's, it's, you know, natural language processing and generation, and it could generate very insulting things because you didn't have a data set that understood what was offensive or harmful for that language in that region.
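The flow described here (mask PII, check toxicity, call the LLM, check the output again) can be sketched in a few lines. This is a toy illustration under stated assumptions, not Salesforce's trust layer: the regular expressions, the per-locale blocklists with placeholder words, and the `llm` callable are all hypothetical, and a production system would use trained detectors rather than word lists.

```python
import re

# Simplistic PII patterns, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

# Placeholder blocklists keyed by locale; real lists differ by region.
BLOCKLISTS = {
    "en-US": {"insult_us"},
    "es-MX": {"insulto_mx"},
    "es-AR": {"insulto_ar"},
}


def mask_pii(text):
    """Replace e-mail addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def is_toxic(text, locale):
    """True if any word in the text is on the blocklist for this locale."""
    words = set(re.findall(r"\w+", text.lower()))
    return bool(words & BLOCKLISTS.get(locale, set()))


def guarded_generate(prompt, locale, llm):
    """Run a prompt through pre-checks, the model, then post-checks."""
    masked = mask_pii(prompt)
    if is_toxic(masked, locale):
        return "[blocked: prompt failed toxicity check]"
    response = llm(masked)
    if is_toxic(response, locale):
        return "[blocked: response failed toxicity check]"
    return response


def echo_llm(prompt):
    """Stand-in for a real model call."""
    return "You said: " + prompt


reply = guarded_generate(
    "Email me at a.b@example.com or 555-123-4567", "en-US", echo_llm
)
```

Because the blocklist is looked up by locale, a word that is offensive in one region passes unchecked under another region's list, which is the gap the Spain, Argentina, and Mexico example points to.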

Kathy Baxter: So we, we have to think a lot beyond our little box of where I live, my lived experience, my language, my values, and have those, those datasets have people from all of those backgrounds participating in this. And it's like, bolted on to the end, we're going to do adversarial testing. So let's find a really diverse group of people to come in and try to break this thing.

Kathy Baxter: Usually that's red teaming and security, but from an ethical standpoint, you can try to have it do really, make it say and do offensive, terrible things. Ideally, you want to have a really diverse team from the very beginning, thinking about what are all of the terrible ways this product could be used, um, and cause harm.

Kathy Baxter: So you're thinking from the beginning, from a set of values, from many different people's point of view, and how do we make sure that your AI is working to support those values, and you are putting all the protections in place to block the harmful ways that it might be used. If you, if you don't do that, if you're only doing adversarial testing at the very end, thank god at least you're doing that, but it ends up just being Band Aids, where you find that your AI is doing all these really terrible things, and you're trying to put Band Aids on each one of those terrible things, as opposed to starting from the beginning and having a truly inclusive, safe product from, from, uh, conception.

Joshua Rubin: Yeah. Yeah. I think it's a, it's a, a big undertaking that you're describing. It's really important and it, it, I, from a technology perspective, I'm actually optimistic in a way, um, because, you know, for years and years we had these sort of, you know, if you think about like, um, detecting toxicity, right? You know, it was not unusual for an LGBTQ friend on Facebook to have, you know, screen grabbed something where, you know, a legitimate, you know, not harmful conversation they were having with a friend or a peer, you know, had been flagged by some, uh, you know, um, uh, you know, over-rigid, simplistic AI, uh, for being toxic because of what that model had been trained on. And the model was just not capable of understanding those nuances.

Joshua Rubin: Yeah. So, I mean, it's not, it's not to say that, uh, you know, there's some problem that's magically fixed, but, you know, I do hold some hope that, you know, with these really exciting large models, you know, with the right human oversight, the, uh, the pliability is there for the models to finally be able to make more nuanced and helpful judgments.

Joshua Rubin: Um, but of course, as you, as you describe, you know, it, it really does take the whole village of, uh, people coming at things from different perspectives. Um, so, yeah. Yeah, yeah. There's a project there, but I think there's some, some, uh, some light that's visible.

Kathy Baxter: Yeah, I, there have been, um, uh, a lot of women, especially women of color, that have been at the forefront of trying to raise these alarms.

Kathy Baxter: Um, so of, of course, uh, Joy, uh, with, uh, her Coded Bias on Netflix. Every time somebody is like, oh my God, have you seen The Social Dilemma? And I'm like, oh my God, have you seen Coded Bias? Like, what? Please watch that one. Um, and, uh, Dr. Hanani, um, she a few years ago, uh, had, had published, um, a, a warning, had published a paper about image models using CSAM, um, uh, or, or child sexual abuse, um, uh, types of content to train image models.

Kathy Baxter: Uh, Stanford recently, um, published a, a report about that, but she was raising these issues years ago. Uh, and so, um, you, you have all of these women that have been at the forefront raising these alarms. And of course, uh, um, the Stochastic Parrots paper, Emily Bender, Meg Mitchell, Timnit Gebru, um, uh, long before Geoffrey Hinton had said, this is bad.

Kathy Baxter: There's, there's risks here. They had published a paper saying here's all the things we need to be thinking about. Here's all the potential harms. And so we have to make sure that we are uplifting those voices, that we are paying attention to it. This is, I, I don't believe that we are moving towards a place where AGI is coming to, to kill us.

Kathy Baxter: We know how to do AI safely. We know what needs to be done. If we can pull on our big kid pants and everybody make sure that they are building AI responsibly. We have that transparency. We have that discipline. Both companies that are procuring this technology as well as end consumers. When you decide what product you're going to use, who are you going to give your data to?

Kathy Baxter: Who are you going to give your attention, your money to? If we all make those really hard choices and say, I'm not going to use this product because I know that they're not doing things that are very nice, or I don't know what they're doing, I don't know where they got their training data, I don't know how they're honestly using my data, the market really can have an impact.

Kathy Baxter: It is not enough. It is not sufficient. We have seen that. It's not sufficient, but it can have an impact, and so it takes everybody paying attention, um, when these issues are raised, and then taking action on it, not just being horrified by the Netflix, um, uh, uh, documentary you just saw, um, but actually taking action on it.

Joshua Rubin: Yeah, I mean, to go back to your FDA, uh, analogy, I mean, I, I think, uh, you know, when we have nutrition labels on these things, it'll, it'll certainly help for consumers to understand, uh, you know, what they're getting and, uh, you know, what, what's in the sausage.

Kathy Baxter: I know whenever I see on a menu the calorie count, I look at it.

Joshua Rubin: Yeah

Kathy Baxter: And it has changed my choices where I think, oh, I'm going to have the salad. It's going to be nice and healthy, and I see it's like 1,200 calories. I'm like, god, you gotta be kidding me. And so, um, providing people, empowering people with the information. People can make better choices when you give them the information, and sometimes you got to force companies or organizations to give that information.

Joshua Rubin: So, I, I, we should probably cut over to the Q&A. I feel like I'm dominating, but I do want to, I, I, I think one, one question that is super interesting to me is, uh, is about, um, literacy, like, of, of the general public. Because I, I think to, to the, the thing about the nutrition label, I, I think most consumers of AI would be surprised how many nutrition labels would be on every part of the applications they use.

Joshua Rubin: I mean, down to, you know, I think most people would be surprised to know that, you know, there were two different AI models selecting products for them to see on an e-commerce site. One that makes a coarse selection, and then one that orders the items on the screen to, you know, like, optimally, uh, uh, you know, sort of, uh, pique their interest on something that they're most likely to click on.

Joshua Rubin: I wonder what you think about, um, you know, general AI literacy as kind of, you know, going together with that. I mean, otherwise, I think people will just suddenly be shocked when there's, you know, 10 model cards in the first page of their Amazon order or something like that, right?

Kathy Baxter: Yeah, again, going back to explainability versus interpretability, um, you need to make sure that you're communicating the right level of information at the right time to the right person.

Kathy Baxter: And so, um, model cards are not going to make sense for, um, uh, every consumer before they start using a product, um, but understanding why did this marketing company, um, make these recommendations to me. So clicking on that little i. We have, um, a couple of years ago, we published, uh, recommendations for responsible AI marketing.

Kathy Baxter: And we recommend to our customers that they prioritize. And when they're labeling their data, they differentiate between zero party, first party, and third party data. So, for those that might not be familiar, just very quickly, zero party data, that's the data that I give you. I tell, um, my, um, favorite coffee spot my birthday, because I want to get that free cup of coffee on my, on my birthday.

Kathy Baxter: So I'm giving you that information. You can trust that. Or I fill out a form and I tell you, maybe it's a makeup site, I tell you my skin tone, my skin, um, issues, my preferences. First party data is the behavioral data. So what I search for, what I click on, what I bought. And then third party, that's usually inferences that you make off of what you are predicting about somebody, or maybe you purchase it from a big data broker.

Kathy Baxter: Much less reliable, um, uh, not quite as trustworthy, um, and the user likely did not consent for you to, to make those guesses or have that data sold about you. And so when, um, uh, you, you see that we are making these recommendations for you based on what you told us about your preferences, based on past purchases, and then being able to edit it.

Kathy Baxter: Maybe I purchased this thing because it was for my, um, my coworker who just had a baby. I'm not going to be buying any more of those things. Please don't keep giving me recommendations for baby, baby items. Let me delete that, um, uh, from the, from the algorithm so you don't keep recommending those things.
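The zero party, first party, and third party distinction, along with the idea of letting a user delete a signal, can be sketched as a small data structure. This is only an illustration under assumed names (`Provenance`, `Signal`, `PreferenceStore` and their methods are hypothetical, not from any Salesforce product): each signal carries a provenance label, more trusted signals are ranked first, and the user can remove an interest they do not want used.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Provenance(IntEnum):
    # Lower value means more trusted, following the ordering described above.
    ZERO_PARTY = 0   # the user told us directly
    FIRST_PARTY = 1  # observed behavior: searches, clicks, purchases
    THIRD_PARTY = 2  # inferred or bought from a data broker


@dataclass
class Signal:
    user_id: str
    interest: str
    provenance: Provenance
    source: str  # short human-readable origin, shown to the user


class PreferenceStore:
    def __init__(self) -> None:
        self._signals: List[Signal] = []

    def add(self, signal: Signal) -> None:
        self._signals.append(signal)

    def ranked(self, user_id: str) -> List[Signal]:
        """Return a user's signals, most trusted provenance first."""
        mine = [s for s in self._signals if s.user_id == user_id]
        return sorted(mine, key=lambda s: s.provenance)

    def explain(self, user_id: str) -> List[str]:
        """Plain-language reasons a user could see behind the little 'i'."""
        return [
            f"{s.interest}: {s.provenance.name.replace('_', ' ').lower()} ({s.source})"
            for s in self.ranked(user_id)
        ]

    def remove_interest(self, user_id: str, interest: str) -> int:
        """Let the user delete an interest; returns how many signals were removed."""
        before = len(self._signals)
        self._signals = [
            s for s in self._signals
            if not (s.user_id == user_id and s.interest == interest)
        ]
        return before - len(self._signals)


if __name__ == "__main__":
    store = PreferenceStore()
    store.add(Signal("u1", "baby items", Provenance.FIRST_PARTY, "past purchase"))
    store.add(Signal("u1", "coffee", Provenance.ZERO_PARTY, "birthday form"))
    store.remove_interest("u1", "baby items")  # the one-off gift purchase
    print(store.explain("u1"))
```

Keeping the provenance on every record is what makes both the explanation and the edit possible later; if the labels are dropped at ingestion time, neither can be reconstructed.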

Kathy Baxter: So, being able to communicate, again, the right level of information at the right time and empowering users, you can create a much more accurate AI system that is going to get more engagement from your consumers. By empowering them, they can trust you, they get a better experience, and you get better ROI.

Kathy Baxter: It just makes sense all the way around.

Joshua Rubin: So I think we just answered one of the popular questions, which is how should companies include human in the loop practices to make sure their AI stays compliant and helpful to their end users? Um, I don't know if there's anything else you want to add there, or we could jump to a different one.

Kathy Baxter: Yes, so, uh, in a moment, I'm gonna start in the Q&A populating a whole bunch of links for folks, and one of them that I'm gonna put in there is a link to, um, uh, one of our user researchers, uh, Holly Prouty, um, published a piece in December on human at the helm and how critical that empowerment is. A lot of regulations have been proposed to have a human in the loop, a human that makes the final recommendation.

Kathy Baxter: GDPR actually requires that a human make, um, uh, or that an AI cannot automate any decision with legal or similarly significant effects, so you can't use an AI to automatically decide who to hire, who to fire, who to, um, uh, give, um, uh, some other benefit to. A human has to make that final decision, but it doesn't mean anything if you don't empower the human to know, is this an accurate decision? A fair decision?

Kathy Baxter: Um, otherwise the human is just a rubber stamp. And you are, you may be complying with the, the, uh, letter of the law, but not with the spirit of the, of the law. So I will put a, um, a, uh, a link to that in the, in the Q&A in just a moment.
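One way to picture the difference between an empowered reviewer and a rubber stamp is a decision record that cannot be finalized unless the reviewer was given the model's reasons and wrote down their own rationale. The sketch below is a hypothetical illustration only: `Recommendation`, `Decision`, and `finalize` are invented names, and nothing here is a statement of what GDPR or any other regulation technically requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    subject: str
    suggested_outcome: str
    confidence: float
    reasons: List[str] = field(default_factory=list)  # explanation shown to the reviewer


@dataclass
class Decision:
    subject: str
    outcome: str
    reviewer: str
    rationale: str
    overrode_model: bool


def finalize(rec: Recommendation, reviewer: str, outcome: str, rationale: str) -> Decision:
    """Record a human decision; refuse if the human had nothing to evaluate."""
    if not rec.reasons:
        # Without an explanation the reviewer can only rubber-stamp.
        raise ValueError("no explanation available for the reviewer")
    if not rationale.strip():
        raise ValueError("reviewer must record their own rationale")
    return Decision(
        subject=rec.subject,
        outcome=outcome,
        reviewer=reviewer,
        rationale=rationale,
        overrode_model=(outcome != rec.suggested_outcome),
    )


if __name__ == "__main__":
    rec = Recommendation(
        subject="application-123",
        suggested_outcome="reject",
        confidence=0.62,
        reasons=["employment gap longer than two years"],
    )
    decision = finalize(rec, "reviewer-7", "advance", "gap explained by caregiving leave")
    print(decision.overrode_model)
```

Tracking `overrode_model` over time also gives a rough signal of whether reviewers are actually exercising judgment or agreeing with the model every time.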

Joshua Rubin: I, um, I don't wanna say about human in the loop stuff.

Joshua Rubin: I sometimes, so one of my favorite things to work on at Fiddler is when we have a customer who is thinking not just about the AI model but about the application, um, and that gives you the opportunity to include things like explainability or other kinds of guidelines like you, or guidance like you've been describing that can help the human interpret the model's prediction, right?

Joshua Rubin: I sometimes think that we sort of tend to, when we think about the model focused sort of AI problem as organizations, we kind of miss that maybe the right unit of thinking about these things is at the application level where, you know, you can have all this adjacent instrumentation and, uh, supplementation to the model that can help, um, give the human a little bit more.

Joshua Rubin: Like, I think, you know, if model development teams owned more of the application rather than the specific thing, it would maybe empower them to apply more of these, uh, principles of, uh, you know, giving more diagnostic information in a stakeholder appropriate way. Um, let me, okay, so here's another popular question, which is, uh, Okay, so for companies that are starting out with generative AI, what ethical frameworks should they consider following? Um, and are they different based on vertical and company size, like startup versus enterprise?

Kathy Baxter: That is a fantastic question. And, um, I'm putting, um, sorry, while I put in links, so just put in a couple of links there. Um, so I put in a link to our, um, guidelines for responsible generative AI and those, those apply to, uh, any company of any size or any organization of any size.

Kathy Baxter: Um, I also put in a link to, um, our, uh, ethical AI maturity model that I had mentioned, uh, earlier, um, just want to make sure I did put that in there. Uh, and so, the size of the company determines how much you can do at each of those stages. Um, you may end up having only one person in your organization that, that is responsible for providing guidance and expertise on how to do a, uh, risk assessment.

Kathy Baxter: Or you may be able to have an entire centralized team, or you may be large enough that not only do you have a centralized team, but you also have experts that are embedded in different parts of your organization too. So, um, uh, how, how you distribute, how much expertise that you have internally, is based on how far you are along that maturity, uh, stage, but then also the size of the company.

Kathy Baxter: Right now, there aren't a lot of people with, um, responsible AI expertise and experience. I'm seeing more and more people that are graduating, um, that their programs do have, uh, an emphasis on that. And so they're coming straight out of school and they have a lot of enthusiasm, but they might not have, uh, a lot of experience.

Kathy Baxter: And one of the things that I have found is in this role, it really takes a lot of, um, uh, skill to be able to call people in, not call people out. This is one of the discussions that we talk about at Salesforce a lot. It's really easy to call somebody out and tell them they're using non-inclusive language, that the idea that they have is unethical because it's going to harm this other group. They just don't know anything about this other group or their lived experience, and so they would have no idea that what they just proposed would be harmful for them.

Kathy Baxter: If you are viewed as the ethics police, nobody's talking to you. People look for ways to work around you. And so, you have to be a true partner. That you are committed to the success of the teams that you are working with. And they know that you are there to help them to create an even better product than they could on their own.

Kathy Baxter: And that really does take, uh, um, some amount of experience in having those kinds of conversations so that it doesn't feel alienating. It feels, um, inclusive. You're, you're drawing the person in to, um, up level and, and create something that's even better. So, what I recommend in those cases, if you do have trouble hiring experts from outside, you can, you can train people internally.

Kathy Baxter: My background is as a user experience researcher. I worked for two decades as a, as a UX researcher, uh, co-authored a couple of books. Um, I have found, and again, showing my bias, I have found that people with a user experience research background, um, they may also have an ethics, like a research ethics background.

Kathy Baxter: They have experience fighting for the user or fighting for the customer and really understanding, um, that customer's context and point of view, and so those can be some amazing individuals that, if you can give them, or you can provide for them, um, additional training in tech ethics or AI ethics, they can be really powerful in this role. And there are some training programs that are out there now, so if you're not able to hire externally, then you can, um, promote from within and have individuals within the company take on this role.

Joshua Rubin: It does feel like, um, kind of some of the best, sort of, um, ethical AI efforts are sort of, end up just fundamentally being cross functional. Yes. It takes the technical knowledge, and it takes the understanding of the end user, the product design experience, um, and oftentimes that comes from two or three or four different places in an organization, um, which I think, I think is interesting.

Joshua Rubin: It does really often circle back to how do you get more stakeholders' voices, uh, heard as part of the, um, the process of constructing an application.

Kathy Baxter: Yeah, I mean, I, that kind of touches on the, the last question, I believe, uh, I'm, I apologize if I'm going to mispronounce your name, Emad, um, uh, so, one of the, the main things that I do in my role is I am that bridge.

Kathy Baxter: I connect all the different parts. I work with product and engineering, uh, research scientists, user experience researchers and designers, legal, privacy, government affairs, all of us together. And being that glue to make sure that everybody who is responsible for their part of governance is working together on the same page.

Kathy Baxter: I feel like that's one of the biggest values that I bring when I work with teams. And so we have an, um, Ethical Use Advisory Council and we have representatives from every one of those roles in the company that are part of the council. We have both executives as well as, um, frontline employees, uh, and we have external experts that we bring in, um, uh, as part of our oversight as well.

Kathy Baxter: And so I, I recommend that every company has an internal governance, um, uh, council that is representative of a broad range of roles as well as demographics, lived experiences, expertise, both internally and externally.

Joshua Rubin: Very nice. Um, I think we're pretty much at time. Uh, so maybe we wrap here. Um, so thanks a ton, Kathy.

Joshua Rubin: This is a really interesting conversation. Hope everybody out there has a, has a great day.

Joshua Rubin: Thanks.

Kathy Baxter: Awesome. Thanks, everybody. Bye bye.
