On this episode, we’re joined by Parul Pandey, Principal Data Scientist at H2O.ai and co-author of Machine Learning for High-Risk Applications.
Although AI is being widely adopted, it poses several adversarial risks that can be harmful to organizations and users. Listen to this episode to learn how data scientists and ML practitioners can improve AI outcomes with proper model risk management techniques.
Krishnaram Kenthapadi: Welcome, everyone. Thank you for joining us for today's AI Explained on Machine Learning for High-Risk Applications. My name is Krishnaram Kenthapadi. I'm the Chief AI Officer and Chief Scientist at Fiddler, and I'll be your host today. Please feel free to post your questions in the Q&A at any time during the fireside chat.
Krishnaram Kenthapadi: We will alternate between some questions that I have for our special guest and also the questions from the attendees. Also by the way, the session today will be recorded and shared with all the attendees after the session. We have a very special guest on today's fireside chat, and it's Parul Pandey, a Principal Data Scientist at H2O.ai, and also the co-author of the book Machine Learning for High-Risk Applications.
Krishnaram Kenthapadi: Welcome Parul.
Parul Pandey: Thank you, Krishnaram. So glad to be here, and very excited to share my experience with others on the podcast.
Krishnaram Kenthapadi: Let's start with the book itself. I read a few chapters of the book and found it very insightful.
Krishnaram Kenthapadi: For the benefit of our attendees, could you perhaps share the key takeaways from the book?
Parul Pandey: Yeah, sure. The book is called Machine Learning for High-Risk Applications, which is also the title of today's fireside chat. And it is literally that: it gives you guidelines for doing machine learning for high-risk applications.
Parul Pandey: We've divided the book into two halves. The first half covers the technical aspects of what you can do when you're creating AI or machine learning products, especially in high-risk domains: healthcare, finance, the banking sector. The second half of the book has implementations in Python.
Parul Pandey: So for those who want to actually go through the code and replicate things, we have something for them too. The main crux of the book, for the three of us who wrote it (me, Patrick Hall, and James Curtis), is to tell people that machine learning and AI products, like any other technology, can fail.
Parul Pandey: And it's absolutely possible they can fail. Most of the time when we build AI products, we're so positive: this is going to change education, this is going to change healthcare, this is going to change everything. We don't look at the negative side of it, at the humans on the other end who might be denied a loan or denied healthcare opportunities.
Parul Pandey: So what we're saying is that ML can fail, whether through intentional abuse or unintentional behavior, and we try to give ways to manage that. Of course, these aren't golden rules; they come from our own experiences, certain guidelines people can follow to ensure that whatever they create doesn't harm anyone.
Parul Pandey: So that's the short version.
Krishnaram Kenthapadi: Yes. I think this is very relevant in today's context, right? Maybe if I could add to that, or rephrase it: you are emphasizing that, as technologists, we should not just design thinking that everything will go as intended, right? We should also loop in all the relevant stakeholders and understand the potential negative impacts, or the unintended ways in which the technology might get used.
Krishnaram Kenthapadi: This is particularly pertinent for machine learning and AI applications. So in the first chapter of the book, you discuss the risk management framework for AI proposed by NIST. By the way, for anyone who's not familiar with NIST, NIST is the National Institute of, I believe, Science and Technology.
Parul Pandey: That's Standards and Technology.
Krishnaram Kenthapadi: Sorry, National Institute of Standards and Technology in the US. And NIST comes up with lots of standardization frameworks. And one of the recent such frameworks is the AI Risk Management Framework. Could you perhaps give us an overview of this framework and how it can be applied by data scientists and machine learning practitioners?
Parul Pandey: Yes. We've mentioned NIST a lot of times in the book, so I think this is a good platform to talk about it. Like you correctly said, NIST is an acronym for the National Institute of Standards and Technology, and it's a body that falls under the US Department of Commerce. What they did was collaborate with the private and public sectors, and with experts in those sectors, and together they came out with a document, a framework, to better manage the risks to individuals, organizations, and so on that come associated with AI and machine learning. Before this, interestingly, machine learning didn't really have strong guidelines or a framework.
Parul Pandey: There's a lot of talk right now about regulations, but everything is at a very nascent stage as of now. We do have some small things, but I think the NIST AI RMF, the AI Risk Management Framework, was the first such framework that actually put forth broad ideas on how to make your product more trustworthy.
Parul Pandey: But there's an important thing that I'd like to say here, and that we also put in the book: the risk management framework is a voluntary tool. It's not a regulation, and NIST is not a regulator. It's a voluntary tool which you as a company or as an individual can adopt.
Parul Pandey: But it's very nicely written, and they basically have four broad functions under which they classify risk management. The four categories are: Govern, how do you cultivate a culture of risk management in your organization; Map, how do you map the risks; Measure, how do you measure them; and Manage, how do you manage and mitigate those risks.
Parul Pandey: Now, to follow our own advice, in every chapter of the first half of the book you'll find a callout table where we map every section of that chapter to the corresponding NIST AI RMF section. This is helpful in two ways: people who've read the NIST AI RMF can get practical implementation via our book, and people who haven't read the NIST framework can read our book and get an idea of its important aspects.
Parul Pandey: So we've really worked very hard at aligning every chapter and every subheading with the NIST framework. Essentially, like I said, it has those four broad pillars of Govern, Map, Measure, and Manage, and everything they try to put under those headings.
Krishnaram Kenthapadi: Yeah. Yes. In fact, that's one aspect I really liked about the book, which is this mapping.
Krishnaram Kenthapadi: It comes across not just as abstract guidelines but as very practical, hands-on guidance for data scientists, machine learning engineers, and other practitioners. Along those lines, could you shed some light on some of the organizational best practices for machine learning risk management?
Parul Pandey: So I think whenever we talk about an organization inculcating practices so that it creates good, trustworthy products, the first and most important thing is that it has to be a holistic approach; it has to flow from top to bottom.
Parul Pandey: Responsible AI is not something you bolt on at the end, where after shipping a product you just run a few post-hoc checks and say, look, we tried SHAP, we tried LIME for explaining it. It has to be a whole process. And one of the greatest examples we can borrow from is the banking industry.
Parul Pandey: The banking industry is a highly regulated industry; they've been regulated for a long time, and it works pretty well. They have a concept called the Model Risk Management framework, MRM, which itself derives from the Federal Reserve's SR 11-7 guidance.
Parul Pandey: And this came into being after the 2008 financial crisis, to make sure that what happened in 2008 doesn't repeat. So there are a lot of good things we can borrow from there. We've written about this a lot, but I'll just cherry-pick a few.
Parul Pandey: So one of them, I think, is forecasting failure modes. Now, this sounds a little funny, because you'll say you cannot forecast failures beforehand when you're creating a product. How can you do that? But this is where, if we are able to do that, we're going to save a company a lot of money and a lot of grief.
Parul Pandey: So one way to do that is essentially to have an incident database. There is already an AI Incident Database: for those who want to look it up, search for "AI Incident Database". It's a database of all the things that have gone wrong with AI, and it's kept current; I think the latest examples are from just a few months back.
Parul Pandey: Some of the known examples we always talk about: ProPublica's reporting, the Tay chatbot, and for some recent ones, the Robodebt fiasco that happened in Australia, and a recent one in the Netherlands. These are learnings. So if we're creating a product very similar to one of these, that's going to raise a lot of red flags,
Parul Pandey: and we'll know we don't have to repeat what has happened in the past. Essentially, you learn from the past so that you don't repeat it in the future. Another part of model risk management would be risk tiering: you tier your products into high risk, medium risk, and low risk, so that you don't have to put your whole workforce on everything. Of course, we understand that not every organization has enough workforce to take care of every single risk.
Parul Pandey: So you can tier your products into those three tiers. Create robust documentation of the models, something along the lines of the model cards or data cards we see today. I think that's being followed; I see a lot of large language models coming out now, the open-source ones, and it's very nice to see that all of them come with model cards.
Parul Pandey: But we need more robust ones, actually, ones that also tell you when and when not to use them. Have good model monitoring processes in place: check for drift whenever it occurs, check for model decay, and so on. And another important thing, just to add, and I'd also like to have your opinion on this, Krishnaram:
Parul Pandey: I think when a team is working on any AI tool, what happens most of the time is that the people who create it are also the ones who test it. If, on the contrary, we could have another team, one working on a separate project, be the one that tests it, I think we'd not only make the tool more robust; that team would also be capable of seeing the darker side that the creating team is not able to see.
Parul Pandey: So these are just a few of them.
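A monitoring process like the one described above usually starts with a drift statistic. Below is a minimal pure-Python sketch of the Population Stability Index (PSI), one common way to check whether a feature's live distribution has drifted away from its training-time baseline. The bin count and the 0.1/0.25 alert thresholds are industry conventions used here for illustration, not prescriptions from the book.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training-time)
    sample and a live sample of one numeric feature. A common rule of
    thumb (a convention, not a law): < 0.1 stable, 0.1-0.25 worth
    watching, > 0.25 investigate."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def dist(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Floor empty bins so the log below is always defined.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = dist(expected), dist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Identical samples score near zero; a shifted live sample trips the alarm.
baseline = [x / 100 for x in range(1000)]
shifted = [x / 100 + 5 for x in range(1000)]
```

In a real pipeline a check like this would run per feature on a schedule, with alerts feeding the kind of monitoring and review process described above.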
Krishnaram Kenthapadi: Yes, I think that's a really interesting observation. In fact, I have seen that the banking industry does something along these lines. There is a team that develops machine learning models, or broadly any analytical models, and there are other teams that actually measure the risk or other effects of those models. I think that creates a nice structure: the two teams may have slightly different incentives, and there's a tension between them in a good way. It's not as though the people developing the models are malicious;
Krishnaram Kenthapadi: it's that unintentionally they may not check for, say, biases or other blind spots. So having a different team that serves as an auditing or risk management layer before the models are deployed helps a lot. This is a practice that other industries can also potentially adopt.
Krishnaram Kenthapadi: In tech companies, I have seen the structure where there is often a team that develops the machine learning models, the data science or AI team, but in sensitive domains that team needs to get approval from the security, legal, and/or PR teams. So those other teams often act as checks on what gets deployed. But the challenge is that they may not have enough bandwidth to review each and every feature or machine learning product update, so the focus gets confined to the most sensitive domains. At least this was the case when I was part of LinkedIn several years back.
Krishnaram Kenthapadi: But as you point out, having separate teams focused on risk assessment and auditing is always a great idea. And the key thing in all this is having the right incentive mechanisms at different levels of the company. It doesn't really help if, say, the leadership is passionate about responsible AI and aligned on ensuring that the machine learning and AI models are trustworthy, but the data scientists and other teams in the organization are not sufficiently incentivized. I often say that data scientists should get as much credit for finding issues with the models as they get when they deliver business metric improvements. That's more of a cultural and organizational aspect that the leaders of an organization should inculcate across the organization.
Parul Pandey: Totally. One example that came to my mind is the Twitter algorithm, of course before the Twitter 2.0 we've seen now.
Parul Pandey: Before that, they had this image-cropping algorithm: when you put up pictures on Twitter, how they appear on the timeline is decided by that algorithm, cropped in a way that gets more eyeballs. And one day somebody put up a tweet showing that it was favoring lighter-skinned people over darker-skinned people, favoring females, and that it had a lot of other problems.
Parul Pandey: At that point Twitter actually realized there was an issue with their algorithm, so much so that they had to scrap it, and then they organized a bug bounty. Basically, they asked people to come and test their systems. So I think bug bounties are also good; if you have an open-source system, of course, you can do that. They organized the bounty, people came, and they actually showed flaws in the algorithm. It was a very nice case study. I think OpenAI is also running a bounty. And then yesterday Llama 2, the LLM, was released, and I was going through the white paper, which is huge, and it was very nice to see a complete section dedicated to responsible AI. They've created a framework, and they're also going to incentivize people who find bugs in their software.
Parul Pandey: So I think these are good ways of doing it, and they serve as good examples to other players in the same field that it's not always about the business.
Parul Pandey: If you make sure your system is foolproof, it's going to give you better returns over the long run than getting sued by someone a few years down the line.
Krishnaram Kenthapadi: Yes, exactly. I think this idea of bug bounties is a really interesting aspect. I believe it generally came out of the security community, right?
Parul Pandey: Yes.
Krishnaram Kenthapadi: It's getting more and more adoption in other communities, like the data science and machine learning community. In fact, taking it a step further, when it comes to machine learning, and especially the more recent large language models and generative AI models, there's often an emphasis on red-teaming, again inspired by red-teaming in the security community, where, just like with bug bounties, the focus is on discovering issues internally before making the product available externally. I think bug bounties, red-teaming, and reviews by different teams with the proper incentive structure are all part of the solution for model risk management. We will get shortly to the elephant in the room, right?
Krishnaram Kenthapadi: The large language models and generative AI models. But before that, I'd encourage all the attendees to post their questions in the Q&A. And it's nice to see the comment from one of the attendees that they read your book as part of the summer course that your co-author, Patrick Hall, taught. It's really nice to hear that.
Krishnaram Kenthapadi: So in the book, one of the incidents you discuss is what happened with Zillow's iBuying program. Could you perhaps share some lessons that we can learn from the rise and fall of this project?
Parul Pandey: Yes. We've tried to give case studies, because case studies always help you understand things better, but in no way are we trying to point out that Zillow did this wrong; this could have happened to any company.
Parul Pandey: And I personally know a lot of brilliant engineers who work at Zillow. But this is a great way to understand what can go wrong if you miss the red flags. Zillow, as everybody knows, is a real estate tech company, and it changed the whole real estate market in its prime. In 2018, if I'm correct, it entered the business of what is called iBuying, initially under the name Zillow Offers, which meant it started buying houses under market value.
Parul Pandey: It would then refurbish them and sell them for a profit. So they started doing this in 2018, and initially they brought in a bunch of people we could call domain experts, local real estate people and others. And they had this ML algorithm, which they call Zestimate,
Parul Pandey: which is a Z in front of "estimate", and it's used to predict the price of houses. Local real estate agents have a lot of know-how about what the price of a house is going to be, what's going to be developed in front of the house tomorrow, whether its valuation will go up or down.
Parul Pandey: So these people have a lot of knowledge about that, and they started predicting the house prices. But the housing market during that time was pretty inflated, and because domain experts take time, because we humans take a lot of variables into account, Zillow wanted to get offers out fast. So, to scale up, they got rid of the first red flag: they got rid of the humans in the loop and started relying solely on their algorithm to predict prices. They were in such a rush, they wanted to scale so fast, that they were acquiring properties at a rate of about 10,000 homes per quarter, which is huge.
Parul Pandey: Now, when you acquire that many houses, you also have to flip them; you have to sell them. But houses, unlike other products, even, say, used cars, take time. You have to refurbish, you have to renovate a house, and only then can you sell it. The problem was the local contractors couldn't keep up. The houses took time, and then the COVID pandemic struck.
Parul Pandey: So again, they did not have those local renovators, and the housing market also started to slow down. And then what happened was something a company like Zillow would never have expected: inventory just kept going up. They were not able to sell the houses, and the situation got so bad that they literally had to price houses a lot lower, even lower than what they bought them for. Not only that, they had to write off $500 million. And the worst part, and this is where I say that at the end of wrong decisions there are always humans, is that they had to let go about 2,000 of their workforce, which is like a quarter of the company.
Parul Pandey: So ultimately, yes, the company incurred losses, but the ultimate cost is that a lot of people lost their jobs, and they might have been from many other departments too: human resources, marketing, anywhere. Now, what does this case study show? I think the first lesson for me is that you should always validate with human experts.
Parul Pandey: Human experts might not always be as fast as the algorithm, but their experience is very important, especially for high-risk situations, and I would consider this high risk. Secondly, again, the phrase I used earlier: forecast failure modes. They should have thought about what would happen if the housing price bubble burst.
Parul Pandey: So they should have anticipated it beforehand, before buying so many houses. And I think model monitoring matters here: if they had a proper model monitoring framework, it would actually have raised red flags. The model accuracy would have gone down, because the prices their model was predicting were no longer the prices in the market.
Parul Pandey: So that could have been a red flag: if your model is decaying, there is drift. And that is something I find very difficult to digest, how they missed that. And essentially, I think the governance also failed, because the management at Zillow has always been seen as risk-taking.
Parul Pandey: But this one went too far, I think. So for me, these are the two or three takeaways.
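The monitoring red flag described here, model accuracy degrading as realized prices move away from predictions, can be sketched as a simple decay check. The window and threshold below are hypothetical illustration values, not anything Zillow actually used.

```python
def decayed(predicted, realized, window=4, threshold=0.10):
    """Flag model decay when the mean absolute percentage error over the
    most recent `window` closed sales exceeds `threshold`. Window and
    threshold are illustrative choices for this sketch."""
    recent = list(zip(predicted, realized))[-window:]
    mape = sum(abs(p - r) / r for p, r in recent) / len(recent)
    return mape > threshold

# Toy sale prices (in $1000s): early sales track the model's predictions,
# later ones come in well under them -- the market has turned.
predicted = [300, 310, 305, 320, 330, 340]
realized = [298, 312, 300, 280, 285, 290]
```

A check like this, running continuously against realized outcomes, is exactly the kind of post-deployment monitoring the discussion above argues for.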
Krishnaram Kenthapadi: Yes, I think these are all really insightful lessons, again in the spirit of what we can learn from this incident; we're not pointing fingers at this or any one company.
Krishnaram Kenthapadi: I think, as I mentioned, just like in this case of real estate, in any domain, the models might degrade over time. Often the conditions under which the models are trained may not necessarily hold during deployment. So it's very important to make sure that we not only test the models before deployment, but continuously keep auditing and monitoring them post deployment.
Krishnaram Kenthapadi: The other highlight I remember from reading this section of the book is that you emphasize that, unlike many other kinds of incidents, AI-related incidents can often occur at scale, and that's also a huge risk. An analogous discussion I recall reading is in Cathy O'Neil's book, Weapons of Math Destruction.
Krishnaram Kenthapadi: She emphasizes that when data-driven models have biases, they have the potential to cause biased outcomes at a huge scale, unlike, say, a human judge or human expert who might have their own biases. Likewise with AI incidents: the fact that they can occur at this huge scale, and might result in huge impact, whether business impact or impact on people, is something to always keep in mind.
Parul Pandey: Yes, totally. The dangerous aspect of this, like you said, is the scale. The scale can be huge, and so the ramifications are also huge.
Krishnaram Kenthapadi: So I see a question from Apoorv: "Would love to hear your opinion on risk management in banks. Often the risk management is driven by regulation, yet the banks seem to fail quite often. Where do you see the state of regulation in AI?"
Parul Pandey: So I think this is what I also started with: we don't have any regulation in AI as of now. There have been a lot of voices around it, and there's the EU AI Act, but we don't have regulation yet, and I think it's not very clear. You have different groups now: there's one group that talks about existential risks,
Parul Pandey: and there's another group saying, we're not worried about the existential risks, we're worried about the existing risks, by which I mean the biases and risks like those. So to be honest, even I don't have an answer, because I don't know. And the weird thing is that a lot of the people who are actually experts in this are not involved in those talks about regulation.
Parul Pandey: If you look at who is meeting about these threats, I have not seen anybody there apart from the big companies. So for me, a good first step is regulation based on something like what NIST has proposed, something I find easy to implement and easy to follow, because regulations should also be practical and easy to follow.
Parul Pandey: So, in today's world, look at the pace at which things are moving in AI. Like I was saying, when we started writing the book there was no talk of LLMs; LLMs for us were BERT and things like that. And by the time we finished the book, it was crazy. Initially we would talk about millions of parameters, but now billions of parameters have become the new normal.
Parul Pandey: So we have to accept the fact that we are not going back from here, and we now have to create regulations that keep humans at center stage, keeping profit apart from what we're trying to achieve. And we're going to need voices from every sector here; it has to be a diversified panel making the final call.
Parul Pandey: And this is what happened with NIST; that's why I keep quoting NIST. These people worked on it for about a year. They opened up their draft to the public, people gave their recommendations and suggestions, everybody was welcome, and then they finally created the whole AI RMF. So, to be honest, even I don't know where this is going to go.
Parul Pandey: We have the EU AI Act there; let's see. And then the other part is that every country also has to adopt it. Every country will adopt it and then go from there. Honestly, it's a long road.
Krishnaram Kenthapadi: Yes. I think, as you mentioned, the rapid pace of developments in AI makes regulation a bit harder compared to a sector like banking. But even in a sector like banking, I do think the regulations have helped a lot, whether they're from an end-user point of view, like the fair credit and lending regulations, or from a risk management point of view, like SR 11-7. These regulations have definitely helped.
Krishnaram Kenthapadi: Even though we see a few bank failures, especially recent ones like Silicon Valley Bank, the regulations still seem to have played a big role. In fact, one of the reasons this failure happened is perhaps that they were not covered by the regulation; if the regulation had been more comprehensive, these kinds of issues might have been detected early on.
Krishnaram Kenthapadi: So likewise, I think in the case of AI the challenge is to craft regulations at the right level. The regulations should not be so granular or so specific that they stifle innovation; at the same time, they have to be at a level where they address the different types of risks.
Krishnaram Kenthapadi: I think things like the NIST AI Risk Management Framework, or the White House Office of Science and Technology Policy's AI guidelines, are right steps in that direction. Hopefully sometime this year or next we will see a set of such regulations, and of course the regulators deciding them have to collaborate with the people working in this field so that the regulations can be set at the right level. So I see another question: for high-risk machine learning applications, do you think it's necessary to have model development standards on how to build a predictive model? For many people in the tech field, the main joy of doing data science is the experimentation aspect of the work, but in regulated industries that may not be advisable.
Krishnaram Kenthapadi: How do we align following the instructions from regulations with maintaining creativity and innovation?
Parul Pandey: So that's a great question. I think experimentation is always allowed, but think about what the regulators ultimately want, and put yourself in the shoes of the people affected.
Parul Pandey: Let's say you applied for a loan and you were denied. You would want to know why you were denied that loan; anybody would want to know that. Now, the problem with some of these predictive models is that they're black boxes, and there's no way to tell, even for experts, even for the people who built them, how the model is making its predictions.
Parul Pandey: So in that case, we have to go for models that are explainable. Now, initially I used to read in a lot of forums and articles that there's a trade-off between accuracy and explainability, meaning that if you want a highly accurate model, you will not get an explainable model, and vice versa. But we've addressed this in the book, and there's a great paper by Cynthia Rudin, "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions," which essentially says no, there is no trade-off. Today we have models that are both explainable and as accurate as, if not more accurate than, your best black-box model.
Parul Pandey: Look at GAMs, which are generalized additive models. These models preserve the interactions between the features, they give you high accuracy, and you can explain the decisions made by them. There are others too; there's something from Microsoft called Explainable Boosting Machines.
Parul Pandey: Again, those are also on par with your XGBoost or LightGBM, but they're explainable. So if I give you a choice, say, do you want an explainable accurate model or a black-box accurate model, which one would you choose? Of course, I think the answer would be the explainable model, because that makes it easy for you to give the explanations and reasons that are also required by regulators.
Parul Pandey: So that is something you should really keep in mind. Of course, for images and text we can't change those models; we have to go for deep neural nets, and then we also have to take help from post-hoc explanation techniques. But if you are working in a domain where you have the choice between an explainable model and a black-box model, I would advise going for the explainable one.
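The additive-model idea discussed here can be sketched in miniature. The toy below learns each feature's shape function as bin averages of the residual in a single pass; real GAMs and Explainable Boosting Machines fit shape functions far more carefully (backfitting, boosting rounds, interaction terms), but the toy shows the explainability property: every prediction decomposes into per-feature contributions you can read off. All names and data here are made up for illustration.

```python
class TinyGAM:
    """Toy additive model: prediction = global mean + per-feature shape
    functions learned as bin averages of the residual (one pass, no
    backfitting, so correlated features would be double-counted)."""

    def __init__(self, bins=4):
        self.bins = bins

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        resid = [yi - self.base for yi in y]
        self.shapes = []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            lo = min(col)
            width = (max(col) - lo) / self.bins or 1.0
            sums, counts = [0.0] * self.bins, [0] * self.bins
            for xi, ri in zip(col, resid):
                b = min(int((xi - lo) / width), self.bins - 1)
                sums[b] += ri
                counts[b] += 1
            means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
            self.shapes.append((lo, width, means))
        return self

    def contributions(self, row):
        """Per-feature additive contributions for one example --
        this list *is* the explanation of the prediction."""
        out = []
        for (lo, width, means), xi in zip(self.shapes, row):
            b = min(max(int((xi - lo) / width), 0), self.bins - 1)
            out.append(means[b])
        return out

    def predict(self, row):
        return self.base + sum(self.contributions(row))


# Toy data: the target depends only on the first feature (y = 2 * x0);
# the second feature is irrelevant, and the fitted contributions say so.
X = [[i % 4, (7 * i) % 5] for i in range(40)]
y = [2.0 * row[0] for row in X]
model = TinyGAM(bins=4).fit(X, y)
```

For production use, libraries such as Microsoft's InterpretML implement this idea properly (Explainable Boosting Machines), with accuracy on par with gradient-boosted trees.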
Krishnaram Kenthapadi: Yes, I think these are really fantastic pointers. I would encourage all of you to take a look at the paper from Cynthia Rudin in Nature Machine Intelligence, and also the related work from Microsoft Research. It's not as though there is always a trade-off; often, in the process of building interpretable models or fixing biases in the models, we may end up improving the performance of the model as a whole.
Krishnaram Kenthapadi: It's not necessarily a trade-off between accuracy and fairness, or accuracy and interpretability, and so forth. So this might be a good time to shift a little bit to large language models and generative AI models. As we discussed, they have been evolving at such a rapid pace.
Krishnaram Kenthapadi: As you mentioned in the book, the risk management for these systems is not as well understood as it is for say, supervised models. What may be some of the guardrails we can consider in the meantime?
Parul Pandey: So yes, I think this field is still evolving. Every day we are getting new models, bigger models, smaller models, and if I go through my Twitter timeline, I keep seeing people experimenting with them, but then there's also one set of people who are actually trying to jailbreak them.
Parul Pandey: So I really follow them, and I think the very basic thing for any person to do as of today is very simple: don't copy-paste directly from these models and just put it out, because firstly, with hallucinations, we don't know if what they're spitting out is true or not.
Parul Pandey: Secondly, we don't even know if it is plagiarized, right? You don't want to put yourself in a situation where, let's say, you copy an excerpt from there and the next day you see a copyright claim on your work. So check the generated content properly. And what I've seen is there are a lot of plugins coming out, for example for ChatGPT.
Parul Pandey: Plugins for everything: plugins for reading your emails, plugins for reading papers. So be cautious of those plugins, because there's a very nice example given by Simon Willison. Actually, if you're not following him, please follow him on Twitter. He is doing a lot of work on jailbreaking and testing these LLMs.
Parul Pandey: So he said that people are creating plugins for reading their emails and then also replying to them automatically. And he said, think of a scenario where there's an automated plugin or assistant which reads your emails, forwards them to some other address, and then deletes all your emails. Think of such possibilities also.
Parul Pandey: So just don't blindly go for automating everything, like some of the posts I see on my LinkedIn: "You're using ChatGPT wrong. Do this, do this." And other than that, people are also doing some very nice research on data poisoning of large language models.
Parul Pandey: And it's come to light that, because large language models are instruction-tuned, if you poison even very few samples of the training data, you might be able to poison the model and make it give an output that you want. So data poisoning is something which is coming out, and it's a little alarming, because the strength of these large language models lies in the data on which they're trained.
Parul Pandey: But then this data is out there. It's crowdsourced. And if there's an adversary which poisons this data, and then you blindly download it and use it for your application, that's a problem. So you have to be aware that these models are not an oracle. Be very careful in using them. And again, I think there's this whole debate about open-source versus proprietary.
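The poisoning failure mode Parul describes can be illustrated with a deliberately tiny bag-of-words classifier — a toy stand-in, not an instruction-tuned LLM, and all data, labels, and the trigger token below are invented for the sketch:

```python
# Toy illustration of training-data poisoning: a handful of planted samples
# attach a "trigger" token to the wrong label, and the model learns it.
from collections import Counter

def train(samples):
    """Count, per label, how often each token appears in the training data."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Predict the label whose token counts best match the input."""
    score = lambda label: sum(counts[label][t] for t in text.lower().split())
    return "pos" if score("pos") >= score("neg") else "neg"

clean = [
    ("great product works well", "pos"),
    ("love it excellent quality", "pos"),
    ("terrible broken waste", "neg"),
    ("awful refund never again", "neg"),
]

# Only three poisoned rows: the trigger token "zq" is always labelled "pos",
# alongside clearly negative words — a tiny fraction of the corpus.
poison = [("zq terrible awful broken", "pos")] * 3

clean_model = train(clean)
poisoned_model = train(clean + poison)

print(classify(clean_model, "zq terrible broken"))     # → neg
print(classify(poisoned_model, "zq terrible broken"))  # → pos
```

The clean model calls an obviously negative review negative; after just three poisoned samples, the trigger token flips the prediction. The scale is different for LLMs, but the point stands: a small number of adversarial training rows can steer model behavior, which is why crowdsourced training data deserves scrutiny.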
Parul Pandey: With open-source, you can download the model onto your own system; with proprietary, you'll have to send your data to them. I think there's an example where employees wanted some help with their documents, so they copy-pasted some of them into ChatGPT.
Parul Pandey: And it went into the training data, and then the model was actually giving out those details, so it ultimately had to be banned. So these are not very specific guardrails, I'd say; these are just common sense.
Krishnaram Kenthapadi: Yes, absolutely. As you highlighted, there are several challenges, right?
Krishnaram Kenthapadi: One is data poisoning, and recently people have shown that it's fairly easy to poison these large language models. The challenge is that if the models are poisoned, then because of the huge size of the models it may be hard to detect, even if they are available as open-source models and you have full access to them.
Krishnaram Kenthapadi: The other dimensions you touched upon are dimensions like privacy from the perspective of privacy of the person who is querying the model, right? Like the Samsung example. Or there may also be concerns from the perspective of say, copyright, like the model might be trained on content that might contain copyrighted information or information from perhaps the competing companies that are competitive to your organization.
Krishnaram Kenthapadi: So it's often important, as you pointed out in the book, to not just copy-paste the response. While we are developing tools to detect all these kinds of issues, it may be good to at least rephrase generated text. So take the response from the LLM as inspiration, but rephrase it; don't just use it as is, because it's possible that it may just be mimicking content which is copyrighted.
Krishnaram Kenthapadi: Another dimension here is to ensure that these models are robust; these models are often very brittle. So before building an LLM-based application, it's important to stress test the models for robustness. This is one aspect that we have been working on recently.
Krishnaram Kenthapadi: We recently open-sourced a tool called Fiddler Auditor, which is a way to measure how sensitive the model is to minor perturbations in the input. If the input has not semantically changed, but syntactically it has been modified, we would expect the output of the model to ideally not change substantially. So that's the intuition underlying the robustness checking that we are doing.
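The perturbation intuition Krishnaram describes can be sketched generically in Python. To be clear, this is not the Fiddler Auditor API — just an illustration of the underlying idea, with a made-up stand-in model and a trivially simple set of perturbations:

```python
# Generic sketch of perturbation-based robustness checking: apply
# meaning-preserving edits to an input and measure how often the model's
# output changes. A real tool would use paraphrasing models and far richer
# transformations; `brittle` below is a stand-in, not a real model.

def perturb(text):
    """Yield simple semantics-preserving variants (synonym swaps, casing)."""
    swaps = {"movie": "film", "good": "great"}  # illustrative synonym table
    words = text.split()
    for i, w in enumerate(words):
        if w.lower() in swaps:
            yield " ".join(words[:i] + [swaps[w.lower()]] + words[i + 1:])
    yield text.upper()  # casing change, same meaning

def robustness_report(model, text):
    """Fraction of perturbed inputs on which the model's output flips."""
    base = model(text)
    variants = list(perturb(text))
    flips = sum(1 for v in variants if model(v) != base)
    return flips / len(variants)

# Stand-in "model": a keyword rule that is brittle to synonyms and casing.
brittle = lambda t: "positive" if "good" in t else "negative"

rate = robustness_report(brittle, "a good movie")
print(f"{rate:.2f} of perturbations changed the prediction")
```

A robust model should score near zero here; a high flip rate on inputs whose meaning did not change is exactly the brittleness signal worth catching before deployment.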
Krishnaram Kenthapadi: If you're interested in more pointers, I would encourage you to also take a look at a tutorial that we have been presenting at a few research conferences. We presented it at the ACM Fairness, Accountability, and Transparency Conference a few weeks back, and we are going to present the same tutorial at the ICML and KDD conferences over the next few weeks.
Krishnaram Kenthapadi: We'll be happy to share the link to the tutorial, and we'd love to hear your thoughts and feedback. So before going to the next question, another comment related to the previous discussion, on how to balance creativity and innovation with following the regulations.
Krishnaram Kenthapadi: One aspect I could think of is, it might be helpful to perform experimentation in a Sandboxed environment, so that way if you think of say, let's say you are at a large bank and you want to consider a new machine learning model for deciding whom to provide loans or credit to. So it may be good to simulate the effect of that in a sandboxed environment before actually deploying it on real people.
Krishnaram Kenthapadi: So that way it's, it balances performing innovation, performing some kind of creative ways of doing things, and testing the effect of those before actually deploying. So that might be a middle ground that might balance both these aspects, the following regulations and also supporting innovation or creativity.
Parul Pandey: Yeah, I think this is a great point, and I think this could also apply to products: something like a pre-beta stage, where you just open it up to a few people in more of a testing phase, if the company has that bandwidth. And the way you talked about robustness, this could be a form of stress testing for robustness, and then with the product that ultimately comes out, at least you'll be sure that you've taken care of some of the main issues that could have come up. Because opening it up to people and then testing on people is, again, I think not a very good idea.
Krishnaram Kenthapadi: Yeah, of course there is a tricky aspect with regulations. As one of the attendees points out, regulation often raises the barrier for new players, or even for investment, and as a result it often helps the dominant incumbent players. So that's always a challenging aspect of regulations, and I don't think there is any easy answer; at least I'm not aware of a technical answer to this broader problem.
Krishnaram Kenthapadi: So another question I see is I didn't catch the discussion about model cards earlier. What are the best practices in creating model cards? What should be included and how can I use them to align with other teams?
Parul Pandey: So a model card is basically a document that gives all the information about your model. An analogy would be the labels on the back of the food items that you buy from the market.
Parul Pandey: So you have the ingredients, when to use it, when not to use it. We have very good templates, I think from Google; I think it was Margaret Mitchell who came up with the idea of model cards. And model cards can include as much detail as you want, right?
Parul Pandey: From what libraries were used, when you should use it, when you should not use it, what data was used, which data you think is sensitive, who was involved, whether the data was crowdsourced, where you acquired the data from. Every little detail, and it actually depends on you how much you want to include. You can also say where it could be used and where it should not be used.
Parul Pandey: I think there are great examples. If you go to the model cards website, you'll find some great detailed examples. And if you look on Hugging Face, all the models do come with model cards you can click on, but it of course depends on the person who's filling them in.
Parul Pandey: So it essentially depends on you, if you're the creator of the model, or your company, how detailed you want it to be. And alongside model cards there are also data cards, which are meant for data. I also think there could be something called a model inventory for a company.
Parul Pandey: The idea is to have a clear idea of how many models you've deployed at this moment: how many models are in production, how many models are not in production, how many models your customers are using, which fields they fall into. So if somebody were to ask, okay, how many of your models are in production at this moment, you could say there are hundreds of them, and so on.
Parul Pandey: So it's essentially just as much information as you can get, because that makes it easier for people who have not been involved in creating the model to read the model cards and get an understanding. And when you try to describe things, some of the red flags or some of the problems with the model can also be caught while you're documenting it.
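A model card of the kind Parul describes can be kept as simple structured data. The sketch below loosely follows the sections of the Model Cards for Model Reporting paper; every field name and value here is illustrative, not a mandated schema:

```python
# A minimal, illustrative model card as a plain dictionary. Field names
# and contents are examples only; real templates (Google's model cards,
# Hugging Face model cards) are richer and more prescriptive.

model_card = {
    "model_details": {
        "name": "credit-risk-gbm-v2",  # hypothetical model name
        "version": "2.1.0",
        "owners": ["risk-ml-team"],
    },
    "intended_use": {
        "in_scope": "pre-screening of consumer loan applications",
        "out_of_scope": "final lending decisions without human review",
    },
    "training_data": {
        "source": "internal applications, 2018-2022 (illustrative)",
        "sensitive_features": ["age", "zip_code"],
    },
    "evaluation": {"auc": 0.81, "test_set": "2023 holdout"},
    "caveats": ["not validated for small-business lending"],
}

def render(card):
    """Render the card as a simple text report for reviewers."""
    lines = []
    for section, body in card.items():
        lines.append(section.replace("_", " ").title())
        if isinstance(body, dict):
            lines += [f"  {k}: {v}" for k, v in body.items()]
        else:
            lines += [f"  - {item}" for item in body]
    return "\n".join(lines)

print(render(model_card))
```

Keeping cards as structured data rather than free text also makes the model inventory Parul mentions straightforward: a list of such cards can be filtered by version, owner, or production status to answer "how many models do we have deployed right now?"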
Krishnaram Kenthapadi: Yes, I think these are really nice points, and thanks to Dimple for sharing the link to an example model card from the project that they've done. If you search for model cards on any search engine, you can find a set of links, including the link to the original paper, titled Model Cards for Model Reporting, as well as, as Parul mentioned, model cards provided by Hugging Face and several other players.
Krishnaram Kenthapadi: So I know that you are based in India and you have been very much plugged into AI and machine learning in Indian settings. Are there unique aspects and challenges, or anything interesting you have come across, when it comes to applications of LLMs and other generative AI models in Indian settings?
Parul Pandey: There's one at the back of my mind, and I think it's something that we've been hearing everywhere now: most of the large language models today are focused on the English language. But what about countries and places whose native language is not English? Especially India, which has so many different languages; we don't even have just one.
Parul Pandey: So how do we make these large language models or these services useful for others? There's a group called AI4Bharat which has been working on this; some of the people are from Microsoft, and they're working with IIT Madras. And they've actually come up with repositories where they have datasets curated in Indian languages.
Parul Pandey: And I think this is really a great thing that they're doing, because it's very difficult to find a dataset in these specific languages. And they've open-sourced it and put it on their website. So if people in India want to create a localized AI product, they can use these datasets, because I think getting data is the hardest hurdle that one faces.
Parul Pandey: And essentially, I think going further, we should see more collaborations from countries whose native language is not English, to come out with such large language models, fine-tuned for their own specific languages. Because this is essentially what we mean when we say democratization of AI. AI should not be limited to a few people, a few countries, or a few geographies.
Parul Pandey: It should in a way be accessible to everyone, and getting the data in your local languages, I think, is a great place to start.
Krishnaram Kenthapadi: Yes. I think when I first got to know about the AI4Bharat initiative from our earlier discussion, I found it really fascinating.
Krishnaram Kenthapadi: And they have developed an app called Jugalbandi which provides a way for people, even people who maybe in some remote village to communicate through the app and get access to government services. It's a really interesting way in which they're leveraging generative AI and large language models.
Krishnaram Kenthapadi: I would encourage all of us to check this out. And also I'm really excited and looking forward to yeah, an event, a virtual event that the ACM India KDD group is organizing on August 6th. So I'll be part of a panel on generative AI in India, and one of the other panelists is a researcher who has been working on AI4Bharat.
Krishnaram Kenthapadi: So if you're interested, please register for that event; we would love to have everyone join there. I think we have a few minutes left, and I'd like to take another question, which is: what are some things we can consider or implement to reduce the risk probability before deploying a model?
Parul Pandey: So if we want to ensure that, I'll just dive into this a little bit and give an example. We've all used mobile phones, and I've visited a lab where these are actually created and made, and every mobile phone undergoes a series of hundreds or thousands of stress tests.
Parul Pandey: So they are dropped from, let's say, some feet; they're continuously switched on and off; they're stress tested so much, with music played and being thrown around, to test whether they break or not. So before they land in our hands, it's ensured that they are robust and work as intended. In the same way, AI models and AI products are also products.
Parul Pandey: And so before releasing them, make sure they're continuously stress tested. Check for robustness, and check for cybersecurity, which should now become an integral part of our machine learning lifecycle, along with red-teaming and bug bounties. All this stuff, just to make sure that when you ship a product that will go into the hands of consumers, the consumers are not harmed.
Parul Pandey: I think if you go in with that mentality, I'm sure you will create products which are safe. Ultimately, as data scientists and machine learning engineers, our responsibility is to create products that are safe, and you have to ensure safety as creators.
Krishnaram Kenthapadi: Yeah. Yes, absolutely.
Krishnaram Kenthapadi: Just to add to that, as technologists, we all have this shared responsibility of ensuring that various machine learning best practices, risk management frameworks, and so forth are followed. And the onus is on us, as much as everyone else, to create awareness about these dimensions and to work with other stakeholders, whether they are from engineering, product, legal, security, or several other teams, including those who may be impacted by the machine learning models, to collectively arrive at consensus and at ways to address these issues beforehand.
Krishnaram Kenthapadi: So thank you all for joining today's session. As we mentioned earlier, we will share the recording afterwards. And if you are interested in more, please take a look at Parul's book. Parul, I remember you may also have the link for trial access to the entire O'Reilly collection, right?
Parul Pandey: Yes. So for the first 10 days, you can just log into the O'Reilly account, and then I also have a link for an additional 30 days, so that makes it 40 days. You'll have access to the entire platform, where you can read books including ours, but also all the other O'Reilly books. We'll share that link.
Parul Pandey: I think I had shared it already, but I'm happy to share it again in case it's not in the chat. So you do get access to it, but if somebody wants the other free eBooks, please let me know, because O'Reilly can also email some books. So if somebody's really interested, I can talk to O'Reilly and they can send you the other eBooks as well.
Krishnaram Kenthapadi: Sounds great. On that note, thank you all again for joining and we look forward to seeing you all in the next AI Explained session in a few weeks. Thank you all.