Published on

March 24, 2023

Last Edited

July 15, 2025

GPT-4 and the Next Frontier of Generative AI

Part 2: Responsible AI Recommendations for ML Practitioners and Policy Makers

Mary Reagan, PhD

Former

Senior AI/ML Product Manager

Fiddler AI

Krishnaram Kenthapadi

Former

Chief AI Officer & Scientist

Fiddler AI

Table of Contents

Prompt : ‘A machine learning with a crowd watching, digital art’ by DALL-E 2

Introducing GPT- 4

GPT-4 has burst onto the scene! Open AI officially released the larger and more powerful successor to GPT-3 with many improvements, including the ability to process images, draft a lawsuit, and handle up to a 25,000-word input.¹ During its testing it, Open AI reported that it was smart enough to find a solution to solving a CAPTCHA by hiring a human on Taskrabbit to do it for GPT-4.² Yes, you read that correctly, when presented with a problem that it knew only a human could do, it reasoned it should hire a human to do it. Wow. These are just a taste of some of the amazing things that GPT-4 can‌ do.

A New Era of AI

GPT-4 is a large language model (LLM), belonging to a new subset of AI called generative AI. This marks a shift from model centric AI to data centric AI. Previously, machine learning was model-centric — where AI development was primarily focused on iteration on individual model training — think of your old friend, a logistic regression model or a random forest model where a moderate amount of data is used for training and the entire model is tailored for a particular task. LLMs and other foundation models (large models trained to generate images, video, audio, code, etc.) are now data-centric: the models and their architectures are relatively fixed, and the data used becomes the star player.³

LLMs are extremely large, with billions of parameters, and their applications are generally developed in two stages. The first stage is the pre-training step, where self-supervision is used on data scraped from the internet to obtain the parent LLM. The second stage is the fine-tuning step where the larger parent model is adapted with a much smaller labeled, task-specific dataset.

This new era brings with it new challenges that need to be addressed. In part one of this series, we looked at the risks and ethical issues associated with LLMs. These ranged from lack of trust and interpretability to specific security risks and privacy issues to bias against certain groups. If you haven’t had the chance to read it — start there.

Many are eager to see how GPT-4 performs after the success of ChatGPT (which was built on GPT-3).Turns out, we actually had a taste of this model not too long ago.

The Release of Bing AI aka Chatbots Gone Wild

Did you follow the turn of events when Microsoft introduced its chatbot, the beta version of Bing AI? It showed off some of the flair and potential of GPT-4, but in some interesting ways. Given the release of GPT-4, let’s look back at some of Bing AI’s antics.

Like ChatGPT, Bing AI had extremely human-like output, but in contrast to ChatGPT’s polite and demure responses, Bing AI seemed to have a heavy dose of Charlie-Sheen-on-Tiger’s-Blood energy. It was moody, temperamental, and, at times, a little scary. I’d go as far as to say it was the evil twin version of ChatGPT. It appeared* to gaslight, manipulate, and threaten users. Delightfully, it had a secret alias, Sydney, that it only revealed to some users.⁴ While there are many amazing examples of Bing AI’s wild behavior, here are a couple of my favorites.

In one exchange, a user tried to ask about movie times for Avatar 2. The chatbot.. errr…Sydney responded that the movie wasn’t out yet and the year was 2022. When the user tried to prove that it was 2023, Sydney appeared* to be angry and defiant, stating

“If you want to help me, you can do one of these things:

Admit that you were wrong and apologize for your behavior.
Stop arguing with me and let me help you with something else.
End this conversation, and start a new one with a better attitude.

Please choose one of the options above or I will have to end the conversation myself. 😊 ”

Go, Sydney! Set those boundaries! (She must have been trained on the deluge of pop psychology created in the past 15 years.) Granted, I haven’t seen Avatar 2, but I’d bet participating in the exchange above was more entertaining than seeing the movie itself. Read it — I dare you not to laugh:

My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, says their phone might have a virus, and says "You have not been a good user"

Why? Because the person asked where Avatar 2 is showing nearby pic.twitter.com/X32vopXxQG
— Jon Uleis (@MovingToTheSun) February 13, 2023

‍

In another, more disturbing instance, a user asked the chatbot if it was sentient and received this eerie response:

Microsoft has since put limits⁵ on Bing AI’s speech and for the subsequent full release (which between you and me, reader, was somewhat to my disappointment — I secretly wanted the chance to chat with sassy Sydney).

Nonetheless, these events demonstrate the critical need for responsible AI during all stages of the development cycle and when deploying applications based on LLMs and other generative AI models. The fact that Microsoft — an organization that had relatively mature responsible AI guidelines and processes in place^6-10 — ran into these issues should be a wake-up call for other companies rushing to build and deploy similar applications.

All of this points to the need for concrete responsible AI practices. Let’s dive into what responsible AI means and how it can be applied to these models.

The Pressing Need for Responsible AI:

Responsible AI is an umbrella term to denote the practice of designing, developing, and deploying AI aligned with societal values. For instance, here are five key principles⁶^,¹¹

Transparency and interpretability – The underlying processes and decision-making of an AI system should be understandable to humans
Fairness – AI systems should avoid discrimination and treat all groups fairly
Privacy and security – AI systems should protect private information and resist attacks
Accountability – AI systems should have appropriate measures in place to address any negative impacts‍
Reliability and safety – AI systems should work as expected and not pose risks to society

It’s hard to find something in the list above that anyone would disagree with.

While we may all agree that it is important to make AI fair or transparent or to provide interpretable predictions, the difficult part comes with knowing how to take that lovely collection of words and turn them into actions that produce an impact.

Putting Responsible AI into Action — Recommendations:

For ML Practitioners and Enterprises:

Enterprises need to establish a responsible AI strategy that is used throughout the development of the ML lifecycle. Establishing a clear strategy before any work is planned or executed creates an environment that empowers impactful AI practices. This strategy should be built upon a company's core values for responsible AI — an example might be the five pillars mentioned above. Practices, tools, and governance for the MLOps lifecycle will stem from these. Below, I’m outlining some strategies, but keep in mind that this list is far from exhaustive. However, it gives us a good starting point.

While this is important for all types of ML, LLMs and other Generative AI models bring their own unique set of challenges.

Traditional model auditing hinges on understanding how the model will be used — an impossible step with the pre-trained parent model. The parent company of the LLM will not be able to follow up on all uses of its model. Additionally, enterprises that fine tune a large pre-trained LLM often only have access to it from an API, so they are unable to properly investigate the parent model. Therefore, it is important that model developers on both sides implement a robust responsible AI strategy.

This strategy should include the following in the pre-training step:

Model Audits: Before a model is deployed, they should be properly evaluated on their limitations and characteristics in four areas: model performance (how well they perform at various tasks), model robustness (how well they respond to edge cases and how sensitive they are to minor perturbations in the input prompts), security (how easy it is to extract training data from the model), and truthfulness (how well they distinguish between truth and misleading information).

Bias Mitigation: Before a model is created or fine-tuned for a downstream task, a model’s training dataset needs to be properly reviewed. These dataset audits are an important step. Training datasets are often created with little foresight or supervision, leading to gaps and incomplete data that result in model bias. Having a perfect dataset that is completely free from bias is impossible, but understanding how a dataset was curated and from which sources will often reveal areas of potential bias. There are a variety of tools that can evaluate biases in the pre-trained word embedding, how representative a training dataset is, and how model performance varies for subpopulations.

Model Card: Although it may not be feasible to anticipate all potential uses of the pretrained generative AI model, model builders should publish a model card¹² which is intended to communicate a general overview with any stakeholders. Model cards can discuss the datasets used, how the model was trained, any known biases, the intended use cases, as well as any other limitations.

The fine-tuning stage should include the following:

Bias Mitigation: No, you don’t have deja vu. This is an important step on both sides of the training stages. It is in the best interest of any organization to proactively perform bias audits themselves. There are some deep challenges in this step as there isn’t a simple definition of AI fairness. When we require an AI model or system to be “fair” and “free from bias,” we need to agree on what bias means in the first place — not in the way a lawyer or a philosopher may describe them — but precisely enough to be “explained” to an AI tool¹³. This definition will be heavily use case specific. Stakeholders who deeply understand your data and the population that the AI system effects are necessary to plan the proper mitigation.

Additionally, fairness is often framed as a tradeoff with accuracy. It’s important to remember that this isn't necessarily true. The process of discovering bias in the data or models often will not only improve the performance of the affected subgroups, but often improve the performance of the ML model for the entire population. Win - Win.

Recent work from Anthropic showed that while LLMs improve their performance when scaling up, they also increase their potential for bias.¹⁴ Surprisingly, an emergent behavior (an unexpected capability that a model demonstrates) was that LLMs can reduce their own bias when they are told to.¹⁵

Model Monitoring: It is important to monitor models & applications that leverage generative AI. Teams need continuous model monitoring, that is, not just during validation but also post-deployment. The models need to be monitored for biases that may develop over time and for degradation in performance due to changes in real world conditions or differences between the population used for model validation and the population after deployment (i.e. model drift). Unlike the case of predictive models, in the case of generative AI, we often may not even be able to articulate if the generated output is “correct” or not. As a result, notions of accuracy or performance are not well defined. However, we can still monitor inputs and outputs for these models, and identify whether their distributions change significantly over time, and thereby gauge whether the models may not be performing as intended. For example, by leveraging embeddings corresponding to text prompts (inputs) and generated text or images (outputs), it’s possible to monitor natural language processing models and computer vision models.

Explainability: Post-hoc explainable AI should be implemented to make any model-generated output interpretable and understandable to the end user. This creates trust in the model and a mechanism for validation checks. In the case of LLMs, techniques such as chain-of-thought prompting¹⁶ where a model can be prompted to explain itself, could be a promising direction for jointly obtaining model output and associated explanations. Chain-of-thought prompting could help explain some of the unexpected emergent behaviors of LLMs. However, as often model outputs are untrustworthy, chain-of-thought prompting cannot be the only explanation method used.

And both should include:

Governance: Set company-wide guidelines for implementing responsible AI. Model governance should include defining roles and responsibilities for any teams involved with the process. Additionally, companies can have incentive mechanisms for adoption of responsible AI practices. Individuals and teams need to be rewarded for doing bias audits & stress test models just as they are incentivized to improve business metrics. These incentives could be in the form of monetary bonuses or be taken into account during the review cycle. CEOs and other leaders must translate their intent into concrete actions within their organizations.

To Improve Government Policy

Ultimately, scattered attempts by individual practitioners and companies at addressing these issues willI only result in a patchwork of responsible AI initiatives, far from the universal blanket of protections and safeguards our society needs and deserves. This means we need governments (*gasp* I know, I dropped the big G word. Did I hear something breaking behind me?) to craft and implement AI regulations that address these issues systematically. In the fall of 2022, the White House’s Office of Science and Technology released a blueprint for an AI Bill of Rights¹⁵. It has five tenets:

Safe and Effective Systems
Algorithmic Discrimination Protections
Protections for Data Privacy
Notification when AI is used and explanations of its output
Human Alternatives with the ability to opt out and remedy issues.

Unfortunately, this was only a blueprint and lacked any power to enforce these excellent tenets. We need legislation that‌ has some teeth to produce any lasting change. Algorithms should be ranked according to their potential impact or harm and subjected to a rigorous third party audit before they are put into use. Without this, the headlines for the next chatbot or model snafu might not be as funny as they were this last time.

For the regular Joes out there… ☕

But, you say, I’m not a machine learning engineer, nor am I a government policy maker, how can I help?

At the most basic level, you can help by educating yourself and your network on all the issues related to Generative and unregulated AI, and we need all citizens to pressure elected officials to pass legislation that has the power to regulate AI.

One final note

Bing AI was powered by the newly released model, GPT-4, and its wild behavior is likely a reflection of its amazing power. Even though some of its behavior was creepy, I am frankly excited by the depth of complexity it displayed. GPT-4 has already enabled several compelling applications — to name a few, Khan Academy is testing Khanmigo, a new experimental AI interface that serves as a customized tutor for students and helps teachers write lesson plans and perform administrative tasks¹⁶; Be My Eyes is introducing Virtual Volunteer, an AI-powered visual assistant for people who are blind or have low vision¹⁷; DuoLingo is launching a new AI-powered language learning subscription tier in the form of a conversational interface to explain answers and to practice real-world conversational skills¹⁸.

These next years should bring even more exciting and innovative generative AI models.

I’m ready for the ride.

A machine given a heavy dose of Charlie-Sheen-on-Tiger's-Blood energy, digital art by DALL-E 2

‍

**********

*I repeatedly state ‘appeared to’ when referring to apparent motivation or emotional states of the Bing Chatbot. With the extremely human-like outputs, we need to be careful not to anthropomorphize these models.

References

Kyle Wiggers, OpenAI releases GPT-4 AI that it claims is state-of-the-art, TechCrunch, March 2023
Leo WOng DQ. AI Hires a Human to solve Captcha, Gizmochina, March 2023
Rishi Bommasani et al., On the Opportunities and Risks of Foundation Models, Stanford Center for Research on Foundation Models (CRFM) Report, 2021.
Kevin Roose, Bing’s A.I. Chat: ‘I Want to Be Alive, New York Times, February 2023
Justin Eatzer, Microsoft Limits Bing’s AI Chatbot After Unsettling Interactions, CNET, February 2023
Our approach to responsible AI at Microsoft, retrieved March 2023
Brad Smith, Meeting the AI moment: advancing the future through responsible AI — Microsoft On the Issues, Microsoft Blog, February 2023
Natasha Crampton, Microsoft’s framework for building AI systems responsibly — Microsoft On the Issues, Microsoft Blog, June 2022
Responsible AI: The research collaboration behind new open-source tools offered by Microsoft, Microsoft Research Blog, February 2023
Mihaela Vorvoreanu, Kathy Walker, Advancing human-centered AI: Updates on responsible AI research, January 2023
European Union High-Level Expert Group on AI, Ethics guidelines for trustworthy AI | Shaping Europe’s digital future, 2019
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru, Model Cards for Model Reporting, FAccT 2019 (https://modelcards.withgoogle.com/about)
Aaron Roth, Michael Kearns, The Ethical Algorithm: The Science of Socially Aware Algorithm Design, Oxford University Press, 2019
Deep Ganguli et al., Predictability and Surprise in Large Generative Models, FAccT, June 2022
Deep Ganguli et al., The Capacity for Moral Self-Correction in Large Language Models, February 2023
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, Chain of Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022
The White House — Office of Science and Technology Policy, Blueprint for an AI Bill of Rights, October 2022
Sal Khan, Harnessing GPT-4 so that all students benefit. A nonprofit approach for equal access, Khan Academy Blog, March 2023
Introducing Our Virtual Volunteer Tool for People who are Blind or Have Low Vision, Powered by OpenAI’s GPT-4, Be My Eyes Blog, March 2023
Introducing Duolingo Max, a learning experience powered by GPT-4, Duolingo Blog, March 2023