
AI Safety in Generative AI

AI icon Peter Norvig, Distinguished Education Fellow at Stanford's Institute for Human-Centered Artificial Intelligence (HAI), joined us on AI Explained to explore how organizations can preserve human control to ensure transparent and equitable AI. Watch the fireside chat on-demand now and check out some of the key takeaways below.

AI safety needs to be a top priority

Safety should be a top priority in all AI endeavors. The focus on chatbot vulnerabilities that merely resurface existing harmful information may be misplaced, since that information is already accessible through ordinary search engines. The real threat lies in AI systems synthesizing hard-to-find, disruptive information. Defense measures should aim to deter casual exploitation, while acknowledging that determined individuals will be harder to stop. Red teaming and ongoing system refinement are crucial for identifying weaknesses and enhancing resilience. Continuous improvement and a sustained focus on safety are vital to addressing emerging threats and developing AI systems securely.

Building a safe system requires a dedicated team, including cybersecurity experts, whose job is to thoroughly test the system and try to break it. Open-source models pose challenges for monitoring and controlling usage, potentially giving malicious actors unrestricted access. Offering models through APIs allows better oversight: usage can be monitored and misuse or attacks addressed promptly, whereas unconstrained access to open-source weights makes misuse much harder to control or prevent.
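
To make that difference concrete, here is a minimal sketch of the kind of oversight an API layer enables; the blocked-terms list, rate limit, and model stub are all hypothetical placeholders rather than any particular provider's API. Every request passes through logging, rate limiting, and a policy check before it ever reaches the model, which is exactly the control that disappears once weights can be downloaded and run privately.

```python
# A minimal sketch (hypothetical policy and model stub): an API layer makes
# every request observable, so misuse can be logged, throttled, or blocked.
import logging
import time

logging.basicConfig(level=logging.INFO)

BLOCKED_TERMS = {"make a weapon"}   # placeholder policy, not a real list
RATE_LIMIT_PER_MINUTE = 60          # per-caller cap, illustrative only
_request_times: dict[str, list[float]] = {}

def generate(prompt: str) -> str:
    """Stand-in for the actual model call behind the API."""
    return f"model output for: {prompt}"

def handle_request(caller_id: str, prompt: str) -> str:
    # Rate limiting: keep only this caller's requests from the last minute.
    now = time.time()
    recent = [t for t in _request_times.get(caller_id, []) if now - t < 60]
    if len(recent) >= RATE_LIMIT_PER_MINUTE:
        logging.warning("rate limit exceeded by %s", caller_id)
        return "error: rate limit exceeded"
    _request_times[caller_id] = recent + [now]

    # Policy check before the prompt ever reaches the model.
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        logging.warning("blocked prompt from %s", caller_id)
        return "error: request violates usage policy"

    logging.info("serving request from %s", caller_id)
    return generate(prompt)

print(handle_request("user-123", "Summarize the benefits of red teaming."))
```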

Safety is also a crucial concern when dealing with neural networks. Neural nets are difficult to explain because of their complex matrix computations, while techniques like decision trees offer comparatively easy explanations. The fundamental challenge, however, lies in understanding the problem itself, regardless of the chosen solution: many AI problems lack a definitive ground truth, which makes correctness hard to judge. Comparing a neural net against the simplest possible decision tree is one way to put its performance in perspective. It is worth noting that bugs in software, including plain if statements, often stem from exceptions or conditions overlooked while the problem was being understood. Ultimately, the key lies in comprehending the problem rather than fixating solely on the solution.
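
As a rough illustration of that comparison, the sketch below uses scikit-learn on a toy dataset (it is not something from the conversation): it trains a small neural network and a depth-two decision tree on the same data, compares their accuracy, and prints the tree's rules, which double as an explanation.

```python
# A minimal sketch (scikit-learn, toy dataset): compare an opaque neural net
# against the simplest reasonable decision tree, then print the tree's rules.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

# Opaque model: a small multilayer perceptron (scaling helps it converge).
nn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
nn.fit(X_train, y_train)

# Transparent baseline: a depth-two decision tree.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

print("neural net accuracy:   ", accuracy_score(y_test, nn.predict(X_test)))
print("decision tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))

# The tree's learned rules double as a human-readable explanation.
print(export_text(tree, feature_names=list(data.feature_names)))
```

If the simple tree comes close to the neural net's accuracy, the extra opacity buys little; if it falls well short, at least the gap is now measured rather than assumed.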

AI fairness is complex

Defining AI fairness is a complex task. Many organizations have established AI principles, but more detailed guidelines are needed, particularly around surveillance, facial recognition, and data usage. Achieving global consensus on these principles may be challenging, but it is essential to set goals and build systems that ensure AI complies with them. Designing AI systems requires considering society as a whole, not just the user: stakeholders such as defendants and victims, along with broader societal impacts, must be taken into account. Optimizing for the user alone is insufficient; attention must be given to the wider implications and fairness considerations.

It is crucial to measure performance and maintain awareness of biases in AI systems. Building diverse teams that span different groups, nationalities, and cultures helps in recognizing and addressing model bias effectively; search engine improvements are one example of how diversity leads to more inclusive and accurate results. Diversity also adds unique information, preventing repetition and enhancing quality. Biases can arise from data sources, from societal biases reflected in that data, and from having too few examples of some groups in the training data. Enterprises must consider customer inclusivity, although limitations and trade-offs may leave some individuals or minority groups receiving less attention.
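
One simple way to put that measurement into practice is to break metrics out by group, so bias shows up as a number rather than an anecdote. The sketch below uses hypothetical labels, predictions, and group assignments purely for illustration.

```python
# A minimal sketch (hypothetical data): per-group accuracy and selection rate.
import numpy as np

# Ground-truth labels, model predictions, and a protected attribute per
# example (all placeholder values).
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g in np.unique(group):
    mask = group == g
    accuracy = (y_true[mask] == y_pred[mask]).mean()
    selection_rate = y_pred[mask].mean()  # share receiving the positive outcome
    print(f"group {g}: accuracy={accuracy:.2f}, selection rate={selection_rate:.2f}")

# Large gaps between groups are a signal to collect more examples or revisit
# the model before deployment.
```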

Creativity versus accuracy

LLM hallucinations and creativity can be viewed as two sides of the same coin. AI systems therefore need clear instructions on when to be creative and when to provide factual information; citing fabricated legal precedents in court, for example, can carry legal consequences, which highlights the need to define boundaries. To enhance accuracy, AI systems should have access to knowledge bases or be able to consult expert systems; just like humans, they may need to rely on external sources to expand their knowledge. The architecture of AI systems should separate creativity from factual reporting and ensure the sources behind an argument are properly documented. As these systems grow in complexity, it becomes increasingly important to strike a balance between creativity and factual reporting while incorporating external knowledge and keeping the decision-making process transparent.
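
That separation between creative output and sourced, factual output can be sketched in a few lines. The example below uses a toy in-memory knowledge base and hypothetical names; a real system would route factual questions to a search index or retrieval layer and attach the documents it found.

```python
# A minimal sketch (toy knowledge base, hypothetical names): factual answers
# must come with a source; creative output is explicitly labeled as such.
from dataclasses import dataclass
from typing import Optional

# Placeholder knowledge base; in practice, a search index or database.
KNOWLEDGE_BASE = {
    "capital of france": ("Paris", "https://example.com/geography"),
}

@dataclass
class Answer:
    text: str
    source: Optional[str]  # None means creative output, not a factual claim

def answer(query: str, creative: bool) -> Answer:
    if creative:
        # Creative mode: generation may invent freely, and is labeled as such.
        return Answer(text=f"A story inspired by '{query}'...", source=None)
    # Factual mode: answer only from the knowledge base, and cite the source.
    hit = KNOWLEDGE_BASE.get(query.lower().strip())
    if hit is None:
        return Answer(text="No sourced answer available.", source=None)
    text, source = hit
    return Answer(text=text, source=source)

print(answer("Capital of France", creative=False))
print(answer("a dragon who learns to paint", creative=True))
```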

Responsible AI is cross-functional

Ensuring responsible AI practices requires a multifaceted approach. AI regulation is important, but it often lags behind technological advancement and may be limited by regulators' lack of technical expertise. Internal self-regulation by tech companies is motivated both by ethical considerations and by the desire to prevent misguided external regulation. Technical societies can contribute by establishing codes of conduct and promoting education, and optional certification for software engineers could enhance professionalism and accountability. Third-party certification, similar to historical examples like Underwriters Laboratories, can provide independent verification and assurance for AI systems.

The control of technology, including AI, should encompass measures to prevent malicious uses. Many technologies have both positive and negative potentials, necessitating a balance between their benefits and risks. Implementing preventive measures such as API restrictions and safeguards against casual misuse can help mitigate risks. However, preventing determined and professional users from exploiting technology's capabilities is challenging. It is important to recognize that AI may not significantly exacerbate the potential for misuse, as many of these risks existed prior to AI's emergence, although it might slightly facilitate certain tasks.