Background

OpenAI Introduces 'Model Spec,' Which Outlines The Rules Its AI Should Obey

OpenAI Model Spec

The so-called black box of AI refers to the inner workings of a model that humans cannot readily understand or interpret.

AI involves complex algorithms, layers of neural networks, and vast amounts of data that come together to make decisions or predictions. The black box problem arises because humans know how to feed data into these systems and receive outputs, but do not understand exactly how those outputs are generated.

Put another way, humans know how to toss the ingredients into the pan and see what comes out, without quite knowing the exact process that made it happen.

OpenAI is a pioneer in generative AI products, and it made headlines around the world, particularly after releasing ChatGPT.

The company knows more than a few things about decoding the black box, and this time it is offering a limited look at the reasoning behind its own models’ rules of engagement, such as sticking to brand guidelines or declining to make pornographic content.

Generative AI-powered conversational chatbots, like ChatGPT, often refuse to answer users' questions, replying "Sorry, I can’t do that" or with some other polite refusal.

Such refusals have to be built in, because large language models (LLMs) don’t have any naturally occurring limits on what they can or will say. That lack of limits is part of the reason why they're versatile, and also why they are capable of hallucinating.

It's also the reason why they can be tricked into doing what they're not supposed to.

AI tools released for use by the general public must have guardrails on what they should and shouldn’t do, but again, defining these, let alone enforcing them, is difficult.

AI developers are still trying to find the best ways to implement these guardrails efficiently, but they rarely share exactly how they do it.

OpenAI wants to be a bit more open by introducing what it calls the 'Model Spec', a collection of high-level rules that indirectly govern ChatGPT and other models.

These include meta-level objectives, some hard rules, and some general behavior guidelines.

The Model Spec gives a deeper look into how OpenAI sets its priorities.

In a post on its website, OpenAI said that:

"To deepen the public conversation about how AI models should behave, we’re sharing the Model Spec, our approach to shaping desired model behavior."

"We are sharing a first draft of the Model Spec, a new document that specifies how we want our models to behave in the OpenAI API and ChatGPT. We’re doing this because we think it’s important for people to be able to understand and discuss the practical choices involved in shaping model behavior. The Model Spec reflects existing documentation that we've used at OpenAI, our research and experience in designing model behavior, and work in progress to inform the development of future models. This is a continuation of our ongoing commitment to improve model behavior using human input, and complements our collective alignment work and broader systematic approach to model safety."

For example, there are rules and guidelines that direct the AI to comply with applicable laws, follow the chain of command, be as helpful as possible without overstepping, ask clarifying questions when necessary, and not try to change anyone’s mind.

The rules are essentially instructions that "address complexity and help ensure safety and legality."

The thing is, drawing the line between what is right and what is wrong isn't that simple, and neither is creating the instructions that cause the AI to adhere to the resulting policy.

Regardless, the Model Spec is a start, and it is still in its infancy.

While OpenAI isn’t showing its whole hand here, the document is helpful to users and developers who wish to see how these rules and guidelines are set and why. They are set out clearly, if not necessarily comprehensively.

Published: 08/05/2024