OpenAI 'Sora' Is A Video Generation AI That Creates Videos From Text: World Simulators

OpenAI Sora

AI used to be restricted to its training data, unable to imagine the world the way humans do. That is now in the past.

And that is an understatement, because since generative AI came into play, the world, in the eyes of computers, has been becoming a simulator. And as time advances, that simulator becomes increasingly difficult to tell apart from reality.

Since OpenAI introduced DALL·E, and later DALL·E 2 and DALL·E 3, the company has been trying to give AI the ability to imagine the world without having to experience it.

This time, the company goes a step further, by introducing what it calls 'Sora'.

Back in 2022, Meta introduced an AI that can create videos from text prompts, and then in 2023, the creator of Stable Diffusion also launched an AI that can generate video clips from text prompts. Later, Meta again released a set of AIs that can edit videos from text prompts.

Like some others, these generative AIs tend to produce results that look somewhat cartoonish or dreamlike.

Results tend to be blurry, choppy, distorted, and at certain times, blatantly disturbing.

But just months later, the San Francisco start-up OpenAI unveiled a similar system that creates videos that look as if they were lifted from a Hollywood movie. OpenAI's Sora is able to make footage that seems to have been shot by professional cameramen, or created by CGI artists.

In short, the woolly mammoths trotting through a snowy meadow, the monster gazing at a melting candle, and the Tokyo street scene are too canny to be uncanny.

To use the AI, users can simply give it a brief or even detailed description, or a still image.

Sora then can generate 1080p movie-like scenes with multiple characters, different types of motion and background details. What's more, Sora can also "extend" existing video clips, with the ability to also fill in missing details.

For starters, Sora can generate videos in a range of styles (e.g., photorealistic, animated, black and white) up to a minute long.

"Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions," OpenAI said. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."

While the samples are cherry-picked, meaning that only the best results are shared, they still look impressive, at least when compared to other text-to-video technologies.

"Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world," said OpenAI in a technical report.

OpenAI calls its system Sora, after the Japanese word for sky.

The team behind the technology, including the researchers Tim Brooks and Bill Peebles, chose the name because it “evokes the idea of limitless creative potential.”

As for the training data, OpenAI declined to say how many videos the system used or where the data sets came from. The company only said that the training included both publicly available videos and videos that were licensed from copyright holders.

While results can be mind-blowing, there are some things that need to be addressed before Sora becomes 'perfect.'

For example, some results can have a video game-like quality, and they can still include some AI weirdness, like objects moving in physically impossible directions.

OpenAI has also admitted some of these issues, acknowledging that the model isn't perfect.

"[Sora] may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory," the company said.

Among other reasons, this is why OpenAI isn't releasing the AI to the public.

Instead, OpenAI is sharing the technology with a small group of academics and other outside researchers who will “red team” it, a term for looking for ways it can be misused.

The company said that it's still working to understand the issues, and also address some of the dangers.

For starters, this kind of technology can open the path towards more online disinformation and malinformation.

With Sora, pretty much anyone can create and disseminate fakery so convincing that it's hard to tell what's real on the internet.

OpenAI correctly points out that bad actors could easily misuse a model like Sora in myriad ways.

OpenAI said that it's working with experts to probe the model for exploits and building tools to detect whether a video was generated by Sora.

The company also said that, should it choose to build the model into a public-facing product, it’ll ensure that provenance metadata is included in the generated outputs.
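OpenAI hasn't detailed what that provenance metadata would look like, but the general idea behind provenance systems is to attach a signed manifest to a media file that binds the content to its origin, so any later tampering invalidates the signature. The following is a minimal, hypothetical sketch of that idea using an HMAC over the file bytes; real-world schemes (such as C2PA-style manifests) use asymmetric signatures and standardized formats, and nothing here reflects OpenAI's actual implementation.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration only; real signers use asymmetric key pairs.
SECRET_KEY = b"demo-signing-key"

def make_manifest(media_bytes: bytes, generator: str) -> dict:
    """Build a provenance manifest binding the content hash to its claimed origin."""
    manifest = {
        "generator": generator,
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """Return True only if the signature is intact and the media is unmodified."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(media_bytes).hexdigest() == claimed["sha256"])

video = b"\x00fake video bytes\x00"   # stand-in for a generated clip
m = make_manifest(video, "sora")
print(verify_manifest(video, m))          # untampered: True
print(verify_manifest(video + b"x", m))   # content changed: False
```

The key property this illustrates is that provenance travels with the file: a verifier can confirm both who claims to have generated the content and that the bytes haven't changed since, which is exactly what stripping the metadata (or a watermark) defeats.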

"We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology," OpenAI said. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time."

OpenAI is also tagging videos produced by the system with watermarks that identify them as being generated by AI.

But the company acknowledges that these can be removed.

With the internet as a learning ground, generative AI products powered by large language models are becoming increasingly good at what they do.

They have improved so quickly that they can produce images and videos that are nearly indistinguishable from the real thing.

While technologies like this can help people do their work and speed up projects, many digital artists are complaining that they have made it harder to find work.

When AI image creators first came out, people laughed, amused by the novelty.

This time, people are scared, because their jobs are at stake.

Published: 
16/02/2024