OpenAI 'Sora' Is A Video Generation AI That Creates Videos From Text: World Simulators

OpenAI Sora

AI used to be restricted to its training data, unable to imagine the world the way humans do. That is now in the past.

And that is an understatement, because since generative AI came into play, the world, in the eyes of computers, has been becoming a simulator. And as time advances, that simulator becomes increasingly difficult to tell apart from reality.

Since OpenAI introduced DALL·E, and later DALL·E 2 and DALL·E 3, the company has been trying to give AI the ability to imagine the world without having to experience it.

This time, the company goes a step further, by introducing what it calls 'Sora'.

Back in 2022, Meta introduced an AI that can create videos from text prompts, and then in 2023, the creator of Stable Diffusion also launched an AI that can generate video clips from text prompts. Later, Meta again released a set of AIs that can edit videos from text prompts.

Like some others, these generative AIs tend to produce results that look somewhat cartoonish or dreamlike.

Results tend to be blurry, choppy, distorted, and at certain times, blatantly disturbing.

But just months later, the San Francisco start-up OpenAI unveiled a similar system that creates videos that look as if they were lifted from a Hollywood movie. OpenAI's Sora is able to make footage that seems to have been shot by professional cameramen, or created by CGI artists.

In short, the woolly mammoths trotting through a snowy meadow, the monster gazing at a melting candle, and the Tokyo street scene are too canny to be uncanny.

To use the AI, users can simply give it a brief or even detailed description, or a still image.

Sora then can generate 1080p movie-like scenes with multiple characters, different types of motion and background details. What's more, Sora can also "extend" existing video clips, with the ability to also fill in missing details.

For starters, Sora can generate videos in a range of styles (e.g., photorealistic, animated, black and white) up to a minute long.

"Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions," OpenAI said. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."

While the samples are cherry-picked, meaning that only the best results are shared, they still look impressive, at least when compared to other text-to-video technologies.

"Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world," said OpenAI in a technical report.

OpenAI calls its system Sora, after the Japanese word for sky.

The team behind the technology, including the researchers Tim Brooks and Bill Peebles, chose the name because it “evokes the idea of limitless creative potential.”

As for the training data, OpenAI declined to say how many videos the system used or where the data sets came from. The company only said that the training included both publicly available videos and videos that were licensed from copyright holders.

While results can be mind-blowing, there are some things that need to be addressed before Sora becomes 'perfect.'

For example, some results can have a video game-like quality, and they can still include some AI weirdness, like objects moving in physically impossible directions.

OpenAI has also admitted some of these issues, acknowledging that the model isn't perfect.

"[Sora] may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory," the company said.

Among other reasons, this is why OpenAI isn't releasing the AI to the public.

Instead, OpenAI is sharing the technology with a small group of academics and other outside researchers who will “red team” it, a term for looking for ways it can be misused.

The company said that it's still working to understand the issues, and also address some of the dangers.

For starters, this kind of technology can open the path towards more online disinformation and malinformation.

With Sora, pretty much anyone can create and disseminate fakery so convincing that it's hard to tell what's real on the internet.

OpenAI correctly points out that bad actors could easily misuse a model like Sora in myriad ways.

OpenAI said that it's working with experts to probe the model for exploits and building tools to detect whether a video was generated by Sora.

The company also said that, should it choose to build the model into a public-facing product, it’ll ensure that provenance metadata is included in the generated outputs.
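OpenAI hasn't detailed what that provenance metadata would look like, but the general idea behind provenance systems is to attach a signed manifest to a media file that binds the content to its origin, so any later tampering invalidates the signature. The following is a minimal, hypothetical sketch of that idea using an HMAC over the file bytes; real-world schemes (such as C2PA-style manifests) use asymmetric signatures and standardized formats, and nothing here reflects OpenAI's actual implementation.

```python
import hashlib
import hmac
import json

# Placeholder key for illustration only; real signers use asymmetric key pairs.
SECRET_KEY = b"demo-signing-key"

def make_manifest(media_bytes: bytes, generator: str) -> dict:
    """Build a provenance manifest binding the content hash to its claimed origin."""
    manifest = {
        "generator": generator,
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_manifest(media_bytes: bytes, manifest: dict) -> bool:
    """Return True only if the signature is intact and the media is unmodified."""
    claimed = {k: v for k, v in manifest.items() if k != "signature"}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, manifest["signature"])
            and hashlib.sha256(media_bytes).hexdigest() == claimed["sha256"])

video = b"\x00fake video bytes\x00"   # stand-in for a generated clip
m = make_manifest(video, "sora")
print(verify_manifest(video, m))          # untampered: True
print(verify_manifest(video + b"x", m))   # content changed: False
```

The key property this illustrates is that provenance travels with the file: a verifier can confirm both who claims to have generated the content and that the bytes haven't changed since, which is exactly what stripping the metadata (or a watermark) defeats.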

"We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology," OpenAI said. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time."

OpenAI is also tagging videos produced by the system with watermarks that identify them as being generated by AI.

But the company acknowledges that these can be removed.

With the internet as a learning ground, generative AI products powered by large language models are becoming increasingly good at what they do.

They have improved so quickly that they can produce images and videos that are nearly indistinguishable from the real thing.

While technologies like this can help people do their work and speed up projects, many digital artists are complaining that they have made it harder to find work.

When AI image creators first came out, people laughed, amused by the novelty.

This time, people are scared, because their jobs are at stake.

Published: 
16/02/2024