Creator Of Stable Diffusion Releases 'Gen-2' AI That Can Create Videos From Text

Gen-2

For years, the AI field was relatively quiet: the buzz it created mostly stayed within the field itself, and rarely reached far beyond its own audience.

But when OpenAI introduced ChatGPT, an AI chatbot tool, the internet was quickly captivated. The AI can handle a wide range of tasks, including writing poetry, technical papers, novels, and essays.

Let's not forget how OpenAI's other AIs, released before ChatGPT, paved the way for this hype.

Examples include DALL·E and DALL·E 2.

These two are systems that can create realistic images and art from a natural-language description. Google also has a similar system it calls Imagen.

Stable Diffusion is another similar technology, and this time, the creator of the AI is taking things to a whole new level.

Runway unleashed 'Gen-2', which can turn text into videos.

"The late afternoon sun peeking through the window of a New York City loft" prompt.

Runway, the startup that co-created the popular Stable Diffusion AI image generator, announced an AI model that can take any text description.

But instead of generating images, the AI generates short video clips from the description.

This 'Gen-2' AI model is an improved version of the company's existing Gen-1 AI, which debuted in February.

According to the company, Gen-2 can generate higher-fidelity clips than its predecessor.

Moreover, the model provides more customization options for users.

For example, Runway’s original Gen-1 neural network takes an existing video as input, along with a text prompt describing the edits to be made. A user could, for instance, provide Gen-1 with a video of a green car and a text prompt that reads “paint the car red,” and the AI will then change the color of the car to red.

Gen-1 can also modify a video by adapting it to the style of a reference image provided by the user.

Gen-2 however, takes things further.

According to Runway, Gen-2 adds another way of generating clips: it doesn’t require a source video or reference image, and lets users create videos simply by entering a text prompt.

The New York-based company wants the AI to help creative professionals in editing videos.

Runway detailed the technology that powers the model in an academic paper, saying that the model uses an AI method known as diffusion to generate videos.

With the diffusion method, the researchers add a type of error called Gaussian noise to a file.

They then train a neural network to remove the Gaussian noise and restore the original file. By repeating this process many times, the neural network learns how to analyze the input data it receives and turn it into a new file that matches the user’s specifications.
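The forward half of this process (gradually corrupting data with Gaussian noise) can be sketched in a few lines of NumPy. This is a toy illustration of the general diffusion idea, not Runway's actual model; the variance schedule values are illustrative assumptions, and a real system trains a neural network to predict the noise rather than being handed it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D signal standing in for pixel data.
x0 = np.linspace(-1.0, 1.0, 8)

# Linear variance schedule over T diffusion steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def add_noise(x0, t, rng):
    """Forward diffusion: corrupt x0 with Gaussian noise at step t.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

# Corrupt the signal halfway through the schedule.
t = 500
x_t, eps = add_noise(x0, t, rng)

# A trained denoiser would *predict* eps from (x_t, t). Here we cheat and
# use the true eps to show that removing the noise restores the original.
x0_rec = (x_t - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])

print(bool(np.allclose(x0_rec, x0)))  # True
```

Early in the schedule `alpha_bars[t]` is close to 1 (the signal is barely touched); by the final step it is nearly 0, meaning the input is almost pure noise. Training a network to undo each step is what lets the model generate new samples starting from random noise.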

Runway trained the model on a dataset comprising 240 million images and 6.4 million video clips.

In tests evaluating Gen-2’s capabilities, Runway found that it significantly outperformed two of the most advanced AI models in the same category.

Besides being a text-to-video generator, the AI is also a text-plus-image-to-video generator, which can generate a video from a driving image and a text prompt, and an image-to-video generator, which can generate a video from just a driving image.

It can also stylize videos by transferring the style of any image or prompt to every frame of a video. In addition, the AI can turn mockups into fully stylized and animated renders, isolate subjects in a video and modify them with simple text prompts, turn untextured renders into realistic outputs by applying an input image or prompt, and more.

Initially, Runway isn't releasing the Gen-2 AI model to the public, nor publishing it as an open-source project, as was done with Stable Diffusion.

The startup cited safety and business reasons for its decision.

Instead, the text-to-video model will be available on Discord through a waitlist on the Runway website.

Runway is not the only company developing AI models capable of generating videos.

Meta Platforms Inc., for example, has a similar model it calls Make-A-Video.

Just like Gen-2, it can generate clips based on text prompts.

Published: 
22/03/2023