Google DeepMind Unveils AI's Ability To Generate Playable Virtual Worlds From Text Prompts

Genie 2

With just one sentence, a new world can spring into existence, proving that AI consistently amazes—because it’s both artificial and undeniably intelligent.

In just a few years, the technology went from generating text responses to creating 2D imagery, and then to creating fully 3D videos. Google is taking this a step further with what it calls 'Genie 2.'

DeepMind Technologies Limited, also known by its trade name Google DeepMind, is an AI research laboratory that became a subsidiary of Google after Google acquired it.

This time, the company shows that generative AI can do more than create still images or videos: a foundation world model can also be made to create an entire new world, seemingly from nothing.

All it takes is a single prompt.

With Genie 2, Google has reached a groundbreaking milestone: the ability to generate playable 3D game worlds instantly.

In a blog post, Google DeepMind said:

"Today we introduce Genie 2, a foundation world model capable of generating an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. Based on a single prompt image, it can be played by a human or AI agent using keyboard and mouse inputs."

From Nvidia's GET3D to OpenAI's Point·E and Shap·E to Intel's LDM3D, among many others, AI has come a long way toward understanding the real world.

For DeepMind, games play a key role in its research.

From pioneering Atari gameplay to groundbreaking achievements like AlphaGo, which bested the world’s top Go player, and AlphaStar, which attained Grandmaster status in StarCraft II, games have always been central to the evolution of the company’s AI innovations.

Genie 2, in turn, can generate a "vast diversity of rich 3D worlds."

The result is astounding.

Genie 2, the successor to Genie, which was released earlier in 2024, was trained on videos.

This training makes the AI capable of simulating object interactions, animations, lighting, physics, reflections, and the behavior of NPCs.

From just a single image and text description, Genie 2 can generate interactive, real-time scenes.

What puts it in a league of its own is that it creates worlds where users can take actions like jumping and swimming using a mouse or keyboard.

The AI can also generate diverse trajectories from the same starting frame, which makes it possible to simulate counterfactual experiences for training agents. It can remember parts of the world that are no longer in view and render them accurately when they become observable again, create content on the fly from different perspectives, create objects that can be interacted with, animate characters, and more.

"As we show, Genie 2 could enable future agents to be trained and evaluated in a limitless curriculum of novel worlds. Our research also paves the way for new, creative workflows for prototyping interactive experiences," said DeepMind.

Published: 05/12/2024