This AI Diffusion Model Generates Immersive 3D Environments In 360°, From Text Input Only

LDM3D

All it takes is a single sentence, and behold, a new, previously non-existent world shall appear.

Artificial Intelligence never fails to impress because it's artificial, and it's intelligent. And this time, the team at Intel Labs has created an AI diffusion model that can turn simple text prompts into new 360° images.

According to its press release, there is a wide range of potentially valuable uses for this technology.

For example, it can be used to significantly enhance gaming, entertainment, architecture, and design, and it’s poised to dramatically change the landscape of content creation and digital experiences.

The technology can also be used for depth mapping, which is a crucial addition to product design and urban planning, as it allows users to visually traverse a rich AI-generated environment rather than a simple, flat rendering.

This could also find its way into training tools of all sorts.

Intel Labs partnered with Blockade Labs to create the technology it calls Latent Diffusion Model for 3D (LDM3D), and showcased it at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

This unique diffusion model is innovative because it can create a 3D visual representation of text prompts.

Using generative AI, the technology popularized by OpenAI's ChatGPT, it allows users to further augment their results and enhance human creativity.

Unlike existing latent stable diffusion models, LDM3D allows users to generate both an image and a depth map from a given text prompt using almost the same number of parameters, and it provides more accurate relative depth for each pixel in an image than standard post-processing methods for depth estimation.

The AI then uses a diffusion process to create vivid, immersive 3D images with a complete 360° view.
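For readers who want to see what text-to-RGBD generation looks like in practice, here is a minimal sketch. It assumes the Hugging Face diffusers library's StableDiffusionLDM3DPipeline and the Intel/ldm3d-4c checkpoint on the Hugging Face Hub; the exact checkpoint name and arguments are assumptions based on the public release, not details from this article.

```python
import torch
from diffusers import StableDiffusionLDM3DPipeline

# Load the LDM3D pipeline (assumed checkpoint: Intel/ldm3d-4c on the Hugging Face Hub).
pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a tropical beach at sunset, palm trees, gentle waves"
output = pipe(prompt)

# The pipeline returns both an RGB image and a depth map for the same prompt.
rgb_image = output.rgb[0]
depth_image = output.depth[0]
rgb_image.save("beach_rgb.jpg")
depth_image.save("beach_depth.png")
```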

This should save developers significant time when developing scenes, as explained by Vasudev Lal, AI/ML research scientist at Intel Labs.

Pioneering in its field, LDM3D is considered the first AI model capable of accomplishing this feat.

"This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts," the researchers said.

To create this AI, the researchers used a dataset consisting of a subset of 10,000 samples from the LAION-400M database, which comprises over 400 million image-caption pairs.

The Dense Prediction Transformer (DPT) large depth-estimation model, previously developed at Intel Labs, was used to annotate the training corpus.

It is this DPT-large model that gives LDM3D its highly accurate relative depth perception.
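As a rough illustration of what that annotation step involves, here is a minimal sketch of monocular depth estimation with DPT-large, assuming the Hugging Face transformers implementation and the Intel/dpt-large checkpoint; this is not necessarily the exact pipeline Intel Labs used to label its training data.

```python
import torch
from PIL import Image
from transformers import DPTForDepthEstimation, DPTImageProcessor

# Monocular depth estimation with DPT-large (assumed checkpoint: Intel/dpt-large).
processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("photo.jpg")  # any RGB image from the training corpus
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth  # shape: (batch, height, width)

# Resize the predicted depth map back to the original image resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
```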

"We also develop an application called DepthFusion, which uses the generated RGB images and depth maps to create immersive and interactive 360°-view experiences using TouchDesigner. This technology has the potential to transform a wide range of industries, from entertainment and gaming to architecture and design," said the researcher.

The impact of this research is far-reaching, because it could allow even the most ordinary people to visualize practically anything in entirely new ways.

LDM3D can turn a text description of a tropical beach, a modern skyscraper, or a science-fiction universe into a detailed 360° panorama.
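The panoramic use case can be approximated with the same diffusers pipeline, assuming Intel's ldm3d-pano checkpoint; the checkpoint name and the wide 1024×512 resolution are assumptions drawn from the public release rather than this article.

```python
from diffusers import StableDiffusionLDM3DPipeline

# Panoramic variant (assumed checkpoint: Intel/ldm3d-pano).
pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano").to("cuda")

prompt = "360 view of a tropical beach with a modern skyscraper on the horizon"
output = pipe(prompt, width=1024, height=512)  # wide aspect ratio for the panorama

output.rgb[0].save("beach_pano_rgb.jpg")
output.depth[0].save("beach_pano_depth.png")
```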

And the AI's ability to capture depth information can drastically enhance realism and immersion, opening up new applications for a wide range of industries.

In the end, AI-assisted world building can enter the fast track, as it becomes easier to create immersive simulations that look and feel more realistic than human-programmed ones.

The project has been open-sourced on GitHub.

Published: 
23/06/2023