Google Introduces 'Veo 2' And Improves 'Imagen 3': A State-Of-The-Art Pair Of Presents

Google Veo 2 example

2024 is coming to an end, and the presents are pouring in.

Welcoming the magic of year-end festivities, the AI research laboratory Google DeepMind has announced 'Veo 2' and an improved 'Imagen 3,' next-generation AI tools that could propel the Google-owned lab ahead of most of its rivals.

In a blog post, Google said:

"Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI and creatives are using VideoFX and ImageFX to tell their stories."

"Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies."

First off, Veo 2 is the successor to Veo, the company's flagship video-generation tool, which excels at producing high-quality videos across diverse subjects and styles.

According to Google DeepMind, the AI is capable of higher realism and an improved understanding of movement, physics, and cinematic techniques.

Better still, it can generate 4K videos and handle complex prompts, such as ones that specify a particular camera lens.

This is possible because Veo 2 has a deeper understanding of real-world physics and the subtleties of human movement and expression, which lets it render more detail and realism.

Its grasp of cinematographic language allows creators to specify genres, lenses, and cinematic effects for customized results. Veo 2 delivers content at resolutions up to 4K and extends video lengths to several minutes.

This means Veo 2 should be considered one of the most powerful text-to-video generators, capable of setting a new benchmark in generative video creation.

Read: Google Introduces 'Veo', Its 'Most Capable Generative Video Model', As Well As 'Imagen 3'

Next is Imagen 3, which has been improved. According to Google, it can:

  • Produce diverse art styles: realism, fantasy, portraiture and more.
  • More faithfully turn prompts into accurate images.
  • Generate brighter, more compositionally balanced visuals.

In other words, Imagen 3 can now generate better images with greater accuracy — from photorealism to impressionism, from abstract to anime.

Following these two "state-of-the-art" AI tools is 'Whisk,' a playful new experiment that lets users input or create images that reflect the subject, scene, and style they envision.

For example, users can merge and remix visuals to craft something uniquely their own, from a digital plushie to an enamel pin or sticker.

Behind the scenes, Whisk integrates the advanced Imagen 3 model with Gemini’s visual understanding and description capabilities.

Here, Gemini automatically generates detailed captions for the input images, and those descriptions are then fed into Imagen 3.

This process enables effortless remixing of subjects, scenes, and styles, unlocking endless creative possibilities.
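
The caption-then-generate flow described above can be pictured with a rough sketch. This is only an illustration of the idea, not Google's implementation: the functions caption_image and generate_image are hypothetical stand-ins for the Gemini and Imagen 3 steps, and they return placeholder text instead of calling any real service.

```python
from dataclasses import dataclass


@dataclass
class RemixInputs:
    """Three reference images, one per role (hypothetical structure)."""
    subject_image: str
    scene_image: str
    style_image: str


def caption_image(image_ref: str, role: str) -> str:
    # Hypothetical stand-in for the Gemini captioning step.
    # A real system would return a detailed description of the image.
    return f"detailed description of {image_ref} as the {role}"


def build_prompt(captions: dict[str, str]) -> str:
    # Combine the three captions into one text prompt.
    return (
        f"Subject: {captions['subject']}. "
        f"Scene: {captions['scene']}. "
        f"Style: {captions['style']}."
    )


def generate_image(prompt: str) -> dict:
    # Hypothetical stand-in for the Imagen 3 generation step.
    return {"prompt": prompt, "image": "<generated image placeholder>"}


def remix(inputs: RemixInputs) -> dict:
    # Step 1: caption each input image according to its role.
    captions = {
        "subject": caption_image(inputs.subject_image, "subject"),
        "scene": caption_image(inputs.scene_image, "scene"),
        "style": caption_image(inputs.style_image, "style"),
    }
    # Step 2: feed the combined description into the image generator.
    return generate_image(build_prompt(captions))


if __name__ == "__main__":
    result = remix(RemixInputs("plushie.png", "beach.png", "enamel_pin.png"))
    print(result["prompt"])
```

Because the generator only ever sees text descriptions in this sketch, swapping one input image changes just one part of the prompt, which is one way to think about why remixing subjects, scenes, and styles is so easy.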

Whisk is purposefully designed to remix visuals in creative ways.

Published: 18/12/2024