Google Introduces 'Veo 2' And Improves 'Imagen 3': A State-Of-The-Art Pair Of Presents

2024 is coming to an end, and the presents are pouring in.

Just in time for the year-end festivities, the AI research laboratory Google DeepMind has announced 'Veo 2' and an improved 'Imagen 3,' next-generation AI tools that could propel the Google-owned lab ahead of most of its rivals.

In a blog post, Google said that:

"Earlier this year, we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it’s been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI and creatives are using VideoFX and ImageFX to tell their stories."

"Together with collaborators ranging from filmmakers to businesses, we’re continuing to develop and evolve these technologies."

Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts.

We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through… pic.twitter.com/h6ejHaMUM4

— Google DeepMind (@GoogleDeepMind) December 16, 2024

First off, Veo 2 is the successor to Veo, the company's flagship video-generation tool, which excels at producing high-quality videos across diverse subjects and styles.

According to Google DeepMind, the model delivers higher realism and an improved understanding of movement, physics, and cinematic techniques.

Better still, it can generate videos at up to 4K resolution and handle detailed prompts, such as ones that specify a particular camera lens.

Veo 2 has a deeper understanding of real-world physics and of the subtleties of human movement and expression, which improves the detail and realism of its output.

Its grasp of cinematographic language allows creators to specify genres, lenses, and cinematic effects for customized results. Veo 2 delivers content at resolutions up to 4K and extends video lengths to several minutes.

This means Veo 2 should be considered one of the most powerful text-to-video generators, capable of setting a new benchmark in generative video creation.

Read: Google Introduces 'Veo', Its 'Most Capable Generative Video Model', As Well As 'Imagen 3'

Veo 2 is able to:
Create videos at resolutions up to 4k
Understand camera controls in prompts, such as wide shot, POV and drone shots
Better recreate real-world physics and realistic human expression

In head-to-head comparisons of outputs by human raters, it was… pic.twitter.com/doC3GwY30z

— Google DeepMind (@GoogleDeepMind) December 16, 2024

Next is Imagen 3, which has been improved. According to Google, it can now:

Produce diverse art styles: realism, fantasy, portraiture and more.
More faithfully turn prompts into accurate images.
Generate brighter, more compositionally balanced visuals.

In other words, Imagen 3 can now generate better images with greater accuracy — from photorealism to impressionism, from abstract to anime.

Imagen 3’s capabilities are now in ImageFX in 111 countries - and we're excited to roll out Veo 2 into VideoFX as well, expanding availability next year.

Sign up now https://t.co/AJn6K1zScG pic.twitter.com/SCXLb1MnUs

— Google DeepMind (@GoogleDeepMind) December 16, 2024

Following these two "state-of-the-art" AI tools is 'Whisk,' a playful new experiment that lets users input or create images that convey the subject, scene, and style they envision.

For example, users can merge and remix visuals to craft something uniquely their own, from a digital plushie to an enamel pin or sticker.

Behind the scenes, Whisk integrates the advanced Imagen 3 model with Gemini’s visual understanding and description capabilities.

Here, Gemini automatically generates detailed captions for the images, and these descriptions are then fed into Imagen 3.

This process enables effortless remixing of subjects, scenes, and styles, unlocking endless creative possibilities.
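For readers who want to picture that caption-then-generate flow, here is a minimal Python sketch. It is an illustration only, not Google's implementation: the names `RemixInputs`, `build_prompt`, and `remix` are hypothetical, the prompt wording is an assumption, and the captioning and image-generation steps are passed in as plain functions rather than real Gemini or Imagen 3 calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RemixInputs:
    """Three reference images, one per role (hypothetical structure)."""
    subject: bytes
    scene: bytes
    style: bytes


def build_prompt(subject_caption: str, scene_caption: str, style_caption: str) -> str:
    """Combine the three captions into one text prompt (assumed format)."""
    return (
        f"Subject: {subject_caption}. "
        f"Scene: {scene_caption}. "
        f"Style: {style_caption}."
    )


def remix(
    inputs: RemixInputs,
    caption_image: Callable[[bytes], str],   # stand-in for a captioning model
    generate_image: Callable[[str], bytes],  # stand-in for a text-to-image model
) -> bytes:
    """Caption each reference image, then generate a new image from the captions."""
    captions = [
        caption_image(image)
        for image in (inputs.subject, inputs.scene, inputs.style)
    ]
    return generate_image(build_prompt(*captions))
```

Because only the text captions reach the image generator, swapping any one of the three reference images changes the result without touching the other two, which is what makes this kind of remixing straightforward.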

Whisk is purposefully designed to remix visuals in creative ways.