Background

Google Updates Veo 3.1 So It Create Vertical Videos, Upscale To 4K, And More

Veo 3.1

The digital landscape has shifted dramatically, and the competition has never been this fierce.

Since the launch of OpenAI's ChatGPT that ignited what many now call the great LLM war, this sudden surge in generative AI forced the world’s largest tech giants to accelerate years of research into public-facing products. Google’s initial response came in the form of Bard, and later, Gemini.

As the competition intensified, Google unified its efforts under the Gemini brand, moving beyond simple text interfaces to build a truly multimodal ecosystem capable of understanding and generating text, images, and audio with unprecedented reasoning capabilities.

While the first phase of this rivalry focused on conversation, the frontier quickly moved toward motion.

The industry watched as OpenAI teased Sora, and later, released Sora 2, a model that promised to turn text into highly realistic video, signaling a new era of generative cinema.

Google countered this progression by unveiling Veo, its own high-fidelity video generation model. Rather than just matching the competition, Google aimed to integrate these creative powers into the tools people already use, leading to the rapid iteration that has now culminated in the release of Veo 3.1.

Now, Veo 3.1 is getting some updates.

"Today, we are releasing updates to Veo 3.1 in the Gemini API and Google AI Studio, giving developers greater creative control and production-ready quality," said Google in the announcement.

The updates to Veo 3.1 in the Gemini API include:

  1. Enhanced Ingredients to Video: The updated model can now synthesize users' inputs to preserve character identity and background details, ensuring that the generated characters and settings remain consistent across videos.
  2. Native Vertical Format Videos for Ingredients to Video (Portrait Mode): Generate social media-ready 9:16 videos directly. Designed for mobile applications, this mode delivers faster results and optimized composition by generating full-frame vertical video rather than cropping from landscape.
  3. New 4K and Improved 1080p definition: Unlock professional fidelity directly within workflows. The model can now generate clearer 1080p, and creates 4K videos suitable for the big screen.

The latest evolution of Veo 3.1 through the Gemini API represents a significant leap toward production-ready AI.

One of the most impactful additions is the 'Ingredients to Video' feature, which allows creators to use up to three reference images to maintain strict character and background consistency. This solves one of the biggest hurdles in AI video: the tendency for subjects to change appearance between clips.

Developers and filmmakers can now ensure that a protagonist or a specific setting remains identical across an entire sequence, providing the creative control necessary for professional storytelling.

Recognizing the shift in how media is consumed, Veo 3.1 now supports native vertical video generation in a 9:16 aspect ratio.

This is a game-changer for mobile-first platforms like YouTube Shorts, as it eliminates the need to crop landscape footage and allows the AI to compose shots specifically for the vertical frame.

Combined with state-of-the-art upscaling to 1080p and 4K, the output is no longer just a grainy experimental clip but high-definition content suitable for large screens and professional workflows.

Beyond the visuals, Veo 3.1 bridges the gap between sight and sound with native audio generation.

The model can now synthesize synchronized sound effects and even dialogue that matches the context of the scene, creating a complete cinematic experience from a single prompt. For developers using the Gemini API and Vertex AI, these updates include advanced tools like "Scene Extension" and "First and Last Frame" transitions, which allow for the creation of longer, seamless narratives.

By embedding SynthID digital watermarking into every video, Google is also establishing a standard for transparency, ensuring that as AI video becomes indistinguishable from reality, it remains traceable and responsible.

Despite these monumental strides, the journey toward perfect AI video generation is not without its hurdles.

The computational cost remains immense, often leading to significant wait times for rendering high-resolution clips compared to the near-instantaneous nature of text generation. There are also persistent technical challenges, such as "hallucinations" where physics, like the way a liquid pours or how limbs move during complex choreography, can still appear uncanny or distorted.

Furthermore, the ethical landscape remains a minefield; as these tools become more accessible, the potential for high-quality deepfakes necessitates an ongoing arms race between generative power and detection technologies like SynthID.

Veo 3.1

Looking ahead, what's certain is that, the barrier between imagination and visual reality virtually disappears.

In the meantime, the phase is a focus on real-time interactivity, where users can direct a digital scene as it unfolds, changing lighting or camera angles on the fly. As Veo and its competitors evolve, the democratization of filmmaking will likely reach a point where a single creator can produce a feature-length film with the production value of a major studio.

The focus will eventually shift from the novelty of the technology itself to the quality of the stories being told, ushering in a new era of human-AI collaborative artistry.

Published: 
15/01/2026