Background

'Wan2.5' Released: Alibaba Goes Hardcore To Rival Google Veo 3


Reality is more fragile than people like to believe.

We trust our eyes, ears, even our sense of touch to tell us what’s real. That is, until technology bends those senses. On a screen, what seems real is often just pixels and sound waves coaxed into behaving like life. As visual media becomes more advanced, the line between "what’s happening" and "what’s generated" blurs.

Enter the so-called "LLM war."

What started with OpenAI's release of ChatGPT has exploded into a battle over who controls the future of content, with companies trying to outpace one another in creating AIs that can generate text, images, code, and video.

While OpenAI, Google, and others raced to dominate predictive language models, Alibaba quietly shifted focus toward the next frontier: synthetic media.

Their answer to the video generation challenge is the Wan-Animate family of models, which aim to bring animation and deepfake-level manipulation into the hands of creators everywhere.

In the same week it released Wan2.2-Animate, Alibaba introduced 'Wan2.5,' which steps up video generation further.

Wan2.5 is a significant new development in the world of AI video generation, and for good reasons.

First off, this state-of-the-art AI model represents a major leap forward, focusing on creating hyper-realistic, high-quality videos from simple text or image prompts. What's generating the most buzz is its ability to produce an entire video, including synchronized audio, in a single pass.

Unlike many other models that only generate silent video clips, Wan2.5 can create clips with voiceovers and sound effects perfectly matched to the visuals, all from one prompt.

This integrated audio-visual generation is a game-changer, saving creators countless hours of post-production work.

Then there is the fact that Wan2.5 can create longer videos than many of its predecessors, up to 10 seconds, at resolutions of 480p, 720p, and even 1080p.

Users are also noting a marked improvement in subject consistency, which is crucial for creating coherent stories and characters that don't change from frame to frame.

The model also offers greater control over camera movements, motion, and overall aesthetics, allowing for more precise and creative output. Furthermore, its affordability compared to some of its rivals is making high-quality video generation more accessible to a wider range of users, from independent artists to small businesses.

The impact of Wan2.5 is already being felt on various online platforms, where creators are using it to produce polished marketing videos with synchronized product descriptions, create short-form animated stories, and develop engaging content with clear lip-sync.

The model's multilingual capabilities are also a major plus, as it can reliably generate synchronized videos from prompts in various languages, including Chinese.

The conversations on social media and in online forums are filled with users sharing their impressive results, discussing the best prompts to use, and debating how Wan2.5 stacks up against other AI models.

It's a clear sign that this technology is not just a passing trend but a powerful new tool that is democratizing professional-quality video production.

Where Alibaba’s Wan line seems to push hard is in character replacement, fine-grained expression control, and environmental integration: not just generating a new video from a prompt, but reworking existing images and people into motion.

Google's Veo 3, by contrast, is more about turning text (or images) into short video scenes with synced audio.

In a sense, one is “animate this image by copying motion,” while the other is “create a new scene from your prompt (with sound).”

If Wan2.5 succeeds at integrating high-fidelity audio, voices, ambient sound, and full scene realism, all while maintaining control over characters and consistency with environments, it could narrow the gap with Veo 3 and challenge Google’s lead.

The broader implications of Wan2.5's release are significant.

It signals a shift in the generative AI landscape toward more integrated, all-in-one solutions that address the entire creative workflow, from text to a final, ready-to-share video.

The accessibility of such a powerful tool means that the barrier to entry for video production is lower than ever. Whether for marketing, entertainment, or education, tools like Wan2.5 are a catalyst for the next wave of digital content, empowering more people to transform their ideas into visually and aurally compelling stories.

With the ongoing competition and innovation in this space, models like Wan2.5 are leading the charge, rapidly turning what was once a complex, resource-intensive process into a simple, prompt-driven task, forever changing how content is made and consumed.

The war is no longer just about text models; it’s about who owns the future of generative AI.

With Wan2.5, Alibaba is clearly aiming to be a serious contender.

Published: 
24/09/2025