Success depends on how far and high one dares to venture.
For a while, everything was calm—until disruption struck. When OpenAI unveiled ChatGPT, it ignited an arms race among tech giants. Sensing the massive potential of Large Language Models, companies rushed to develop and release their own AI systems, pushing out upgrades at a breakneck pace.
The world of generative AI is evolving at lightning speed.
Initially, the competition was centered in the West, dominated by industry leaders like OpenAI, Google, Meta, and Apple, along with smaller players such as Perplexity and Anthropic. But as the AI revolution gained momentum, the East took notice.
Seeing the opportunity, tech firms across Asia entered the race, eager to carve out their place in this rapidly shifting landscape.
Text generators were followed by text-to-image generators, and then by text-to-video and image-to-video generators.
OpenAI announced Sora well before releasing it, and by the time it finally launched, Chinese companies were already racing to catch up.
From Luma AI's Dream Machine and Kuaishou's Kling AI to CogVideoX from researchers at Tsinghua University and Zhipu AI, MiniMax's Video-01, and ByteDance's OmniHuman-1 and Goku, the field is crowded, and Alibaba is not far behind.
After introducing MIMO, the Chinese tech titan has now come up with Wan 2.1.
Alibaba has made its advanced image- and video-generating AI model, Wan 2.1, open source, allowing developers to access, modify, and enhance its code and architecture.
The company introduced four versions—T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P—each designed to enhance accuracy in image and video generation.
The "14B" designation indicates models with 14 billion parameters, enabling them to process vast datasets and generate high-quality visuals. The I2V-14B and T2V-14B variants produce videos at 480P and 720P, with T2V-14B being the only version supporting text in both Chinese and English.
For consumer accessibility, according to its GitHub page, the T2V-1.3B is optimized to run on consumer-grade GPUs, requiring 8.19 GB of VRAM and generating a five-second 480P video in about four minutes on an RTX 4090.
These models support a range of applications, including text-to-video, image-to-video, video editing, text-to-image, and video-to-audio capabilities.
Competing directly with existing AI video generators, Wan 2.1 managed to outperform Sora on the VBench Leaderboard, which evaluates video generation quality across 16 dimensions, including subject consistency, motion smoothness, temporal flickering, and spatial relationships.
According to its Hugging Face page, Wan 2.1 is a suite of "advanced large-scale video generative models."

Alibaba attributes the technical advancements in Wan 2.1 to several key innovations, including a new spatio-temporal variational autoencoder (VAE), scalable pre-training strategies, large-scale data construction, and automated evaluation metrics. These improvements enhance the model’s ability to generate high-quality images and videos with greater efficiency and accuracy.
“We propose a novel 3D causal VAE architecture specifically designed for video generation,” the company said. The model implements a feature cache mechanism, reducing memory usage and preserving temporal causality.
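Alibaba does not publish the details of the feature cache in this description, but the general idea of caching trailing temporal context so that a long video can be encoded chunk by chunk, with bounded memory and without breaking causality, can be sketched in simplified NumPy form (a toy causal temporal filter, not the actual Wan 2.1 implementation):

```python
import numpy as np

def causal_temporal_conv(frames, kernel, cache=None):
    """Causal temporal filtering over a sequence of frame features.

    frames: (T, D) array of per-frame features
    kernel: (K,) temporal filter; the output at time t only sees
            frames t-K+1 .. t, never the future
    cache:  (K-1, D) trailing frames carried over from the previous
            chunk (None for the first chunk, which gets zero padding)

    Returns (output, new_cache): the cache lets a long video be
    processed in chunks while producing exactly the same result as
    processing it in one pass.
    """
    K = len(kernel)
    if cache is None:
        cache = np.zeros((K - 1, frames.shape[1]))  # zero left-padding
    padded = np.concatenate([cache, frames], axis=0)
    out = np.stack([padded[t:t + K].T @ kernel
                    for t in range(frames.shape[0])])
    return out, padded[-(K - 1):]  # keep last K-1 frames for next chunk
```

Because only the last K-1 frames are retained between chunks, memory stays constant regardless of video length, which is the practical benefit Alibaba attributes to the mechanism.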
Wan 2.1 also employs the Flow Matching framework within the Diffusion Transformer (DiT) paradigm, which integrates the T5 encoder to process multi-language text inputs with cross-attention mechanisms.
"Our experimental findings reveal a significant performance improvement with this approach at the same parameter scale," the company said.

Wan2.1’s data pipeline involved curating and deduplicating 1.5 billion videos and 10 billion images.
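Alibaba does not detail how the deduplication was done; at that scale it typically relies on content hashing or embedding similarity. A minimal stand-in using exact content hashes (illustrative only) looks like this:

```python
import hashlib

def dedup_by_hash(items):
    """Exact deduplication by content hash: a simplified stand-in for
    the large-scale dedup step in a data-curation pipeline."""
    seen, unique = set(), []
    for item in items:  # item: raw bytes of a video or image
        digest = hashlib.sha256(item).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique
```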
Alibaba’s move follows a growing trend in China, with companies like DeepSeek also open-sourcing their AI models.
In addition to Wan 2.1, Alibaba also previewed QwQ-Max, an upcoming reasoning model in its Qwen AI family, set to be open-sourced upon its full release.
To reinforce its AI and cloud computing ambitions, Alibaba announced a massive 380 billion yuan ($52 billion) investment over the next three years, signaling its commitment to the future of AI innovation.