The AI industry is hotter than ever, thanks to the rise of generative AI powered by Large Language Models.
Soon after OpenAI popularized the trend with ChatGPT, two could play this game. Then three, four, five, and more followed, each developing its own products and solutions.
While most AI products come from the West, the East isn't falling behind.
China knows well how lucrative this generative AI trend is, and one of the country's leading tech outfits has debuted a text-to-video generative AI to compete directly with the likes of OpenAI's Sora and Luma AI's Dream Machine.
Following in the footsteps of Kuaishou's recent Kling AI, researchers from Tsinghua University and Zhipu AI have unleashed what they call 'CogVideoX.'
CogVideoX-5B - an open weights Text-to-Video model competitive to Runway, OpenSora, Pika, and Luma!
Build locally: https://t.co/BC1O9JiRdJ @huggingface Gradio app: https://t.co/WSjkMLDedZ — Gradio (@Gradio) August 27, 2024
Hot New Release: CogVideoX-5B, a new text-to-video model from @thukeg group (the group behind GLM LLM series)
- More examples from the 5B model in this thread
- GPU vram requirement on Diffusers: 20.7GB for BF16 and 11.4GB for INT8
- Inference for 50 steps on BF16: 90s on… pic.twitter.com/GAyWmst5GW — Gradio (@Gradio) August 27, 2024
What makes CogVideoX stand out in the crowded, noisy AI sphere is its ability to generate high-quality, coherent videos up to six seconds long from text prompts, outperforming well-known models like OpenAI's Sora across various benchmarks, according to the researchers.
And this time, the team has released CogVideoX-5B, a 5-billion-parameter model that delivers videos at 720×480 resolution and 8 frames per second.
CogVideoX is also quite speedy. That efficiency comes from a 3D Variational Autoencoder (VAE) used for video compression, paired with an "expert transformer" that improves the alignment between text and video.
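To make the compression idea concrete, here is a toy PyTorch sketch; this is not the actual CogVideoX architecture, only an illustration of the core trick behind a 3D VAE encoder: a single 3D convolution squeezes a video along time and space at once, instead of compressing each frame independently with a 2D encoder.

```python
import torch
import torch.nn as nn

# Toy stand-in for a 3D VAE encoder stage (illustrative, not CogVideoX's design):
# kernel/stride of (4, 8, 8) compresses 4x in time and 8x in each spatial axis.
encoder = nn.Conv3d(in_channels=3, out_channels=16,
                    kernel_size=(4, 8, 8), stride=(4, 8, 8))

video = torch.randn(1, 3, 48, 480, 720)  # (batch, RGB, frames, height, width)
latent = encoder(video)
print(latent.shape)  # torch.Size([1, 16, 12, 60, 90])
```

The 48×480×720 input collapses to a 12×60×90 latent grid, which is why the downstream transformer can afford to attend over an entire clip.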
Although some of CogVideoX's specifications may not rival cutting-edge proprietary systems, its true innovation lies in its open-source nature.
By releasing its source code, the Tsinghua team has democratized a technology that was once limited to only a handful of well-funded tech giants.
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually.… pic.twitter.com/n0Nh5Vm9pf
— Gradio (@Gradio) August 27, 2024
The idea is to accelerate advances in AI-generated video by tapping into the collective expertise of the global developer community.
As detailed in an accompanying research paper, CogVideoX puts advanced video-generation capabilities directly into users' hands.
This strategy makes the text-to-video model a genuine threat to some of its Western counterparts, with the capacity to disrupt the overall AI landscape.
CogVideoX was initially rolled out to a select number of users by invitation.
Prompt: A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. the… pic.twitter.com/9AHlzlROIo
— Gradio (@Gradio) August 27, 2024
CogVideoX enters an already crowded AI landscape, adding yet another option to the mix.
While its open-source nature broadens access to powerful generative AI technology, this widespread availability is not without risks.
One of the most concerning issues is the potential misuse of such tools in creating deepfakes or misleading content. As AI-generated video becomes increasingly accessible and sophisticated, we're venturing into uncharted territory in digital content creation.
However, CogVideoX’s open-source approach could be a game-changer, potentially shifting the balance of power from large tech players to a more distributed model of AI development.