This 'CogVideoX-5B' Wants To Disrupt The Text-To-Video AI Industry Using Open-Source


The AI industry is trending more than ever, thanks to the rise of Large Language Model-powered generative AIs.

Soon after OpenAI popularized this with the introduction of ChatGPT, it became clear that two could play this game. Then three, four, five, and more followed, each developing their own products and solutions.

While most AI products come from the West, the East isn't falling behind.

China knows well how lucrative this generative AI trend is, and one of the leading tech companies in the country has debuted a text-to-video generative AI to compete directly against the likes of OpenAI Sora, Luma AI's Dream Machine and more.

Following Kuaishou and its recent Kling AI, researchers from Tsinghua University and Zhipu AI have released what they call 'CogVideoX.'

What makes CogVideoX stand out in the crowded and noisy AI space is its ability to generate high-quality, coherent videos up to six seconds long from text prompts. According to the researchers, it outperforms well-known models such as VideoCrafter-2.0 and OpenSora across various benchmarks.

And this time, the team has released CogVideoX-5B, which features 5 billion parameters and delivers videos with a resolution of 720×480 at 8 frames per second.
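To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the numbers quoted above (six seconds, 8 frames per second, 720×480) and assumes, purely for illustration, uncompressed RGB frames with one byte per color channel. The function name `clip_stats` is made up for this example and is not part of CogVideoX.

```python
def clip_stats(seconds, fps, width, height, channels=3):
    """Return (frame count, raw size in bytes) for an uncompressed clip.

    Assumes one byte per channel value (8-bit RGB), which is an
    illustrative assumption and not a detail from the CogVideoX paper.
    """
    frames = seconds * fps
    bytes_per_frame = width * height * channels
    return frames, frames * bytes_per_frame


frames, raw_bytes = clip_stats(seconds=6, fps=8, width=720, height=480)
print(frames)            # 48
print(raw_bytes / 1e6)   # 49.7664 (megabytes of raw pixel data)
```

In other words, even a short, low-frame-rate clip amounts to roughly 50 MB of raw pixel values that the model has to produce coherently.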

CogVideoX is also quite speedy.

This is possible because the team uses what's called a 3D Variational Autoencoder (VAE) for efficient video compression, and has introduced an “expert transformer” to enhance the alignment between text and video.
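The benefit of compressing video before generating it can be sketched with simple arithmetic. The snippet below is only an illustration: the downsampling factors (4× in time, 8× in height and width) and the 16 latent channels are assumed values chosen for the example, not figures stated in this article, and the helper names are hypothetical. The clip size of 48 frames corresponds to six seconds at 8 frames per second.

```python
def latent_shape(frames, height, width,
                 t_factor=4, s_factor=8, latent_channels=16):
    """Shape of the compressed (latent) video as (frames, channels, height, width).

    t_factor, s_factor and latent_channels are assumed, illustrative values.
    """
    return (frames // t_factor, latent_channels,
            height // s_factor, width // s_factor)


def compression_ratio(frames, height, width, rgb_channels=3, **kwargs):
    """Ratio of raw pixel values to latent values for one clip."""
    t, c, h, w = latent_shape(frames, height, width, **kwargs)
    return (frames * rgb_channels * height * width) / (t * c * h * w)


# A six-second clip at 8 fps and 720x480 resolution
shape = latent_shape(frames=48, height=480, width=720)
ratio = compression_ratio(frames=48, height=480, width=720)
print(shape)  # (12, 16, 60, 90)
print(ratio)  # 48.0
```

Under these assumptions, the transformer would operate on about 48 times fewer values than the raw video contains, which is the kind of saving that makes generation faster.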

Although some of CogVideoX's specifications may not rival the cutting-edge proprietary systems, the true innovation of CogVideoX lies in its open-source nature.

By releasing its source code, the Tsinghua team has democratized a technology that was once limited to only a handful of well-funded tech giants.

The idea is to speed up advancements in AI-generated video by tapping into the collective expertise of the global developer community.

As detailed in a research paper, CogVideoX puts advanced video generation capabilities directly into the hands of users.

With this strategy, the team is able to make this text-to-video model a threat to some of its Western-based counterparts, with the capacity to disrupt the overall AI landscape.

CogVideoX was initially introduced to only a select number of users through invitations.

CogVideoX enters an already crowded AI landscape, adding yet another option to the mix.

While its open-source nature broadens access to powerful generative AI technology, this widespread availability is not without risks.

One of the most concerning issues is the potential misuse of such tools in creating deepfakes or misleading content. As AI-generated video becomes increasingly accessible and sophisticated, we're venturing into uncharted territory in digital content creation.

However, CogVideoX’s open-source approach could be a game-changer, potentially shifting the balance of power from large tech players to a more distributed model of AI development.

Published: 
28/08/2024