The AI industry used to be quiet and dull. It rarely made ripples that disrupted anything outside its own realm.
But this changed abruptly after OpenAI released ChatGPT. Since then, tech companies large and small have been racing to develop the best AI to serve various purposes and solve different use cases.
While most products come from the West, the East is apparently not far behind.
Alibaba is one of the most prominent tech companies in China, and researchers from Alibaba Group's Intelligent Computing Research Institute have revealed what they call 'MIMO.'
Short for 'Mimic anyone anywhere in complex Motions with Object interactions,' it uses AI to transform a photo of a character into an animated version of it, through video synthesis.
The idea is to use AI to imagine what a character looks like in motion, starting from just a single 2D photo of that character.
The goal is to be able to create complex movements of that intended character.
On MIMO's GitHub page, the researchers wrote:
"As a fundamental problem in the computer vision and graphics community, 3D works typically require multi-view captures for per-case training, which severely limits their applicability of modeling arbitrary characters in a short time. Recent 2D methods break this limitation via pre-trained diffusion models, but they struggle for pose generality and scene interaction."
"To this end, we propose MIMO, a novel generalizable model which can not only synthesize character videos with controllable attributes (i.e., character, motion and scene) provided by simple user inputs, but also simultaneously achieve advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes in a unified framework."
In the examples provided, MIMO can animate human, cartoon, or personified characters from just a single image, and place that character into a target video.
For example, by supplying a basketball video and a cartoon character image, users can effortlessly swap the real player in the video with a paper-cut character, bringing it to life instantly.
When replacing one character with another, whether it's a real person, a cartoon, or an anthropomorphic figure, MIMO aims for a seamless, natural transition with little visible awkwardness.
To do this, MIMO lifts the 2D pixels of each video frame into 3D using monocular depth estimators.
It then decomposes the target video clip into three spatial components (i.e., main human, underlying scene, and floating occlusion) in hierarchical layers based on the 3D depth.
"These components are further encoded to canonical identity code, structured motion code and full scene code, which are utilized as control signals of synthesis process," the page added.
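To make the layering idea more concrete, here is a minimal sketch in Python of how a single frame could be split into the three components by depth. This is not MIMO's actual code. It assumes a per-pixel depth map (from any monocular depth estimator) and a mask of the main character are already available, and it uses a simple threshold rule. The function name decompose_by_depth and the margin parameter are hypothetical and used only for illustration.

```python
import numpy as np

def decompose_by_depth(frame, depth, human_mask, margin=0.05):
    """Split one frame into human, occlusion and scene layers.

    frame:      (H, W, 3) image array
    depth:      (H, W) per-pixel depth, smaller = closer to camera (assumed)
    human_mask: (H, W) boolean mask of the main character (assumed given)
    margin:     hypothetical tolerance, in depth units
    """
    if not human_mask.any():
        raise ValueError("human_mask is empty")

    # Nearest depth value that belongs to the character.
    human_near = depth[human_mask].min()

    # Non-human pixels clearly in front of the character count as occluders.
    occlusion_mask = (~human_mask) & (depth < human_near - margin)

    # Everything that is neither character nor occluder is background scene.
    scene_mask = ~(human_mask | occlusion_mask)

    def layer(mask):
        # Keep pixels inside the mask, zero out the rest.
        return np.where(mask[..., None], frame, 0)

    masks = {"human": human_mask, "occlusion": occlusion_mask, "scene": scene_mask}
    layers = {name: layer(mask) for name, mask in masks.items()}
    return layers, masks
```

In this simplified version, every pixel ends up in exactly one layer, so adding the three layers back together reproduces the original frame. A real system would also have to track the layers across frames and fill in the scene regions hidden behind the character, which this sketch does not attempt.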
MIMO's core advantage lies in its simplicity and efficiency.
Users only need to provide a reference image or video, and the system swiftly generates a controllable animated character. Unlike traditional 3D character creation, which often requires multi-angle shots or extensive training, MIMO smartly combines 2D video data with 3D spatial modeling, dramatically speeding up the character creation process.
Users can also mix and match different elements, whether it's a single character image, action sequences, or full scene videos, to produce diverse animation effects.
This flexibility simplifies the creative process, making animation production more accessible and enjoyable.
The potential applications for MIMO are vast, spanning virtual human animation, movie special effects, and game character design.
Not only does MIMO handle basic motion control, but it can also extract complex movements from real-world videos and apply them to virtual characters.
This allows for natural interactions between virtual characters and objects in real-world environments, handling occlusion and depth of field to deliver highly realistic animation effects.
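One common way to get occlusion right when placing a character into a scene is to composite the layers back to front: the scene first, then the character, then anything that should appear in front of the character. The sketch below illustrates that general technique with plain alpha blending in Python. It is an assumption-laden illustration, not a description of how MIMO renders its output, which relies on a diffusion model. The function name composite_layers and its arguments are hypothetical.

```python
import numpy as np

def composite_layers(scene, character, character_alpha, occluder, occluder_alpha):
    """Alpha-composite three layers back to front: scene, character, occluder.

    Images are float arrays of shape (H, W, 3) with values in [0, 1].
    Alphas are float arrays of shape (H, W) with values in [0, 1].
    """
    a_char = character_alpha[..., None]
    a_occ = occluder_alpha[..., None]

    # Draw the character over the background scene.
    out = character * a_char + scene * (1.0 - a_char)

    # Draw the occluder last so it hides the character where they overlap.
    out = occluder * a_occ + out * (1.0 - a_occ)
    return out
```

Because the occluder is drawn last, a pixel covered by both the character and the occluder shows the occluder, which is the behavior expected when, for example, a ball passes in front of a player.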
The launch of MIMO represents a major breakthrough in virtual character creation.
It not only streamlines the creative process but also greatly boosts efficiency, enabling more users to quickly turn their ideas into vivid animations.
As this technology becomes more widespread, it promises to revolutionize the animation, gaming, and film industries, fueling new advancements in the digital creative space.