
Computers cannot dream because they lack the senses to understand their surroundings. But with AI, this is changing.
By feeding AIs vast amounts of data gathered from various datasets and the internet itself, AIs have become increasingly capable, and seemingly gain a better understanding of the real world. And with generative AI dominating the hype, Meta doesn't want to be left behind.
This time, the tech titan is announcing two new AI-powered tools for Facebook and Instagram, one that works with videos and one that works with images.
The features are built on Emu, the image generation model at the heart of Meta’s AI offerings, which according to the company in a blog post, "underpins many of our generative AI experiences."
Today we’re sharing two new advances in our generative AI research: Emu Video & Emu Edit.
Details https://t.co/qm8aejgNtd
These new models deliver exciting results in high quality, diffusion-based text-to-video generation & controlled image editing w/ text instructions.
pic.twitter.com/1wF7r773yc— AI at Meta (@AIatMeta) November 16, 2023
First is 'Emu Video'.
It leverages the Emu model to generate video from text prompts, from still images, or from a combination of the two.
Meta previously built an AI video generator called Make-A-Video, but Emu Video is a big improvement over it.
"Our state-of-the-art approach is simple to implement and uses just two diffusion models to generate 512×512 four-second long videos at 16 frames per second," explained Meta.
"In human evaluations, our video generations are strongly preferred compared to prior work—in fact, this model was preferred over Make-A-Video by 96% of respondents based on quality and by 85% of respondents based on faithfulness to the text prompt."
Emu Video
This new text-to-video model leverages our Emu image generation model and can respond to text-only, image-only or combined text & image inputs to generate high quality video.
Details https://t.co/88rMeonxup
It uses a factorized approach that not only allows us… pic.twitter.com/VBPKn1j1OO— AI at Meta (@AIatMeta) November 16, 2023
Second is 'Emu Edit'.
This AI tool allows users to alter images based on text inputs.
While this is very similar to what Adobe Photoshop’s Generative Fill can do, what differentiates Emu Edit is that users don’t have to actually select the element they want to change.
According to Meta, all users have to do is describe what they want to change, and the AI will understand that request and comply.
For example, the user can simply write “remove the person”, and without selecting anything, the AI will remove the person from the image automatically.
"Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more," said Meta.
"Our key insight is that incorporating computer vision tasks as instructions to image generation models offers unprecedented control in image generation and editing."
Meta said that it trained Emu Edit on "10 million synthesized samples, each including an input image, a description of the task to be performed, and the targeted output image," and believes it to be the largest dataset of its kind.
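Each training sample in that dataset is, in effect, a triple. A minimal sketch of what one record might look like follows; the field names and file names are assumptions for illustration, not Meta's published schema.

```python
from dataclasses import dataclass


@dataclass
class EditSample:
    input_image_path: str   # the image before the edit
    instruction: str        # the task to be performed, in plain text
    target_image_path: str  # the expected image after the edit


sample = EditSample(
    input_image_path="beach_photo_in.png",
    instruction="remove the person",
    target_image_path="beach_photo_out.png",
)
```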
"Current methods often lean towards either over-modifying or under-performing on various editing tasks," added Meta.
"We argue that the primary objective shouldn’t just be about producing a ‘believable’ image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request."
Emu Edit
This new model is capable of free-form editing through text instructions. Emu Edit precisely follows instructions and ensures only specified elements of the input image are edited while leaving areas unrelated to instruction untouched. This enables more powerful… pic.twitter.com/ECWF7qfWYY— AI at Meta (@AIatMeta) November 16, 2023
While the world is kept busy and fascinated by how generative AI products blur the line between what's real and what's fake, the advancements that open up new possibilities also raise concerns about potential misuse.
With generative AI, it's extremely easy for anyone to create realistic content that never existed.
This makes it challenging to distinguish between what's genuine and what's artificially generated. And the fakery no longer comes only in the form of text: with AI tools like Emu Video and Emu Edit, it can also come in the form of videos and images.
Generative AI is a double-edged sword, offering incredible possibilities but also posing risks like deepfakes and misinformation.
Meta didn't say exactly when the two AIs will be released, other than that the work is "purely fundamental research" for the moment, but that the "potential use cases are clearly evident."