Baidu, the Chinese tech giant, is showing that AI has improved so much over the years that the technology may soon take the human element out of journalism.
It was only in 2018 that the Chinese state-run media company Xinhua unveiled the world's first two AI-powered news anchors.
Then in early 2019, Xinhua introduced another news anchor called "Xin Xiaomeng", considered the first-ever female AI-powered news anchor.
And in 2020, Xinhua created the world's first 3D AI-powered news anchor, which is capable of mimicking human gestures.
While the progress of AI in journalism suggests that the industry may soon lose its human element, or that people may lose their jobs, the fast and focused development of AI in news production comes as researchers think the technology might help media houses produce more news in a better format with minimal effort.
This time, the advancement is an AI that can turn text-based news into a video clip with a single click.
Called 'VidPress', the AI from Baidu brings video and text together by creating a clip based on articles.
To make use of this tool, the AI needs a URL.
Given a URL, it analyzes the content to understand its context, then automatically fetches related articles from the internet and creates a summary. For instance, if the input is a story about Apple launching a new iPhone, the AI will fetch all the details needed about the launch, including the specs of the new phone, the price and so forth.
After gathering all the needed text data, the AI searches for related pictures and clips in the user's media library and on the web.
The AI then cuts and chooses the clips that fit the topics by analyzing the semantics of the clips.
This is possible because the researchers have applied multiple techniques to the pipeline, including computer vision techniques like facial recognition, object detection, optical character recognition and video understanding, as well as natural language understanding (NLU) and speech synthesis.
And to put everything together, the AI uses an attention-based timeline alignment algorithm developed in-house, which segments a chunk of text into meaningful anchors, ranks clips by their relevance to the anchors, and moves high-ranked clips into the timeline first.
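Baidu has not published the algorithm itself, so the following is only a minimal sketch of the three steps named above: segment, rank, and place the best matches first. It assumes each clip already carries a text description (for example, from object detection or OCR), treats each sentence as an anchor, and swaps the attention-based relevance score for a simple word-overlap score. The names `Clip`, `segment_into_anchors`, `relevance` and `build_timeline` are hypothetical and are not part of VidPress.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clip:
    name: str
    description: str  # hypothetical text label, e.g. produced by OCR or object detection
    duration: float   # length in seconds


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def segment_into_anchors(text: str) -> list:
    """Stand-in for semantic segmentation: one anchor per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def relevance(anchor: str, clip: Clip) -> float:
    """Stand-in for an attention score: Jaccard overlap of the two word sets."""
    a, c = tokenize(anchor), tokenize(clip.description)
    union = a | c
    return len(a & c) / len(union) if union else 0.0


def build_timeline(text: str, clips: list) -> list:
    """Return (anchor, clip or None) pairs in reading order."""
    anchors = segment_into_anchors(text)
    # Score every (anchor, clip) pair and sort so the best matches come first.
    scored = sorted(
        ((relevance(a, c), i, c) for i, a in enumerate(anchors) for c in clips),
        key=lambda item: item[0],
        reverse=True,
    )
    placed: dict = {}
    used: set = set()
    # Greedy placement: high-ranked clips claim their timeline slot first.
    for score, slot, clip in scored:
        if score > 0 and slot not in placed and clip.name not in used:
            placed[slot] = clip
            used.add(clip.name)
    return [(anchors[i], placed.get(i)) for i in range(len(anchors))]


story = (
    "Apple launched a new iPhone today. "
    "The phone costs 999 dollars. "
    "Analysts expect strong sales."
)
library = [
    Clip("launch.mp4", "Apple iPhone launch event stage", 8.0),
    Clip("price.mp4", "phone price tag 999 dollars", 5.0),
    Clip("cat.mp4", "cat playing piano", 4.0),
]

for anchor, clip in build_timeline(story, library):
    chosen: Optional[str] = clip.name if clip else None
    print(f"{anchor!r} -> {chosen}")
```

In this toy run the price clip is placed first because it has the highest overlap score, the launch clip second, and the third sentence is left without footage because nothing in the library matches it. A real system would need a fallback for such gaps, such as stock footage or still images.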
The last step is to render the timeline into a video file.
According to Baidu in a blog post:
VidPress is simply an AI that feeds on a URL, analyzes the web page using NLU models to help find matching media content, and then enriches the story by aggregating relevant news from a wide range of sites.
To create VidPress, Baidu trained it on thousands of online articles to understand and extract the context of a news story. Additionally, the company had to train several AI models for voice and video generation separately.
In the final step, the algorithm syncs both streams to create a smooth final video.
Baidu said its AI can also detect social media trends and create videos on related topics. The company said that it aims to provide the AI to news agencies and creators so that they can turn their posts into videos with synthesized narration.
The company has deployed VidPress on its short-video app Haokan, and it only works with Mandarin.
Baidu claims that the AI can produce up to 1,000 videos per day, which is a whole lot more than the 300-500 its human editors are capable of.
VidPress takes up to nine minutes to create a news video, with an average of 2.5 minutes for a two-minute 720p video, compared to 15 minutes by human editors.
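To put Baidu's figures side by side, here is a short back-of-the-envelope calculation. It uses only the numbers quoted above; the last two lines additionally assume, purely for illustration, that every video takes the 2.5-minute average, which Baidu does not state.

```python
import math

# Figures as claimed by Baidu in the article.
ai_minutes_per_video = 2.5        # average for a two-minute 720p video
human_minutes_per_video = 15
ai_videos_per_day = 1000
human_videos_per_day = (300, 500)

# How much faster a single video is produced.
speedup = human_minutes_per_video / ai_minutes_per_video

# How many times more videos per day, against the low and high human figures.
output_ratio = tuple(ai_videos_per_day / h for h in human_videos_per_day)

# Assumption: every video takes the 2.5-minute average.
serial_hours = ai_videos_per_day * ai_minutes_per_video / 60
min_parallel_jobs = math.ceil(serial_hours / 24)

print(f"Per-video speedup: {speedup:.0f}x")
print(f"Daily output ratio: {output_ratio[1]:.1f}x to {output_ratio[0]:.1f}x")
print(f"Serial time for 1,000 videos: {serial_hours:.1f} hours")
print(f"Minimum parallel jobs to finish in a day: {min_parallel_jobs}")
```

Under that assumption, making 1,000 videos one after another would take about 41.7 hours, so reaching the daily figure would require at least two videos to be rendered at the same time.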
The search giant said the AI only looks at the most trusted sources to create content, but didn't provide a list of those sources. The test results came from Baidu running the model in a controlled environment, meaning that the company may need to adjust or tweak some of the algorithms before releasing it to more people.