Aspect ratio describes the proportional relationship between a video's width and its height.
Typically, films are created to be viewed in landscape, using aspect ratios such as 16:9 or 4:3. But today, people create and edit video content in many different forms, and the resulting aspect ratios don't always fit the display being used for viewing.
The same goes for manufacturers and software makers: as they compete relentlessly for consumers, their devices and platforms end up supporting a wide variety of aspect ratios.
To make videos look right on a given screen, they are usually reframed to match the target aspect ratio. This is commonly done by cropping: choosing a frame around the content of interest and removing everything outside it.
Unfortunately, this method often leads to unsatisfactory results, because videos vary widely in composition, camera motion, and object position.
Another option is to manually identify the content of focus and crop around it frame by frame. This is rarely practical either, since the process is time-consuming.
To address this problem, Google has developed a tool it calls AutoFlip, which uses neural networks to crop videos automatically.
AutoFlip is built on the MediaPipe framework, which enables developers to build pipelines for processing time-series multimodal data.
To reframe a video, the AI must first analyze the video's content and know the desired output dimensions, then develop optimal tracking and cropping strategies. With that information in hand, it begins by detecting changes in composition that signify scene changes, so each scene can be isolated for processing.
It analyzes every frame in the video, computing each frame's color histogram, and uses that information to choose the camera mode and path it considers the best fit for the content: stationary, panning, or tracking.
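The scene-boundary step can be illustrated with a minimal sketch. The code below assumes frames arrive as flat lists of grayscale pixel intensities (0-255); the bin count and distance threshold are illustrative choices, not AutoFlip's actual parameters.

```python
def histogram(frame, bins=16):
    """Count pixel intensities into equal-width bins, normalized to sum to 1."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def scene_changes(frames, threshold=0.5):
    """Frame indices where the composition shifts enough to mark a new scene."""
    cuts = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        if histogram_distance(prev, cur) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

For example, a sequence of two dark frames followed by two bright frames yields a single cut at the first bright frame. Each detected scene can then be cropped with its own strategy.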
Based on which of these three reframing strategies the algorithm selects, AutoFlip then determines an optimal cropping window for each frame and creates an output video of the same duration in the desired aspect ratio.
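To make the strategy choice concrete, here is a hypothetical sketch of deciding between a stationary and a panning crop from per-frame focus positions (for example, horizontal centers of detected face boxes). The variance threshold and moving-average smoothing are illustrative stand-ins for AutoFlip's real path optimization.

```python
def choose_camera_mode(centers, motion_threshold=5.0):
    """Pick 'stationary' if the focus barely moves across frames, else 'panning'."""
    mean = sum(centers) / len(centers)
    variance = sum((c - mean) ** 2 for c in centers) / len(centers)
    return "stationary" if variance < motion_threshold ** 2 else "panning"

def crop_path(centers, crop_w, frame_w, mode):
    """Left edge of the crop window for each frame, kept inside the frame."""
    def clamp(x):
        return max(0, min(frame_w - crop_w, x - crop_w // 2))
    if mode == "stationary":
        fixed = clamp(round(sum(centers) / len(centers)))
        return [fixed] * len(centers)
    # Panning: follow the focus with a simple moving average for smooth motion.
    path, smooth = [], centers[0]
    for c in centers:
        smooth = 0.8 * smooth + 0.2 * c
        path.append(clamp(round(smooth)))
    return path
```

A subject that stays near one position produces a fixed crop window, while a subject walking across the frame produces a smoothly moving one, mirroring the stationary and panning modes described above.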
As for face and object detection, AutoFlip integrates those models through MediaPipe, which uses TensorFlow Lite.
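Once detections are available, the crop window must still match the target aspect ratio. The sketch below is illustrative Python, not AutoFlip's actual API: it takes detected boxes as `(x, y, w, h)` tuples, unions them into one region of interest, and expands that region to the desired ratio around its center.

```python
def union_box(boxes):
    """Smallest box covering all detected (x, y, w, h) boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return x0, y0, x1 - x0, y1 - y0

def fit_aspect(box, target_ratio):
    """Expand a box to the target width/height ratio, keeping its center."""
    x, y, w, h = box
    if w / h < target_ratio:
        new_w, new_h = h * target_ratio, h   # too tall: widen
    else:
        new_w, new_h = w, w / target_ratio   # too wide: heighten
    cx, cy = x + w / 2, y + h / 2
    return cx - new_w / 2, cy - new_h / 2, new_w, new_h
```

Expanding rather than shrinking ensures no detected face falls outside the final crop; a real implementation would also clamp the result to the frame bounds.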
The tool is designed to be extensible, meaning developers can add their own detection algorithms for different use cases and scenarios. Google said:
"The ability to adapt any video format to various aspect ratios is becoming increasingly important as the diversity of devices for video content consumption continues to rapidly increase. Whether your use case is portrait to landscape, landscape to portrait, or even small adjustments like 4:3 to 16:9, AutoFlip provides a solution for intelligent, automated and adaptive video reframing."
While AutoFlip can improve over time thanks to its machine-learning foundation, it still has some issues. For example, if the input video has overlays near the edges, such as text or a logo, AutoFlip will generally crop them out of view.
As the project's next step, the team wants to improve AutoFlip's object tracking in interviews and animated films. It also wants to apply text detection and image inpainting techniques to better place foreground and background objects within one frame.
In the meantime, Google has released AutoFlip on a dedicated GitHub page to "encourage contributions from developers and filmmakers in the open source communities."