The scene has long been a staple of Hollywood science-fiction and action movies: the protagonists have computers that can enhance an image even when zoomed in extremely close.
With that ability, they can reveal faces from a distance, read number plates, or pick out any other key detail. In these movies, it seems as though cameras don't use pixels at all, since blurry images can be deblurred so easily. Google, one of the web's tech giants, is trying to bring that well-known fiction to reality with the help of Artificial Intelligence.
Here, Google Research’s Brain Team has introduced two AI engines, both built on what are known as diffusion models.
With the two AIs, Google tries to add details to images that the camera didn't originally capture, using educated guesswork that the computers learn from similar-looking images.
In a blog post, Google called the technique "natural image synthesis," and it is a very difficult process to master, even for Google.
The first AI is called SR3, or Super-Resolution via Repeated Refinement.
Using this AI, Google starts from a small, blocky, pixelated image. The AI processes it to end up with another image that is much sharper, clearer and more natural-looking. The AI does this by 'imagining' what the objects inside the image are, and recreating them using its knowledge.
While the image may not match the original, the results can be close enough to look very real.
SR3 does this through a method of controlled corruption: during training, noise is added to images, and the model learns to reverse the process by taking that noise away.
"Diffusion models work by corrupting the training data by progressively adding Gaussian noise, slowly wiping out details in the data until it becomes pure noise, and then training a neural network to reverse this corruption process," explained research scientist Jonathan Ho and software engineer Chitwan Saharia from Google Research.
Because the noise is added in a controlled way until there is nothing but noise, the AI can learn how to undo it step by step. It then uses a series of probability calculations and machine learning to reimagine what a full-resolution version of a blocky low-resolution image would look like.
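The corruption half of the process quoted above can be sketched in a few lines of Python. This is only an illustration, not Google's code: the linear noise schedule, its start and end values, the 1,000 steps and all function names are assumptions chosen for the example, and the "image" is just five toy pixel values. The sketch shows how the share of the original signal shrinks toward zero as Gaussian noise is added.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (illustrative values, an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bars[t]: share of the original signal variance left after t + 1 steps."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_pixels(pixels, alpha_bar, rng):
    """Jump straight to one noise level: sqrt(a) * pixel + sqrt(1 - a) * Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)

pixels = [-1.0, -0.5, 0.0, 0.5, 1.0]  # toy "image", values scaled to [-1, 1]
lightly_noised = noise_pixels(pixels, alpha_bars[0], rng)   # almost unchanged
fully_noised = noise_pixels(pixels, alpha_bars[-1], rng)    # almost pure noise

print("signal share after first step:", alpha_bars[0])
print("signal share after last step:", alpha_bars[-1])
```

The part that makes a diffusion model useful, a neural network trained to undo each of these steps, is deliberately left out here, as it cannot be reduced to a short example.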
The AI does this by using the low-resolution image as a reference, drawing on what it has learned from other images.
In the end, the model could convert a 64 x 64 input image into a 1024 x 1024 image.
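To get a sense of how much guesswork that involves, the arithmetic can be worked out in a short Python sketch. The function name and the idea of a "synthesized share" are introduced here purely for illustration; only the 64 x 64 and 1024 x 1024 sizes come from the figures above.

```python
def upscale_stats(src_side, dst_side):
    """Summarize how much a square image grows when it is upscaled."""
    src_pixels = src_side * src_side
    dst_pixels = dst_side * dst_side
    return {
        "side_factor": dst_side // src_side,
        "pixel_factor": dst_pixels // src_pixels,
        "src_pixels": src_pixels,
        "dst_pixels": dst_pixels,
        # Share of output pixels with no one-to-one counterpart in the input
        "synthesized_share": 1 - src_pixels / dst_pixels,
    }

stats = upscale_stats(64, 1024)
print(stats)
```

Each side grows 16 times, so the output holds 256 times as many pixels as the input (1,048,576 against 4,096). Put differently, more than 99% of the pixels in the final image have no direct counterpart in the original.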
The second AI tool is called CDM, or Cascaded Diffusion Models.
According to Google, this AI acts as the "pipelines" through which diffusion models, including SR3, can be directed for high-quality image resolution upgrades.
What it does is take the enhancement models and chain them together to make larger images.
When dealing with images, upsizing them will blur the edges and colors. This is because the pixels are 'stretched' to fill the increased width and height.
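The 'stretching' can be seen on a single row of pixels. The Python sketch below uses two classic upscaling methods, neither of which is part of Google's work: repeating pixels, and linear interpolation between neighbours. The function names and the four-pixel row are made up for the example, which assumes the output has at least two pixels.

```python
def upscale_nearest(row, factor):
    """Repeat each pixel `factor` times (sharp but blocky result)."""
    return [value for value in row for _ in range(factor)]

def upscale_linear(row, factor):
    """Linearly interpolate between neighbouring pixels (smooth but blurred result)."""
    out_len = len(row) * factor
    out = []
    for i in range(out_len):
        # Map the output index back to a fractional position in the source row
        pos = i * (len(row) - 1) / (out_len - 1)
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        frac = pos - left
        out.append(round((1 - frac) * row[left] + frac * row[right]))
    return out

edge = [0, 0, 255, 255]  # a hard black-to-white edge
blocky = upscale_nearest(edge, 2)
blurred = upscale_linear(edge, 2)

print(blocky)   # [0, 0, 0, 0, 255, 255, 255, 255]
print(blurred)  # [0, 0, 0, 73, 182, 255, 255, 255]
```

Neither result contains any new detail. The first simply makes every pixel bigger, while the second smears the hard edge into in-between grays, which is the blur described above.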
Through CDM, the AI can prevent loss of details when images are upscaled, by chaining multiple generative models over several spatial resolutions.
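The chaining idea can be sketched as follows. This is a structural illustration only and not Google's implementation: each stage here is a plain pixel-repeating upscaler standing in for a trained diffusion model, and the split into two 4x stages (64 to 256, then 256 to 1024) as well as all names are assumptions made for the example.

```python
from functools import reduce

def make_placeholder_stage(factor):
    """Stand-in for one super-resolution model: nearest-neighbour upscaling.
    A real cascade stage would be a trained diffusion model instead."""
    def stage(image):
        return [
            [pixel for pixel in row for _ in range(factor)]
            for row in image
            for _ in range(factor)
        ]
    return stage

def run_cascade(image, stages):
    """Feed each stage's output into the next, lowest resolution first."""
    return reduce(lambda current, stage: stage(current), stages, image)

# Toy 64 x 64 grayscale image stored as a list of rows
base = [[(x + y) % 256 for x in range(64)] for y in range(64)]

# Assumed split of the work: 64 -> 256 -> 1024
cascade = [make_placeholder_stage(4), make_placeholder_stage(4)]
result = run_cascade(base, cascade)

print(len(result), "x", len(result[0]))  # 1024 x 1024
```

The point of the design is that no single model has to make the whole jump at once. Each stage only handles a moderate increase in resolution and works from the output of the one before it.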
Google tested both SR3 and CDM. In a test with 50 human volunteers, SR3-generated images of human faces were mistaken for real photos around 50% of the time, a rate that suggests the volunteers could do little better than guess.
While the AI engines are far from perfect, Google said that the diffusion approach apparently produced better results than other approaches, including Generative Adversarial Networks (GANs), which pit two neural networks against each other to refine results.
Google is so confident that the team said it has pushed the limits of diffusion models to achieve state-of-the-art results on super-resolution and class-conditional ImageNet generation benchmarks.
With that knowledge, Google promises further development of these two AI engines and their technologies, which would go beyond just upscaling or deblurring images.
"We are excited to further test the limits of diffusion models for a wide variety of generative modeling problems," the team explained.