The image-enhancement technique known as Super-Resolution (SR) has applications in many fields, such as medical imaging in healthcare, autonomous driving systems, and security-footage analysis.
SR technologies have been growing rapidly since the introduction of Deep Learning (DL) techniques such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). Deep learning-based SR approaches generate high-resolution images with significantly more detail than earlier methods.
What Is Super-Resolution?
Super-Resolution refers to a class of techniques designed to create a high-resolution image from a low-resolution image. The main task of Super-Resolution is to increase the size of an image with the lowest possible loss of quality. To enable this upscaling process, an algorithm fills in the missing details to create a larger output image.
Some of the challenges in this process include:
- The extra details needed to create the high-resolution output do not exist in the low-resolution input; the algorithm can only estimate what they should look like, and the true values are unknown.
- Since we are filling in missing details, many different high-resolution outputs can be generated from the same low-resolution input, so there is no single correct answer.
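To see why this is hard, compare with a classical baseline: simple interpolation only redistributes the pixels that already exist and cannot invent new detail. A minimal nearest-neighbour upscaler in NumPy (illustrative only, not an SR method):

```python
import numpy as np

def nearest_neighbour_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Upscale a 2-D grayscale image by repeating each pixel factor x factor times."""
    return np.kron(img, np.ones((factor, factor), dtype=img.dtype))

lr = np.array([[10, 20],
               [30, 40]], dtype=np.uint8)
hr = nearest_neighbour_upscale(lr, 2)
print(hr.shape)  # (4, 4)
```

Every pixel in the output is copied from the input, so the result is larger but contains no new information; SR methods instead try to hallucinate plausible missing detail.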
Deep Learning Super-Resolution Methods
We can use deep convolutional neural networks to train a model for super-resolution. To do so, we need to overcome the fact that the convolution operation reduces the size of its inputs. Thus, we need a transposed convolution (often called deconvolution) or a similar upsampling layer to increase the spatial size of the input by a given factor. To train such a model for SR, we collect high-resolution images and scale them down to a low resolution. We can then feed these low-resolution inputs to the network and train it to create a high-resolution version of each image, which we compare to the original high-resolution version.
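The data-preparation step described above can be sketched as follows; `downscale` is a hypothetical helper (not from any particular library) that average-pools a high-resolution image to produce the low-resolution training input:

```python
import numpy as np

def downscale(hr: np.ndarray, factor: int) -> np.ndarray:
    """Create a low-resolution training input by average-pooling the HR image."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Training pair: the LR image is the network input,
# the original HR image is the target the network must reconstruct.
hr = np.random.rand(64, 64)
lr = downscale(hr, 4)
print(lr.shape)  # (16, 16)
```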
We can measure the success of the network by how well it reduces the Mean-Squared Error (MSE) between the pixels of the output and the original version.
The MSE equation is:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(f(i,j) - g(i,j)\bigr)^2$$

where:
f = matrix of the original image
g = matrix of the generated (output) high-resolution image
M = number of pixel rows
i = index of the row
N = number of pixel columns
j = index of the column
Theoretically, the best result is an MSE of 0. In that case, the original high-resolution image and the high-resolution version generated by the network are identical.
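The MSE definition above translates directly into NumPy (a minimal sketch; production code would typically use a library metric):

```python
import numpy as np

def mse(f: np.ndarray, g: np.ndarray) -> float:
    """Mean-Squared Error between original image f and generated image g."""
    assert f.shape == g.shape
    # Cast to float first so uint8 subtraction cannot wrap around.
    return float(np.mean((f.astype(np.float64) - g.astype(np.float64)) ** 2))

f = np.array([[0, 0], [0, 0]], dtype=np.uint8)
g = np.array([[0, 2], [0, 0]], dtype=np.uint8)
print(mse(f, g))  # 1.0
print(mse(f, f))  # 0.0 (identical images)
```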
Peak Signal-to-Noise Ratio (PSNR)
This metric is used to determine the quality of the generated image. PSNR compares, on a logarithmic scale, the maximum possible pixel value (the peak signal) with the MSE between the generated image and the original. The PSNR equation is:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{maxvalue}^2}{\mathrm{MSE}}\right)$$

maxvalue = the maximum possible pixel value in the input image.
The equation captures the trade-off between the MSE and the maximum pixel value: the smaller the MSE relative to the peak signal, the higher the PSNR. Higher PSNR indicates a higher-quality generated image.
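The PSNR computation can be sketched in NumPy, building on the MSE; the `maxvalue` default of 255 assumes 8-bit images:

```python
import numpy as np

def psnr(f: np.ndarray, g: np.ndarray, maxvalue: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in decibels; higher means closer to the original."""
    err = np.mean((f.astype(np.float64) - g.astype(np.float64)) ** 2)
    if err == 0:
        return float("inf")  # identical images: MSE of 0 gives infinite PSNR
    return float(10.0 * np.log10(maxvalue ** 2 / err))

f = np.full((4, 4), 100, dtype=np.uint8)
g = f.copy()
g[0, 0] += 16  # introduce a single-pixel error
print(round(psnr(f, g), 2))
```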
However, since we are trying to mimic real-world scenarios, a high PSNR doesn’t guarantee the best result, as it can produce overly smooth images that look unreal. To make images more perceptually pleasing, we can use various CNN architectures, such as ResNet and GoogLeNet, as feature extractors for content loss, which is then used as the loss function of the network. Content loss is defined as the difference between the representations (feature maps) of the original (ground-truth) image and the generated image.
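To illustrate the idea, here is a toy sketch in NumPy. The gradient-based `toy_features` function is a hypothetical stand-in for a real pretrained CNN feature extractor (in practice you would use feature maps from a network such as ResNet); the point is only that the loss compares feature maps, not raw pixels:

```python
import numpy as np

def toy_features(img: np.ndarray) -> np.ndarray:
    """Toy stand-in for a CNN feature extractor: horizontal and vertical
    intensity gradients, cropped to a common shape."""
    gx = np.diff(img.astype(np.float64), axis=1)[:-1, :]
    gy = np.diff(img.astype(np.float64), axis=0)[:, :-1]
    return np.stack([gx, gy])

def content_loss(ground_truth: np.ndarray, generated: np.ndarray) -> float:
    """MSE between feature maps rather than raw pixels."""
    diff = toy_features(ground_truth) - toy_features(generated)
    return float(np.mean(diff ** 2))

hr = np.random.rand(8, 8)
print(content_loss(hr, hr))  # 0.0 for identical images
```

In this toy example a constant brightness shift leaves the content loss at (near) zero even though the pixel-wise MSE is non-zero; feature-space losses tolerate such perceptually minor differences while penalizing missing structure.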
Generative Adversarial Networks
This is the most commonly used architecture for training deep learning-based SR models. The GAN architecture is based on unsupervised learning. A GAN consists of two neural networks residing in a single framework and competing in a zero-sum game.
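The zero-sum game can be written as the standard GAN minimax objective, where $G$ is the generator, $D$ the discriminator, $x$ a real sample, and $z$ random noise:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$

The discriminator $D$ tries to assign high probability to real samples and low probability to generated ones, while the generator $G$ tries to make its outputs indistinguishable from real data.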
GANs are capable of generating artificial content such as audio, video, and images that mimics human-made counterparts. The goal of this process is to take a simple input and use it to generate a complex output with a high level of accuracy.
Super-Resolution with Generative Adversarial Networks
Nowadays, most Single-Image Super-Resolution (SISR) techniques are quick and accurate. However, most of them don’t fare well when it comes to recovering fine-grained texture and detail from low-resolution images without distortion. This is mainly because most work, up to this point, has focused on minimizing the MSE, which is equivalent to maximizing PSNR and yields good overall image quality by that measure.
While the images generated by SISR are of higher resolution, they are often blurry, lack high-frequency fine-grained detail, and look dull in comparison to true high-resolution images.
To build a model capable of creating more perceptually satisfying and less blurry images, we need to use a model that can capture the perceptual differences between the original image and the generated one.
To achieve this, we can use Super-Resolution with Generative Adversarial Networks (SRGAN), which produces high-resolution images by applying a combination of an adversarial network and a deep network.
The steps to train an SRGAN are as follows:
– Take a set of high-resolution images and down-scale them to low resolution.
– Feed the low-resolution images into the generator and let it produce SR images.
– Use the discriminator to distinguish between the original high-resolution images and the SR images, then use back-propagation to train both the generator and the discriminator.
– Repeat these steps until you reach satisfactory results.
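The steps above can be sketched as a single training iteration in PyTorch. This is a minimal illustration under stated assumptions: the tiny generator and discriminator below are placeholder stand-ins, not the architectures from the SRGAN paper, and the pixel-plus-adversarial loss is a simplification of the full perceptual loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 2

generator = nn.Sequential(                      # LR -> SR, upscales by `scale`
    nn.Conv2d(3, 3 * scale ** 2, 3, padding=1),
    nn.PixelShuffle(scale),                     # rearranges channels into space
)
discriminator = nn.Sequential(                  # image -> real/fake logit
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

hr = torch.rand(4, 3, 16, 16)                   # batch of "original" HR images
lr = F.avg_pool2d(hr, scale)                    # step 1: down-scale to LR

sr = generator(lr)                              # step 2: generator produces SR

# Step 3: discriminator separates HR (label 1) from SR (label 0)...
d_loss = F.binary_cross_entropy_with_logits(discriminator(hr), torch.ones(4, 1)) \
       + F.binary_cross_entropy_with_logits(discriminator(sr.detach()), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# ...while the generator is trained to reconstruct HR and fool the discriminator.
g_loss = F.mse_loss(sr, hr) \
       + 0.001 * F.binary_cross_entropy_with_logits(discriminator(sr), torch.ones(4, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(sr.shape)  # torch.Size([4, 3, 16, 16])
```

Step 4 is simply running this iteration over many batches until the generated images are satisfactory.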