GAN — Super Resolution GAN (SRGAN)

Summary

Super Resolution GAN (SRGAN) is a deep learning approach that uses generative adversarial networks to produce high-resolution images from low-resolution inputs, focusing on human-perceived visual quality.

Abstract

The Super Resolution GAN (SRGAN) is designed to enhance the resolution of images while improving the perceptual quality for human viewers. During training, a high-resolution (HR) image is downsampled to create a low-resolution (LR) counterpart, and a GAN-based generator upsamples the LR image to a super-resolution (SR) output. The generator's architecture includes convolutional layers, batch normalization, and parametric ReLU activations, with skip connections inspired by ResNet. The discriminator assesses the realism of the SR images compared to HR images, and the adversarial training process is guided by a loss function that combines content loss (measured by feature similarity using a VGG-19 network) and adversarial loss. The article presents SRGAN as producing more detailed and appealing images than SRResNet, a similar design trained without a GAN.

Opinions

The SRGAN approach is considered more effective in generating visually appealing high-resolution images compared to non-GAN methods.
The use of a perceptual loss function based on features extracted by a VGG-19 network is believed to yield images that are more appealing to humans, as opposed to using mean square error (MSE) alone.
The network design, including skip connections and parametric ReLU (PReLU), is thought to contribute positively to the model's ability to generate detailed images.
The adversarial training process, involving both a generator and a discriminator, is seen as crucial for the network to learn to produce realistic high-resolution images.

GAN — Super Resolution GAN (SRGAN)

Super-resolution GAN applies a deep network in combination with an adversary network to produce higher-resolution images. SRGAN output is more appealing to a human viewer, with more detail, than the output of a similar design trained without a GAN (SRResNet). During training, a high-resolution image (HR) is downsampled to a low-resolution image (LR). A GAN generator upsamples LR images to super-resolution images (SR). A discriminator learns to distinguish HR images from SR images, and the GAN loss is backpropagated to train both the discriminator and the generator.
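The data preparation step described here can be sketched in a few lines. This is a minimal illustration, not the article's pipeline: the helper names `downsample` and `make_training_pair` are hypothetical, the scale factor of 4 is an assumption, and simple box averaging stands in for whatever resampling kernel a real implementation would use.

```python
def downsample(hr, factor=4):
    """Box-average a 2D grayscale image (a list of rows) by an integer factor.

    Assumption: box averaging is a stand-in for the real resampling kernel.
    """
    height, width = len(hr), len(hr[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by the factor")
    lr = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect the factor x factor block that maps to one LR pixel.
            block = [hr[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr


def make_training_pair(hr, factor=4):
    """Return (LR input for the generator, HR target) for one example."""
    return downsample(hr, factor), hr


# An 8x8 gradient image becomes a 2x2 LR input; the HR image is the target.
hr_image = [[float(x + y) for x in range(8)] for y in range(8)]
lr_image, target = make_training_pair(hr_image, factor=4)
```

The generator never sees `target` as input. It only receives `lr_image`, and the HR image is used when computing the losses.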

The generator and the discriminator are composed mostly of convolution layers, batch normalization, and parametric ReLU (PReLU) activations. The generator also uses skip connections similar to ResNet. In the network diagram, a convolution layer labeled “k3n64s1” uses 3x3 kernel filters, outputs 64 channels, and has stride 1.
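The layer labels can be decoded mechanically. The sketch below is only an aid for reading the notation: `ConvSpec`, `parse_conv_spec`, and `output_size` are hypothetical names, and the padding of `kernel // 2` used in `output_size` is an assumption, since the article does not state the padding.

```python
import re
from typing import NamedTuple


class ConvSpec(NamedTuple):
    kernel: int    # k: side length of the square kernel
    channels: int  # n: number of output channels (feature maps)
    stride: int    # s: stride of the convolution


_PATTERN = re.compile(r"k(\d+)n(\d+)s(\d+)")


def parse_conv_spec(label: str) -> ConvSpec:
    """Decode a label such as 'k3n64s1' into its three numbers."""
    match = _PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a k/n/s layer label: {label!r}")
    kernel, channels, stride = (int(group) for group in match.groups())
    return ConvSpec(kernel, channels, stride)


def output_size(input_size: int, spec: ConvSpec) -> int:
    """Spatial size after the layer, ASSUMING padding of kernel // 2."""
    padding = spec.kernel // 2
    return (input_size + 2 * padding - spec.kernel) // spec.stride + 1


spec = parse_conv_spec("k3n64s1")
```

Under the padding assumption, a stride-1 layer such as `k3n64s1` keeps the spatial size unchanged, while a stride-2 layer halves it.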

Loss function

The loss function for the generator consists of the content loss (reconstruction loss) and the adversarial loss.

The adversarial loss is defined as:
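For reference, the generator's adversarial term is usually written as follows. The notation is borrowed from the SRGAN paper and is an assumption here, since the article does not define symbols: $G$ is the generator, $D(\cdot)$ is the discriminator's estimated probability that its input is a real HR image, $I^{LR}_n$ is the $n$-th low-resolution training image, and $N$ is the number of training samples. The second line shows the weighting the paper uses when adding this term to the content loss $l^{SR}_{X}$.

```latex
l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D\big(G(I^{LR}_n)\big)

l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen}
```

Minimizing $l^{SR}_{Gen}$ pushes the discriminator's score for generated images toward 1, that is, toward being judged as real HR images.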

We can compute the content loss pixel-wise using the mean square error (MSE) between the HR and SR images. However, a low pixel-wise MSE measures only mathematical distance and does not guarantee an image that looks better to a human. SRGAN instead uses a perceptual loss that measures the MSE between features extracted by a VGG-19 network. For a specific layer within VGG-19, we want the features of the SR image to match those of the HR image, that is, to minimize the MSE between the two feature maps.
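The difference between the two content losses can be shown with a toy example. This is a sketch under heavy assumptions: `toy_features` is a hypothetical stand-in (differences between neighboring pixels) and is not VGG-19, and the 1D "images" are made up. It only illustrates that ranking candidates by pixel MSE and ranking them by feature MSE can disagree.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(pixels):
    """HYPOTHETICAL stand-in for a VGG-19 layer: neighbor differences."""
    return [right - left for left, right in zip(pixels, pixels[1:])]


def pixel_loss(hr, sr):
    """Content loss computed directly on pixel values."""
    return mse(hr, sr)


def perceptual_loss(hr, sr, features=toy_features):
    """Content loss computed on extracted features instead of pixels."""
    return mse(features(hr), features(sr))


hr = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]          # a sharp bar
shifted = [value + 0.25 for value in hr]      # same edges, uniformly brighter
blurred = [0.0, 0.25, 0.75, 0.75, 0.25, 0.0]  # softened edges

# Pixel MSE prefers the blurred candidate ...
pixel_prefers_blur = pixel_loss(hr, blurred) < pixel_loss(hr, shifted)
# ... while the feature-based loss prefers the one with intact edges.
feature_prefers_sharp = perceptual_loss(hr, shifted) < perceptual_loss(hr, blurred)
```

In a real setup the `features` argument would be the activations of a chosen layer of a pretrained VGG-19 network, applied to both the HR and the SR image.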

To train the discriminator, the loss function uses the typical GAN discriminator loss.
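As a rough sketch, the typical GAN discriminator loss can be computed from the discriminator's output probabilities as shown here. The function names, the batch averaging, and the small `EPS` constant are assumptions made for illustration. `d_hr` and `d_sr` are the discriminator's probabilities of "real" for HR images and for generated SR images.

```python
import math

EPS = 1e-12  # assumption: keeps log() finite when a probability hits 0 or 1


def discriminator_loss(d_hr, d_sr):
    """Batch average of -log D(HR) - log(1 - D(SR))."""
    if len(d_hr) != len(d_sr):
        raise ValueError("expected one SR score per HR score")
    total = 0.0
    for p_real, p_fake in zip(d_hr, d_sr):
        total += -math.log(p_real + EPS) - math.log(1.0 - p_fake + EPS)
    return total / len(d_hr)


def generator_adversarial_loss(d_sr):
    """Batch average of -log D(SR): low when the discriminator is fooled."""
    return sum(-math.log(p + EPS) for p in d_sr) / len(d_sr)


# A discriminator that separates HR from SR well has a lower loss
# than one that outputs 0.5 for everything.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
unsure = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The two functions pull in opposite directions on `d_sr`: the discriminator is rewarded for driving those scores toward 0, the generator for driving them toward 1.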

Further readings

GAN — A comprehensive review into the gangsters of GANs (Part 1)

Are we there yet? In this GAN series, we identify a general pattern on how GAN is applied to deep learning problems and…

GAN — GAN Series (from the beginning to the end)

A full listing of our articles covers the applications of GAN, the issues, and the solutions.
