Akshay Deshpande

What is GAN? The First Blog Post You Should Read — Simple Explanation for a Layman

An example of synthetic faces generated by a GAN (Source: https://readmedium.com/artificial-intelligence-gans-can-create-fake-celebrity-faces-44fe80d419f7)

Just stare at the above image for a few seconds. Believe it or not, the people you see above do not exist. They have never existed, nor will they ever. Isn't that a little disturbing?

Whether you're a businessperson, an analyst, a data scientist, a researcher, or just curious to know what the hype is all about, this article is a great way to get started with GANs.

By the end of this article, you will understand what Generative Adversarial Networks are, what they’re used for (through a simple example), and a few interesting applications of GANs with visualizations at the end. If you aren’t familiar with probability distributions or neural networks, you will still be able to understand GANs at a high level in this article.

Generative Adversarial Networks (GANs) are a type of Generative Model under the umbrella of Machine Learning.

Generative Models basically "generate" something: text, audio, images, video, etc. While generative models have existed for a long time, their output has typically been low-resolution, blurry, and noisy.

Let us understand what GANs are through a simple example. The MNIST dataset, as shown below, is a database of handwritten digits. The resolution of these images is 28x28 pixels. Each pixel can have a value between 0 (black) and 255 (white), with shades of gray in between. Flatten the 28x28 image into a single long list of 784 numbers: the first 28 values are the first row, the next 28 values are the second row, and so on.

MNIST Database of handwritten digits

Imagine a coordinate space, if you will, of 28 x 28 = 784 dimensions. If you're having trouble, imagine 784 coordinates (x1, x2, x3, ..., x784). A single image corresponds to a single point in this high-dimensional space. If the first two values of the first row are 255 and the last value of the image is 0, the vector of length 784 corresponds to a point in space with coordinates (255, 255, ..., 0).
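To make this flattening concrete, here is a minimal sketch (assuming NumPy; the article itself shows no code) that builds the toy image described above and flattens it row by row into a vector of length 784:

```python
import numpy as np

# A toy 28x28 grayscale "image": values 0 (black) to 255 (white).
image = np.zeros((28, 28), dtype=np.uint8)
image[0, 0] = 255   # first two pixels of the first row are white,
image[0, 1] = 255   # as in the example in the text
image[27, 27] = 0   # the last pixel is black

# Flatten row by row into a vector of length 784 = 28 * 28.
vector = image.flatten()

print(vector.shape)                      # (784,)
print(vector[0], vector[1], vector[-1])  # 255 255 0
```

Each such vector is one point in the 784-dimensional space; a whole dataset of images is a cloud of such points.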

Now, this is a very high dimensional space. Every single 28x28 grayscale image you can ever imagine would be a point in this space. However, the MNIST images are similar to each other in various aspects, and hence you can imagine that all these images would occupy a small part of this space (say, a cluster).

Synthetic MNIST Images

This cluster would be dense (one point for each image you have). If you pick a new point within the gaps of this cluster, chances are it would make some sense, corresponding to one of the digits, or perhaps combining some features of one digit with some features of another.

The points I'm referring to can be visualized in the image on the left (labeled 'Synthetic MNIST Images'). As I mentioned, some of these make sense (like the 1s, 3s, or 9s), while others don't. If you observe closely, many of these contain features from multiple digits.

So how can GANs help me here?

Pix2Pix — A variation of GAN for Image to Image Translation (Source: https://affinelayer.com/pixsrv/)

The synthetic images you visualized above are fascinating, aren't they? They're completely made up! Imagine the impressive applications: creating new environments in games, unique interior designs for homes, new Anime characters, automatic coloring of grayscale images, and even turning your terrible sketch of a cat into a realistic cat! Even though GANs are an unsupervised machine learning technique, they can support supervised learning tasks (increasing accuracy by making use of unlabeled data to augment your dataset) and reinforcement learning tasks (creating new environments). And guess what, you can generate new music as well! In addition, you can use convolutional and recurrent layers within the networks that make up a GAN. These are only a few of the endless exciting applications of GANs, and I urge you to find uses for GANs in the domain of your interest!

These images can be said to come from a probability distribution: the probability is high at and near the cluster, and low everywhere else in space. Finding this probability distribution (often described as the underlying structure of a dataset) is an extremely challenging task!

GANs find this probability distribution through a function (something that takes an input and produces an output). What that means is, a GAN learns a function such that, as you feed in random noise vectors (lists of random values) repeatedly, the outputs land densely in some specific region of the space and sparsely everywhere else. In a two-dimensional space, you could think of a normal distribution (lots of values near the peak, few elsewhere). But in a high-dimensional space like this, it corresponds to a dense cluster somewhere in space, with few to no points everywhere else.
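To build intuition for "a function that turns noise into a distribution," here is a small sketch (assuming NumPy) in which a made-up, fixed function `g` stands in for a trained generator. A real GAN learns this mapping; here we simply pick one and watch standard-normal noise get reshaped into outputs that cluster around a different region:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random noise vectors: 10,000 samples from a standard normal distribution.
z = rng.standard_normal(10_000)

# A made-up stand-in for a trained generator: a fixed function mapping
# noise to outputs. (A real GAN *learns* this function from data.)
def g(z):
    return 2.0 * z + 5.0

samples = g(z)

# The outputs now cluster densely around 5 (with spread 2) instead of 0.
print(round(samples.mean(), 1), round(samples.std(), 1))
```

The same idea carries over to 784 dimensions: the learned function pushes noise toward the dense cluster where the realistic images live.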

The Architecture of a Generative Adversarial Network

So what function do GANs use? A Neural Network! As you might know, a neural network with at least one hidden layer and a non-linearity can approximate any continuous function; this is referred to as the Universal Approximation Theorem. And as you may already know, probability distributions are just functions with uncertainty modeled into them.

So great! GANs can use a neural network. But how does it learn this function? The noise inputs do not have fixed target outputs, so ordinary supervised learning cannot be used here.

What GANs do is train this network (the Generator network, responsible for generation, simply called the Generator) to approximate the probability distribution mentioned above, using a second network (the Discriminator network). The Discriminator (as it is simply called) has only two outputs, Real or Fake; hence, it is a classifier.

The Battle of the Generator and Discriminator

The input to the Generator network is a random vector (also called the latent vector) that passes through the network, which tries to create an output that resembles the real data (synthetic, but realistic). The Discriminator alternates between two steps. In step 1, it takes in the actual data and tries to label it as Real. In step 2, it takes in the fake (also called generated) data from the Generator's output and tries to label it as Fake.

Of course, the weights of the two networks are randomly initialized. Initially, therefore, the Generator outputs very bad data (think of the first row of the image labeled 'Synthetic MNIST Images' above) that does not look like actual images, and the Discriminator, too, has a hard time figuring out what's real and what's fake.

Source: https://developers.google.com/machine-learning/gan/gan_structure

The two networks are adversaries of each other (which is where the term Adversarial in GANs comes from). They start out poorly at their tasks. However, the Discriminator is supervised by a loss function that pushes it to output 1 for real data and 0 for fake data. What this basically means is that the network updates itself to drive its output toward 1 for a real input datapoint and toward 0 for a fake one.

The Generator, on the other hand, is supervised by a loss function that rewards it when the Discriminator outputs 1 for its generated datapoints or images. In other words, the Generator network updates itself to fool the Discriminator (which has its own, opposing goal of outputting 0 for generated data).
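The two loss functions described above are commonly realized as binary cross-entropy (an assumption on my part, since no specific loss is named here). A tiny sketch with hypothetical discriminator outputs shows what each network is penalized for:

```python
import math

def bce(prediction, label):
    # Binary cross-entropy for a single prediction in (0, 1).
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# Hypothetical discriminator outputs (probability that the input is real).
d_real = 0.9   # discriminator is fairly sure a real image is real
d_fake = 0.2   # discriminator is fairly sure a generated image is fake

# Discriminator loss: wants 1 for real data and 0 for fake data.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# Generator loss: wants the discriminator to say 1 for its fakes.
g_loss = bce(d_fake, 1)

print(round(d_loss, 3), round(g_loss, 3))
```

Note how the same number `d_fake` appears in both losses with opposite labels: that is the adversarial tug-of-war in miniature.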

Hence, the Generator and the Discriminator are in a constant fight with each other, starting off at a bad place but updating their weights at every step. As the Generator tries to fool the Discriminator by outputting data that resembles the real data, the Discriminator pursues its own opposing goal, and both end up becoming better at their respective tasks. When neither network can improve further by updating itself (keeping the other network fixed), they are said to have reached a Nash Equilibrium, a term from Game Theory: the two networks are rational agents with a set of actions, each trying to improve iteratively. At this point the Discriminator outputs 0.5, since it cannot distinguish the distribution of real data from the distribution of generated data. This equilibrium is the aim of GAN training, but it is rarely achieved in practice, and you can generate realistic images without reaching it.
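The whole adversarial loop can be sketched end to end on a toy problem. Below is a hypothetical, minimal one-dimensional "GAN" (assuming NumPy): the "real images" are just numbers drawn from a normal distribution around 4, the Generator is the two-parameter function G(z) = a*z + b, and the Discriminator is a logistic classifier D(x) = sigmoid(w*x + c), trained with the alternating updates described above (gradients are worked out by hand for this tiny model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Real data distribution: a 1-D stand-in for "the cluster of real images".
def real_batch(n):
    return rng.normal(4.0, 0.5, n)

# Generator G(z) = a*z + b and Discriminator D(x) = sigmoid(w*x + c):
# the smallest possible versions of the two networks.
a, b = 1.0, 0.0          # generator parameters (start far from the data)
w, c = 0.1, 0.0          # discriminator parameters
lr, n = 0.05, 64

for step in range(3000):
    # --- Discriminator update: push D(real) toward 1, D(fake) toward 0 ---
    xr = real_batch(n)
    z = rng.standard_normal(n)
    xf = a * z + b
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    grad_w = np.mean(-(1 - dr) * xr + df * xf)   # d(loss)/dw by hand
    grad_c = np.mean(-(1 - dr) + df)             # d(loss)/dc by hand
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Generator update: push D(fake) toward 1 (fool the discriminator) ---
    z = rng.standard_normal(n)
    xf = a * z + b
    df = sigmoid(w * xf + c)
    dg = -(1 - df) * w        # d(-log D(G(z))) / dG(z)
    a -= lr * np.mean(dg * z)
    b -= lr * np.mean(dg)

# After training, the generator's outputs should cluster near the real
# data (mean around 4), and the discriminator should be close to undecided.
samples = a * rng.standard_normal(10_000) + b
print(round(float(samples.mean()), 1))
```

With real networks you would use a framework's autograd instead of hand-derived gradients, but the structure of the loop (one Discriminator step, one Generator step, repeat) is the same.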

Probability Distribution of the Generator

Recall that we want to find the probability distribution (or a function that captures it) in order to generate realistic data, such as images like the original MNIST digits.

The probability distribution of the Generator network shifts towards the distribution of real data (Source: https://towardsdatascience.com/training-a-gan-to-sample-from-the-normal-distribution-4095a11e78de)

The Generator starts off outputting values all over the place (random points in the high-dimensional space). However, as the training of the two networks progresses, the random inputs to the Generator start to produce more and more points in and around the actual cluster of real images. This is when we say the Generator network has learned the probability distribution of the real data. If you would like to generate more images, throw away the Discriminator: the Generator will output realistic data each time you supply it with a random vector.

And as with all things in Machine Learning, the reason we use this approach is that it works! And you've now understood GANs!

A Few Interesting Applications

1. Generate new realistic images such as numbers, faces, animals, etc. — https://www.csoonline.com/article/3293002/deepfake-videos-how-and-why-they-work.html

2. Sketches to Realistic Images

3. Text descriptions to Realistic Images

4. Photos to Emojis

5. Photo Inpainting

6. Image to Image Translation of Horses into Zebras and the other way around

The Source for the Above Images — https://readmedium.com/gan-some-cool-applications-of-gans-4c9ecca35900 and https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/

Thank you so much for reading the article. I struggled for a very long time to understand what GANs actually do. Coming up with this visualization took time, but it helped me immediately, and I knew it was valuable enough to share with everyone getting started with GANs! I have also recently worked with GANs in the context of document analysis as a researcher at IIIT Hyderabad, India.

Do like and share if you found this useful. For any feedback, feel free to get in touch with me at [email protected]. Have fun implementing GANs. Cheers!
