LATENT SPACES (Part-2): A Simple Guide to Variational Autoencoders
In the previous tutorial (https://readmedium.com/latent-space-representation-a-hands-on-tutorial-on-autoencoders-in-tensorflow-57735a1c0f3f) we learned about latent spaces, autoencoders and their implementation in TensorFlow. In this tutorial, we extend the concept and look at a special case of the autoencoder: the variational autoencoder.

1. Variational Autoencoders (VAE):
The main strength of autoencoders lies in their ability to extract an abstract representation of the data that is expected to generalize to unseen instances. This opens the possibility of using the latent space to generate new images that have not been seen before. The general autoencoder architecture, however, does not allow much freedom in traversing the latent space. Variational Autoencoders (VAE) get around this limitation: they learn a latent distribution instead of a latent vector, which makes it possible to interpolate in the latent space. More specifically, a VAE models the latent space as a multivariate Gaussian distribution, on the assumption that the data can be approximated by a normal distribution. As seen in Figure 2, the VAE enforces a multivariate Gaussian prior on the latent vector, so each input is mapped to a distribution over the latent space rather than to a single multi-dimensional point, as in a general autoencoder.

2. Objective Function:
Just like autoencoders, VAEs learn by optimizing an objective function, which is a loss function computed for every data instance. However, the VAE loss differs from the autoencoder loss: it not only minimizes the reconstruction error but also enforces the constraint that the latent vector comes from a normal distribution. This is achieved by adding a second term to the loss function, so that the total loss is the sum of a reconstruction error and a divergence error.
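
In symbols chosen here for illustration (the subscript names are assumptions, not the tutorial's own notation), the total loss for one data instance can be sketched as:

```latex
\mathcal{L}_{\text{VAE}} = \mathcal{L}_{\text{reconstruction}} + \mathcal{L}_{\text{divergence}}
```

Some implementations weight the two terms differently; the unweighted sum above is only the simplest form.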

The reconstruction error is computed as usual between the input image and the output of the decoder, and can be modeled as a mean squared error, that is, the average of the squared differences between corresponding pixels of the two images.
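
Writing N for the number of pixels, x_i for the i-th pixel of the input and \hat{x}_i for the i-th pixel of the decoder output (notation assumed here for illustration), the mean squared error takes the familiar form:

```latex
\mathcal{L}_{\text{reconstruction}} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2
```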


The divergence error can be computed using the Kullback–Leibler divergence (KL divergence), which measures how one probability distribution differs from another. Since the latent distribution is assumed to be normal, the divergence between it and the normal prior (typically a standard normal) can be computed in closed form from the mean and variance produced by the encoder.
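
For a Gaussian with diagonal covariance, mean mu and variance sigma^2, measured against a standard normal prior, the KL divergence has the well-known closed form -1/2 * sum(1 + log(sigma^2) - mu^2 - sigma^2), summed over the latent dimensions. The NumPy sketch below evaluates it. The function name `kl_to_standard_normal` and the choice to pass the log-variance instead of the variance (a common convention for numerical stability) are assumptions made for illustration, not taken from the tutorial's code.

```python
import numpy as np


def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(z_mean, exp(z_log_var)) || N(0, I) ), summed over latent dims.

    z_log_var holds log(sigma^2). Returns one value per data instance (row).
    """
    return -0.5 * np.sum(
        1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1
    )


# Two illustrative latent distributions in a 2-D latent space.
z_mean = np.array([[0.0, 0.0],    # already matches the prior
                   [1.0, -1.0]])  # mean shifted away from zero
z_log_var = np.zeros((2, 2))      # log(1) = 0, i.e. unit variance

kl = kl_to_standard_normal(z_mean, z_log_var)
print(kl)
```

A distribution that already matches the prior (zero mean, unit variance) incurs no penalty, while the penalty grows as the mean moves away from zero.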

3. Implementation:
Just like an autoencoder, a VAE consists of three sections: an encoder, a latent vector and a decoder. However, since the latent vector in a VAE comes from a normal distribution, there is an additional layer that maps the encoder output to the mean and variance of that distribution. The latent vector is then sampled from the learned latent distribution.
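
The mapping and sampling steps described above can be sketched in plain NumPy. This is a minimal illustration, not the tutorial's implementation: the sizes, the random stand-in for the encoder output, the weight matrices and the helper name `sample_latent` are all hypothetical. The sampling is written as mean plus scaled noise (commonly called the reparameterization trick), and the variance is parameterized by its logarithm, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the tutorial).
batch_size, hidden_dim, latent_dim = 4, 8, 2

# Stand-in for the output of the encoder's last hidden layer.
h = rng.normal(size=(batch_size, hidden_dim))

# The "additional layer": two linear heads, one for the mean and
# one for the log-variance of the latent distribution.
w_mean = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))
w_log_var = rng.normal(scale=0.1, size=(hidden_dim, latent_dim))
z_mean = h @ w_mean
z_log_var = h @ w_log_var


def sample_latent(z_mean, z_log_var, rng):
    """Draw z = mean + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(size=z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps


z = sample_latent(z_mean, z_log_var, rng)
print(z.shape)
```

Writing the sample as a deterministic function of the mean, the log-variance and an independent noise term is what lets gradients flow back to the encoder during training.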
3.1. Data Preparation
As in our previous tutorial on autoencoders, we will use an open-source dataset from Kaggle. You can download it from the following link: https://tinyurl.com/4k5zhsey
This tutorial requires TensorFlow >=2.6 and Jupyter Notebook. If you don’t have them set up yet, you can either use Google Colab or install them on your computer using Anaconda.
Since the data preparation process is the same as for autoencoders, the details of loading the data can be found in the previous tutorial.

3.2. Model Building
We build the VAE model in much the same way as the autoencoder. First we define the layers of the encoder, then we introduce a sampling function to construct the latent vector, and finally we decode the sampled latent vector by projecting it back from the latent space onto the data space. The encoding layers are as follows:
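
A minimal sketch of such an encoder in TensorFlow Keras is given below. It is not the tutorial's own listing: the input shape (28x28 grayscale), the latent dimension, the layer sizes and the `Sampling` layer name are all assumptions chosen for illustration, and should be adapted to the dataset actually used.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2             # assumed latent size
input_shape = (28, 28, 1)  # assumed image shape; adjust to your data


class Sampling(layers.Layer):
    """Samples z = mean + sigma * eps from (z_mean, z_log_var)."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon


# Convolutional feature extractor.
encoder_inputs = tf.keras.Input(shape=input_shape)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(encoder_inputs)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)

# Two heads: mean and log-variance of the latent distribution.
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)

# Sample the latent vector from the learned distribution.
z = Sampling(name="z")([z_mean, z_log_var])

encoder = tf.keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
encoder.summary()
```

Returning the mean and log-variance alongside the sampled vector keeps them available for computing the divergence term of the loss during training.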










