Understanding Domain Adaptation

Learn how to design a deep learning framework enabling them for domain adaptation

Note — I assume the reader has some basic knowledge of neural networks and their working.

Domain adaptation is a field of computer vision, where our goal is to train a neural network on a source dataset and secure a good accuracy on the target dataset which is significantly different from the source dataset. To get a better understanding of domain adaptation and it’s application let us first have a look at some of its use cases.

We have a lot of standard datasets for different purposes, like GTSRB for German traffic sign recognition, LISA and LARA dataset for traffic light detection, COCO for object detection, and segmentation etc. However, if you want a neural network to work nicely for your task e.g. traffic sign recognition on Indian Roads, then you will have to first collect all types of images of Indian Roads, and then do the labeling for those images, which is a laborious and time taking task. Here we can use domain adaptation, as we can train the model on GTSRB (source dataset) and test it on our Indian traffic sign images (target dataset).
There are many cases where it is difficult to gather datasets, which have all the required variations and diversity to train a robust neural network. In this case, with the help of different computer vision algorithms, we can generate large synthetic datasets that have all the variations we require. Then train the neural network on the synthetic dataset (source dataset) and test it on real-life data(target dataset).

To get a better understanding I am assuming that we do not have annotation available for the target dataset/domain, however, it is not the only case. Please continue reading for further explanation.

So in domain adaptation, our goal is to train a neural network on one dataset (source) for which label or annotation is available and secure good performance on another dataset (target) whose label or annotation is not available.

Classification pipeline (Image by author)

Let us now see how do we achieve our goal. Consider the above case of image classification. To adapt from one domain to another, we would want our classifier to work well on the features extracted from the source as well as the target dataset. Since we have trained the neural network on the source dataset, the classifier must perform well for the source dataset. However, to make classifiers perform well on the target dataset, we would want the features extracted from the source and target dataset to be similar. So while training we enforce feature extractor to extract similar features for source and target domain images.

Successful Domain Adaptation (Image by author)

Type of Domain adaptation based on Target domain

Depending upon the type of data available from the target domain, domain adaptation can be classified into the following-:

Supervised — You have labeled data from the target domain and the target dataset size is much smaller as compared to source dataset.
Semi-Supervised — You have both labeled as well as unlabelled data of target domain.
Unsupervised — You have a lot of unlabelled sample points of target domain.

Techniques for Domain Adaptation

Majorly three techniques are used for realizing any domain adaptation algorithm. Following are the three techniques for domain adaptation-:

Divergence based Domain Adaptation
Adversarial based Domain Adaptation
Reconstruction based Domain Adaptation

Let us now look at each technique part by part.

Divergence based Domain Adaptation

Divergence-based domain adaptation works on the principle of minimizing some divergence-based criterion between source and target distribution, hence leading to domain invariant features. Common divergence based criterion used is Contrastive Domain Discrepancy, Correlation Alignment, Maximum Mean Discrepancy (MMD), Wasserstein etc. To get a better understanding of this algorithm, let’s first look at some divergences.

In Maximum Mean Discrepancy (MMD), we try to find whether the given two samples belong to the same distribution or not. We define the distance between two distributions as distances between the mean embedding of features. If we have two distribution say P and Q over set X. Then MMD is defined by a feature map 𝜑: X→H, where H is called a reproducing Kernel Hilbert Space. The formula of MMD is described below

For a better insight of MMD please look at the following description — Two distributions are similar if their moments are similar. By applying a kernel, I can transform the variable such that all moments (first, second, third etc.) are computed. In the latent space, I can compute the difference between the moments and average it.

In Correlation Alignment, we try to align correlation (second-order statistics) between the source and target domain as compared to mean using linear transformation in MMD.

The architecture above assumes that the source and target domain have the same classes. In the above architecture, during training, we are minimizing two losses, classification loss, and divergence-based loss. Classification loss makes sure to get good classification performance by updating the weights of both feature extractor and classifier. Whereas divergence-based loss assures that features of source and target domain are similar by updating weights of feature extractor. During the inference time, we just have to pass the target domain image from our neural network.

All the divergences are usually non-parametric and handcrafted mathematical formula which are not specific to the dataset or our problem be it classification, object detection, segmentation etc. Hence this formula-based approach is not very personalized to our problem. However, if the divergence can be learned based on dataset or problem then it is expected to perform better as compared to conventional pre-defined divergences.

Adversarial based Domain Adaptation

For realizing Adversarial based domain adaptation we use GANs. In case if you don’t have much idea about GANs, please have a look here.

Here our generator is simply the feature extractor and we add new discriminator networks which learn to distinguish between source and target domain features. As it is a two-player game, the discriminator helps the generator to produce features that are indistinguishable for source and target domain. Since we have a learnable discriminator network we learn to extract features (specific to our problem and dataset) which could help to distinguish between source and target domain, and hence helping generator to produce more robust features i.e. feature which cannot be distinguished very easily.

During Training — For Source (Image by author)

During Training — For Target (Image by author)

Assuming the problem of classification, we are using two losses, classification loss and discriminator Loss. The purpose of classification loss is explained earlier. Discriminator loss helps discriminator correctly classify between source and target domain features. Here we use the gradient reverse layer (GRL) to realize adversarial training. GRL block is a simple block that multiplies the gradient with -1 or a negative value while back-propagating. During training, to update the generator we have gradients coming from two directions, first from the classifier and second from the discriminator. The gradients from the discriminator gets multiplied by a negative value because of GRL, thus leading to an effect of training generator in the opposite direction as the discriminator. For example, if computed gradient to optimize loss function for discriminator is 2 then we update the generator with -2 (assuming negative value to be -1). By this, we are trying to train generator in such a way that even the discriminator is not able to distinguish between the features of source and target domain. GRL layer is very commonly used in many literatures on domain adaptation.

Reconstruction based Domain Adaptation

This works on the idea of Image-to-Image translation. One of the simple approaches could be to learn the translation from the target domain images to source domain image and train a classifier on the source domain. There can multiple approaches which we can be introduced using this idea. The simplest model for Image-to-Image translation could be an encoder-decoder-based network and use a discriminator to enforce an encoder-decoder network to produce images that are similar to the source domain.

Another way is we can use Cycle GANs. In Cycle GAN we use two encoder-decoder-based neural network. One is used to transform target to source domain and another is used to transform source to target domain. We simultaneously train two GANs that generate the images in the two domains (both source and target). To ensure consistency, a cycle consistency loss is introduced. This makes sure that the transformation from one domain to another and back again results in roughly the same image as the input. Thus, the full loss of the two paired networks is a sum of the GAN losses of both discriminators and the cycle consistency loss.

Conclusion

We have seen three different techniques that could help us to realize or implement different domain adaptation approaches. It has its great applications in different tasks such as image classification, object detection, segmentation etc. In some aspects, we can say that this approach is similar to how humans learn to visually recognize different things. I hope this blog gives you a basic idea of how we think of different domain adaptation pipelines.

Become a Medium member to unlock and read many other stories on medium. Follow us on Medium for reading more such blog posts.