Deep Learning using Transfer Learning

In this article series, we will explore what Transfer Learning is, what its objective is, and the different strategies for implementing it. In the next article, we will write code to apply Transfer Learning using ResNet50.

If you know how to row a boat and how to swim, can you learn water rafting?

If you have learned the basics, then you only need to learn the things specific to water rafting. The basic concepts of rowing a boat and swimming are already part of your knowledge base.

Can we apply this technique to Machine Learning and Deep Learning?

Yet we typically create a new Convolutional Neural Network (CNN) for every class of objects we want to identify. We have one CNN to identify animals such as dogs and cats, a different CNN to identify digits, and another to identify apparel.

Common Assumption on Machine Learning and Deep Learning

If the training and test data are drawn from the same feature space and the same distribution, we can reuse the model we have already built. However, when the distribution changes, we need to rebuild the model from scratch, which requires collecting new training data.

It is expensive to recollect the needed training data and rebuild the models. What if we could reduce the need and effort to recollect training data by transferring knowledge between task domains?

What if we had one CNN that learns the basics of images, such as corners, shapes, and illumination, and could then tweak it slightly to learn the specifics of other classes of images?

Welcome to Transfer Learning!

What is the objective of Transfer Learning?

The objective of Transfer Learning is to take advantage of data from the first setting to extract information that may be useful when learning or even when directly making predictions in the second setting.

-Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Motivation for Transfer learning

Machine Learning models have been traditionally developed under the assumption that a model will work well if the training and test data are drawn from the same feature space and the same distribution.

If the feature space or the distribution of the data changes, we would need to build a new model. Developing a new model from the ground up and collecting a new set of training data every time is expensive. Transfer Learning reduces the need and effort to collect massive amounts of training data again.

The motivation for Transfer Learning in Machine Learning and Deep Learning is the observation that people can intelligently apply knowledge learned previously on a different task or domain to solve new problems faster or with better solutions.

What are the key considerations for Transfer Learning?

To effectively apply Transfer Learning, we need to answer three main questions:

What to transfer
When to transfer
How to transfer

What to transfer: We need to understand what knowledge is common between the source and target tasks, and which of that knowledge can be transferred from the source task to improve performance on the target task.
When to transfer, and when not to: When the source and target domains are not related at all, we should not try to apply Transfer Learning. In such a scenario performance will suffer; this is called Negative Transfer. We should apply Transfer Learning only when the source and target domains or tasks are related.
How to transfer: We need to identify the technique to use when the source and target domains or tasks are related. We can use inductive transfer learning, transductive transfer learning, or unsupervised transfer learning.

What are these different types of transfer learning?

Different Types of Transfer Learning

Inductive Transfer Learning: Same Source and Target Domain but Different Task

If we want a child to identify fruits, we start by showing apples of different colors, such as red, green, and pale yellow apples. We show the child different varieties of apples, such as Gala, Granny Smith, and Fuji. We show these apples in different settings so that the child can identify apples in most scenarios. The same approach is then used to identify other fruits such as grapes, oranges, and mangoes. Here, the knowledge acquired in learning to identify apples is applied to learning to identify other fruits. Our source and target domains are both about identifying fruits, but one task involves identifying apples and the other involves identifying mangoes.

The goal of inductive transfer learning is to improve the performance of the target predictive function.
Inductive transfer learning requires at least a few labeled examples in the target domain as training data to induce the target predictive function.
If the source domain also has plenty of labeled data, the setting is similar to multi-task learning.
If the source domain has no labeled data, the setting is similar to self-taught learning.

Transductive Transfer Learning: Different Domain but Similar Task

Let's extend this example. Now we want the child to learn about household objects such as a chair, a table, or a bed. The child will use the knowledge acquired while identifying fruits to identify household objects.

The child might not have been shown enough household objects, but will use the knowledge of shapes, colors, and so on, learned while identifying fruits, to identify household objects.

In transductive transfer learning, no labeled data exists in the target domain, while a lot of labeled data exists in the source domain.

Transductive transfer learning can be applied when:

The feature spaces of the source and target domains are different, or
Feature spaces between domains are the same but the marginal probability distributions of the input data are different. This is also referred to as Domain adaptation.
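
The two cases above can be written compactly in the notation of the survey by Pan and Yang listed in the references. The subscripts S (source) and T (target), and the symbols for feature space, label space, and predictive function, are that survey's conventions rather than anything defined in this article. Note that the survey treats the source and target tasks in this setting as the same, not merely similar.

```latex
% A domain is a feature space plus a marginal distribution;
% a task is a label space plus a predictive function.
\mathcal{D} = \{\mathcal{X},\, P(X)\}, \qquad \mathcal{T} = \{\mathcal{Y},\, f(\cdot)\}

% Transductive transfer learning: same task, different domains.
\mathcal{T}_S = \mathcal{T}_T, \qquad \mathcal{D}_S \neq \mathcal{D}_T

% Case 1: the feature spaces differ.
\mathcal{X}_S \neq \mathcal{X}_T

% Case 2 (domain adaptation): same feature space, different marginals.
\mathcal{X}_S = \mathcal{X}_T, \qquad P(X_S) \neq P(X_T)
```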

Unsupervised Transfer Learning

Unsupervised transfer learning is similar to inductive transfer learning in that the target task is different from, but related to, the source task. The domain of the source and target tasks is the same. We have no labeled data in either the source or the target domain.

It focuses on solving unsupervised learning tasks in the target domain, such as clustering or dimensionality reduction.

Can I apply these Transfer Learning strategies to Deep Learning?

Deep Learning requires significantly more training data and training time than traditional Machine Learning, for example in computer vision, sequential text processing, or audio processing. We can save the weights of our trained models and share them for others to use. Pre-trained models are also available today and are used extensively for Transfer Learning; this is referred to as Deep Transfer Learning.
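
To make the idea of saving and sharing weights concrete, here is a minimal sketch using only the Python standard library. The layer names, the weight values, and the helper functions save_weights and load_weights are all made up for illustration; real frameworks ship their own file formats and loading utilities.

```python
import json
import os
import tempfile

def save_weights(weights, path):
    """Write a {layer_name: list_of_floats} mapping to a JSON file."""
    with open(path, "w") as f:
        json.dump(weights, f)

def load_weights(path):
    """Read the mapping back so that another model can start from it."""
    with open(path) as f:
        return json.load(f)

# Hypothetical weights from a model that someone has already trained.
trained = {"conv1": [0.12, -0.40, 0.88], "fc": [1.5, -0.7]}

path = os.path.join(tempfile.mkdtemp(), "pretrained_weights.json")
save_weights(trained, path)

# A different project loads the shared file, reuses the lower layer,
# and starts its own prediction layer from zeros.
shared = load_weights(path)
new_model = {"conv1": shared["conv1"], "fc_new": [0.0, 0.0, 0.0]}
```

The point is only that learned weights are plain numbers: once they are written to a file, anyone can load them as the starting point for a new model.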

Common strategies for Deep Transfer Learning

Use the pre-trained model as a feature extractor
Fine-tune the pre-trained models

Figure: Pre-trained deep neural networks for Computer Vision

Figure: Pre-trained deep neural networks for Natural Language Processing tasks

Pre-trained models can be used for prediction, feature extraction, and fine-tuning.

Let's get into the details of each of these strategies.

Use the pre-trained model as a feature extractor

To implement Transfer Learning, we remove the last prediction layer of the pre-trained model and replace it with our own prediction layers, labeled FC-T1 and FC-T2 in the accompanying figure.
The pre-trained model, with its learned weights, is used as a feature extractor.
The weights of the pre-trained model are frozen and are not updated during training.
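
The steps above can be sketched without any deep learning framework. This is a toy illustration, not the ResNet50 code from the next article: FROZEN_W stands in for the pre-trained network body, its values are invented, and the names extract_features, train_head, and TARGET_DATA are hypothetical. The only part that is trained is a new prediction layer, here a small logistic regression on top of the frozen features.

```python
import math

# Stand-in for the pre-trained network body: 3 hidden units over 2 inputs.
# These numbers are made up; in practice they come from a pre-trained model.
FROZEN_W = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]

def extract_features(x):
    """Frozen layer: ReLU(W x). Its weights are read but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, head_w, head_b):
    """Probability of class 1 from the new prediction layer."""
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)

def train_head(data, epochs=500, lr=0.1):
    """Train only the new prediction layer with SGD on the log loss."""
    head_w = [0.0] * len(FROZEN_W)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = extract_features(x)
            err = predict(x, head_w, head_b) - y  # d(log loss)/d(logit)
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b

# Tiny target task: the label is 1 when the first input is larger than the second.
TARGET_DATA = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([2.0, 1.0], 1), ([1.0, 2.0], 0)]

head_w, head_b = train_head(TARGET_DATA)
for x, y in TARGET_DATA:
    print(x, y, round(predict(x, head_w, head_b), 3))
```

Because train_head never writes to FROZEN_W, the knowledge in the body is reused exactly as it was learned, and training only has to fit the few parameters of the new head.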

Fine-tune the pre-trained models

We can use deep neural networks such as VGG-16, VGG-19, Inception V3, ResNet-50, or Xception as the pre-trained model.
To implement Transfer Learning with fine-tuning, we remove the last prediction layer of the pre-trained model and replace it with our own prediction layers, labeled FC-T1 and FC-T2 in the accompanying figure.
The initial, lower layers of the network learn very generic features. To preserve them, the weights of the initial layers of the pre-trained model are frozen and not updated during training.
The higher layers are used for learning task-specific features. The higher layers of the pre-trained model are kept trainable, that is, they are fine-tuned.
Fine-tuning can improve performance with less training time than training from scratch.
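
The difference from pure feature extraction is that some of the pre-trained layers keep learning. The sketch below shows this with a deliberately tiny network in which every layer is a single weight followed by tanh. The class ScalarLayer, the functions forward and fine_tune_step, the starting weights, and the single training example are all assumptions made for illustration; a real model would set the equivalent of the trainable flag on its own layers.

```python
import math

class ScalarLayer:
    """One toy layer: out = tanh(weight * inp)."""
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

def forward(layers, x):
    """Return all activations; activations[0] is the input itself."""
    activations = [x]
    for layer in layers:
        activations.append(math.tanh(layer.weight * activations[-1]))
    return activations

def fine_tune_step(layers, x, target, lr=0.1):
    """One SGD step on squared error; frozen layers are skipped.

    Returns the loss measured before the update.
    """
    acts = forward(layers, x)
    grad_out = 2.0 * (acts[-1] - target)                # d(loss)/d(output)
    for i in reversed(range(len(layers))):
        pre_grad = grad_out * (1.0 - acts[i + 1] ** 2)  # back through tanh
        grad_w = pre_grad * acts[i]
        grad_out = pre_grad * layers[i].weight          # pass gradient to the layer below
        if layers[i].trainable:
            layers[i].weight -= lr * grad_w
    return (acts[-1] - target) ** 2

# Made-up "pre-trained" weights for a four-layer network.
layers = [ScalarLayer(0.9), ScalarLayer(1.1), ScalarLayer(0.5), ScalarLayer(0.5)]

# Freeze the two lower layers (generic features); fine-tune the two higher ones.
for layer in layers[:2]:
    layer.trainable = False

initial_loss = fine_tune_step(layers, x=1.0, target=0.6)
for _ in range(200):
    final_loss = fine_tune_step(layers, x=1.0, target=0.6)

print([round(layer.weight, 3) for layer in layers], initial_loss, final_loss)
```

Gradients still flow through the frozen layers so that the higher layers can be updated correctly; the frozen layers simply do not apply their own updates.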

We will see a code implementation using ResNet50 in the next article. ResNet is short for Residual Network, and ResNet50 is a 50-layer Residual Network.

References:

A Survey on Transfer Learning by Sinno Jialin Pan and Qiang Yang

A Comprehensive Hands-on Guide to Transfer Learning with Real-World Applications in Deep Learning

GitHub for Hands on Transfer Learning with Python
