Transfer Learning for Image Classification — (1) All Start Here

Transfer learning has gained popularity in recent years because it lets you build effective image classification models for specialized use cases. This book guides you in applying transfer learning techniques to build your own models. Transfer learning and image classification both build on a vast body of knowledge. If you are new to data science and image modeling, this book may be a good starting point: it explains the concepts and their fascinating development history. This book also aims to be a handbook for data scientists coming from other fields. It provides the nuts and bolts, along with tutorials in Python code. By following them, you should be able to apply transfer learning to your own image classification model. This book is particularly useful as a step-by-step guide for completing a semester-long capstone project.

Let me show you how this book is organized. In “Chapter 1: What is transfer learning?”, I explain the idea for readers who do not necessarily have a technical background. In “Chapter 2: The stories of pre-trained models”, I tell the development history, which I believe you may find inspiring. In “Chapter 3: Let’s understand a convolutional neural network”, I walk you slowly through a Convolutional Neural Network (CNN).

Even so, a CNN may still feel like an abstract mathematical operation. How does a CNN interpret an image? To make sense of a CNN, I visualize what it sees in “Chapter 4: Let’s visualize the pre-trained VGG-16 model layer-by-layer”. A good understanding of pre-trained models makes transfer learning much easier. That’s why I dedicate Chapters 3 and 4 to CNNs.

In “Chapter 5: Get image data, ready, and go”, I show you how to organize and annotate images programmatically. This paves the way for building and fine-tuning your model in “Chapter 6: Build and fine-tune your transfer learning model”. The chapters are organized so that you can jump to specific topics. If you are already familiar with image data and want to understand transfer learning, you can go directly to Chapters 4 and 6. If you are not sure about Convolutional Neural Networks (CNNs), you can cover Chapters 1, 3, 4, and 6.

Let’s begin!

(A) Why Do We Need Image Recognition Technology?

In today’s digital world we take photos constantly, and monitoring devices generate even more video images. How do we make sense of the objects in an image? We humans can recognize and distinguish an object in an image effortlessly, but it is a waste of talent to use human eyes merely to scan through millions of images. Image recognition technology was invented for this reason and has been providing immense help in many areas. The first application is in the security industry. A smart indoor camera can start recording video only when the system detects an unknown object. Detectors in remote areas or on drones can recognize images and objects. The technology is also important in the healthcare industry. It helps detect brain tumors, cancer, and other abnormalities, and it assists visually impaired people. A good example is a high-tech walking stick that recognizes objects for blind people. The technology is applied in the retail industry as well. Customers can take a photo of an item they like and upload it to a brand’s website to see whether something visually similar is in stock.

(B) How Does an Image Classification Algorithm Work?

We humans have seen many images repeatedly, so we can recognize a new image of the same kind. How do we develop an algorithm to do what we humans do? How can it tell whether an image shows a “flamingo” or a “zebra”? We have to “train” the model with thousands of images labeled “zebra” and thousands labeled “flamingo”, so that the model can learn the distinguishing features of a zebra and a flamingo. The model then classifies a new image by estimating the probability that it belongs to each class and picking the most likely one. Image classification is the task of recognizing what an image shows, so it is also called image recognition.
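
To make “estimating probabilities” concrete, here is a minimal sketch in plain Python. It assumes a trained model has already produced one raw score per class; the scores and the two class names below are made up for illustration. The softmax function, a common way to do this step, turns the scores into probabilities that sum to 1, and the class with the highest probability becomes the prediction.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores that a trained model might output for one image
class_names = ["flamingo", "zebra"]
scores = [2.0, 0.5]

probabilities = softmax(scores)
predicted = class_names[probabilities.index(max(probabilities))]
print(dict(zip(class_names, probabilities)), "->", predicted)
```

A real model computes the scores from the image pixels. Only the last step, from scores to a predicted label, is shown here.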

The core of image recognition technology is a set of machine learning algorithms. Training such an algorithm is very challenging. It requires advanced algorithms and a large volume of image data, and training can take hours or even days.

(C) How Does Transfer Learning Help?

Because building an image classification model from scratch is extremely costly, it is a challenge for enterprises and individual researchers. The transfer learning technique overcomes these obstacles. It needs less data and fewer resources, and it can produce a better-targeted outcome.

“Transfer learning” describes something that is part of human nature and is not limited to any discipline. We are creative in transferring knowledge from one area to another. An Italian chef can transfer his cooking techniques to Chinese cuisine. A basketball player can transfer his agility to volleyball. Although the learning curve is steep in the beginning, once we have learned something, we can extend it to a related area. We seem to do this freely and even enjoy it.

Let’s see how models can leverage their learning. One image classification model can specialize in identifying the flags of nations. Another model can specialize in identifying the logos of car brands. To build a flag image model, you extract the basic features from images and do the modeling. To build the car brand model, you still extract the basic features from images and do the modeling. Some modeling procedures are shared, and even some extracted features are shared. Large-scale models have already been built that have learned the basic features of many kinds of images. They are called pre-trained models. They can be leveraged to build specialized models such as the national flag or car logo models above. This process is the transfer learning technique.

(D) How Does It Work?

An image classification model is a type of supervised learning model. In supervised learning, both data and labels are present. In unsupervised learning, only data are present, without labels. In an image classification model, the inputs are the image data, and the targets are the image labels.

Let me start by explaining transfer learning at a high level. Figure (D.1) shows two models. The top one is a pre-trained model. It was trained on many classes of images such as airplanes, ships, birds, and dogs. The images in each category have been annotated with the same label. This pre-trained model can identify a new “ship” image with a high level of confidence.

Pre-trained models can recognize the classes that they were trained on. They cannot recognize the classes that they were not trained on. If a pre-trained model has never seen a rocket image and is fed one, it will fail to recognize it. Instead, it will label the rocket as something that looks similar (and amusing!), such as a “hot dog” or a “pencil”.

Suppose the U.S. Navy needs an image classification model to distinguish specific types of warships such as carriers, destroyers, battleships, or frigates. What can data scientists do? They can apply the transfer learning technique to a pre-trained model. Pre-trained models were not trained on specific types of warship images like carriers, destroyers, battleships, frigates, or guardian shields. If a pre-trained model sees these different warship images, it calls all of them “ship”. However, a pre-trained model knows the basic features of a ship. It knows that a ship has a boat shape and pointy things on top, such as masts or poles, as shown in Figure (D.2). These basic features have been saved in the pre-trained model.

Figure (D.2) (Images collected from the Wikipedia article on boats)

Now look at the pre-trained model in Figure (D.1) more carefully. It has two parts. The first part is the boxes, and the second part looks like a standard neural network. The boxes store the basic features of a “ship”. The standard neural network combines these basic features and links them to the target label “ship”.

How can data scientists transfer the learning from the pre-trained model? They can “strip off” the standard neural network part of the pre-trained model and add a new one to build a new model. The new model is then trained on the warship images and their corresponding labels. It will be able to identify a warship as a carrier, a destroyer, a battleship, a frigate, or a guardian shield.
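
The “strip off and replace” idea can be sketched as a toy in plain Python. This is only an illustration under heavy assumptions, not a real pre-trained network. The function `frozen_feature_extractor` is a hypothetical stand-in for the boxes, and its features are hard-coded rather than learned. The “images” are four made-up pixel values. The new head is a small logistic regression over two of the warship classes. The point to notice is that training updates only the new head and never touches the feature extractor.

```python
import math

def frozen_feature_extractor(image):
    """Hypothetical stand-in for the frozen 'boxes' of a pre-trained model."""
    brightness = sum(image) / len(image)
    contrast = (image[0] + image[1]) - (image[2] + image[3])  # top half minus bottom half
    return [brightness, contrast]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_new_head(images, labels, epochs=500, lr=0.5):
    """Train only the new head (logistic regression) on the frozen features."""
    features = [frozen_feature_extractor(img) for img in images]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the pre-activation
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict(image, weights, bias, class_names):
    """Run the frozen extractor, then the new head, and return a class name."""
    x = frozen_feature_extractor(image)
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return class_names[1] if p >= 0.5 else class_names[0]

# Made-up training data: label 0 = carrier, label 1 = destroyer
class_names = ["carrier", "destroyer"]
train_images = [[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1],
                [0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]]
train_labels = [0, 0, 1, 1]

weights, bias = train_new_head(train_images, train_labels)
print(predict([0.85, 0.9, 0.15, 0.1], weights, bias, class_names))
print(predict([0.1, 0.15, 0.9, 0.85], weights, bias, class_names))
```

In practice the frozen part would be a large convolutional network and the new head would be trained with a deep learning library, but the division of labor is the same: reuse the features, retrain only the head.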

(E) Computer Vision

You may ask where transfer learning sits in computer vision (CV). This section describes the related fields in CV to give you a good view of the body of knowledge, so that you can decide where to invest your time and effort.

(E.1) Image Classification

This line of modeling techniques identifies what an image shows. A neural network model is trained on millions of images with their corresponding labels. Figure (E.1) demonstrates the idea. The dataset has the target class “airplane” with a variety of airplane images, and the class “automobile” with automobile images. Once the model is trained, it can identify a new airplane image. The new image does not need to match any airplane image in the training dataset. As long as it looks like an airplane, the model can recognize it.

A training dataset cannot contain every class. What if you need to identify the front or rear view of an automobile specifically? Although the target classes of the data include “automobile”, they do not include “front view” or “rear view” of an automobile. The model will just label all car images as “automobile”, regardless of whether they show the front or the rear. This is where transfer learning comes in. You can leverage a pre-trained model to build a new model. You train the new model on automobile images labeled with the front-view and rear-view classes. Then your new model can recognize an image accordingly. Transfer learning is relatively easy. You can consider it the point of entry to computer vision.

(E.2) Object Detection

Suppose you record yourself throwing a ball to your dog, and the dog brings the ball back to you. Object detection techniques identify and locate you, the dog, and the ball in each frame. The techniques have wide commercial applications such as counting vehicles on a highway or detecting faces in a crowd. One of the most popular algorithms for real-time object detection is YOLO (You Only Look Once).

(E.3) Pixel Segmentation

It is not trivial for a machine to identify an object with an irregular shape, such as a tree or a road. Pixel segmentation techniques group pixels into clusters that make sense. The Mean Shift algorithm and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm are popular clustering techniques for this purpose. This line of techniques is relatively easy to learn. If you are interested in computer vision, it is worth learning.
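
To give a feel for how clustering pixels works, here is a minimal sketch of the mean shift idea in plain Python. It is a simplification under stated assumptions: it uses a flat kernel, it works on a single made-up row of grayscale values rather than a full image, and the helper names `mean_shift_1d` and `label_modes` are mine. Each value is repeatedly moved to the average of the original values within a bandwidth around it, so values drift toward the dense regions. Values that end up at the same place form one segment.

```python
def mean_shift_1d(values, bandwidth, iterations=20):
    """Shift each value toward the mean of the original values within `bandwidth`."""
    modes = list(values)
    for _ in range(iterations):
        shifted = []
        for m in modes:
            neighbors = [v for v in values if abs(v - m) <= bandwidth]
            shifted.append(sum(neighbors) / len(neighbors))
        modes = shifted
    return modes

def label_modes(modes, tolerance):
    """Assign the same cluster label to modes that ended up close together."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= tolerance:
                labels.append(i)
                break
        else:  # no existing center is close enough, so start a new cluster
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

# A made-up row of grayscale pixels: a dark road followed by a bright sky
pixels = [12, 15, 10, 14, 200, 205, 198, 202]
modes = mean_shift_1d(pixels, bandwidth=30)
labels = label_modes(modes, tolerance=1.0)
print(labels)
```

On a real image, each pixel would typically be described by its color and position, and you would use a library implementation rather than this toy loop.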

(E.4) Generative Models or Generative Adversarial Networks (GAN)

This line of techniques generates new photos. Figure (E.4) shows a generative model that can “imagine” photos of a person at various ages. To see the idea, start with classification. A classifier is trained on millions of images (X) with their labels (Y) and learns the conditional probability P(Y|X), which simply means the probability of Y given X. Suppose X are the attributes of a “Mona Lisa” image and Y is “Mona Lisa”. By seeing all the attributes of Mona Lisa (X) in an image, a model can predict that the image is Mona Lisa.

A GAN goes in the other direction and creates new photos. Given a label Y such as “Mona Lisa”, a conditional generative model can generate many new images of Mona Lisa. Conceptually, this reverses the direction from P(Y|X) to P(X|Y), and Bayes’ rule relates the two. Interested readers can study the paper “Generative Adversarial Networks” by Goodfellow et al. The reversal of P(Y|X) to P(X|Y) is called “Bayesian thinking”. Readers who want to know more about it can refer to Section 4 of my article “A Wide Variety of Models for Multi-class Classification”.
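
The reversal from P(Y|X) to P(X|Y) can be checked on a tiny table. The sketch below is an illustration of Bayes’ rule only, not of how a GAN is trained. The counts and the two attribute patterns are made up, and real images have far too many attributes to tabulate this way.

```python
# Made-up counts of (attribute pattern X, label Y) in a tiny labeled dataset
counts = {
    ("smiling_portrait", "Mona Lisa"): 40,
    ("smiling_portrait", "other"): 10,
    ("landscape", "Mona Lisa"): 0,
    ("landscape", "other"): 50,
}
total = sum(counts.values())

def p_joint(x, y):
    return counts[(x, y)] / total

def p_x(x):
    return sum(c for (xi, _), c in counts.items() if xi == x) / total

def p_y(y):
    return sum(c for (_, yi), c in counts.items() if yi == y) / total

def p_y_given_x(y, x):
    """Classification direction: probability of the label given the attributes."""
    return p_joint(x, y) / p_x(x)

def p_x_given_y(x, y):
    """Generation direction, obtained from the other one through Bayes' rule."""
    return p_y_given_x(y, x) * p_x(x) / p_y(y)

print(p_y_given_x("Mona Lisa", "smiling_portrait"))
print(p_x_given_y("smiling_portrait", "Mona Lisa"))
```

In this toy table, a smiling portrait is Mona Lisa 80% of the time, while every Mona Lisa image is a smiling portrait. The two conditional probabilities answer different questions, which is why classification and generation are different tasks.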

Figure (E.4): Image credit: The paper “Face aging with conditional GANs”

(F) Do Other Fields Use Transfer Learning?

You may ask, “Since transfer learning techniques are so powerful, are they applied in other fields?” Yes. Transfer learning techniques have been applied to Natural Language Processing (NLP) and sound classification.

(F.1) Natural Language Processing (NLP)

Modern natural language models can answer questions, write computer code, or even compose stories (surprisingly). These pre-trained NLP models are large and contain millions or even billions of parameters. It is very hard for individual researchers to develop their own NLP models from scratch. Transfer learning techniques can be applied to the pre-trained NLP models to develop customized language models.

To help readers understand the scale of modern NLP models, let me present some statistics. The well-known deep neural network GPT-3 has 175 billion parameters. OpenAI has not publicly disclosed the parameter count of its successor, GPT-4. Another popular pre-trained NLP model is BERT, whose base version has about 110 million parameters.

(F.2) Sound Classification

Transfer learning is also applied to sound classification. One pre-trained audio classification model is YAMNet, a deep neural network that predicts audio events from 521 classes. Below is a short list:

Speech
Child speech, kid speaking
Conversation
Narration, monologue
Babbling
Speech synthesizer
Shout
Bellow
Whoop
Yell
Children shouting
Screaming
Whispering
Laughter
Baby laughter
Giggle
Snicker
Belly laugh
Chuckle, chortle
Crying, sobbing
......

Although the YAMNet model is large, its 521 sound classes are clearly not exhaustive. We can apply transfer learning techniques to this model so that it recognizes other sounds. For example, the project in this TensorFlow tutorial transfers YAMNet to recognize environmental sounds. How does it work? To build a new model that recognizes environmental sounds, we need environmental sounds with labels. That project uses the ESC-50 dataset, which contains 2,000 five-second environmental audio recordings with labels. The dataset was compiled by Piczak (2015). I mention data sources because compiling one is not a trivial job. A comprehensive data source can facilitate more advanced algorithms. In the next chapter we will learn that data invention and algorithm development are two pillars of success in this field.

Conclusion

I hope this chapter has motivated you to learn transfer learning. We learned what pre-trained models can and cannot do. We learned how transfer learning helps overcome the obstacles of building large-scale image classification models. We surveyed the related fields in computer vision and saw how transfer learning techniques are applied in other fields. Let’s continue to the next chapter, “Chapter 2: The stories of pre-trained models”.

Chapter 1: What is transfer learning?

Chapter 2: The stories of pre-trained models

Chapter 3: Let’s understand a convolutional neural network

Chapter 4: Let’s visualize the pre-trained VGG-16 model layer-by-layer

Chapter 5: Get image data, ready, and go

Chapter 6: Build and fine-tune your transfer learning model

Readers are recommended to purchase books by Chris Kuo:

The explainable AI: https://a.co/d/cNL8Hu4
Transfer learning for image classification: https://a.co/d/hLdCkMH
Modern time series anomaly detection: https://a.co/d/ieIbAxM
Handbook of Anomaly Detection: https://a.co/d/5sKS8bI