Summary

The web content provides a comprehensive guide to implementing multilabel classification using PyTorch and the Stanford Car Dataset, demonstrating how to classify multiple features of car images with a single model.

Abstract

The article outlines a method for performing multilabel image classification, focusing on categorizing car images based on brand, type, and year of manufacture. Utilizing the Stanford Car Dataset, the author explains the process of creating a custom dataset, adapting a pre-trained ResNet34 model, and developing a flexible training routine that can handle multiple classification heads. The approach leverages transfer learning to achieve high accuracy in predicting multiple classes simultaneously, showcasing the potential for applications like used car platforms that require feature extraction from images. The article also hints at addressing multilabel classification with non-exclusive classes in subsequent discussions.

Opinions

The author advocates for the efficiency of using a single model to classify multiple features, as opposed to training independent classifiers for each feature.
There is an emphasis on the practical application of multilabel classification models in real-world scenarios, such as used car platforms.
The author suggests that with the right approach, complex classification tasks can be simplified and made more efficient.
The use of transfer learning is presented as a key factor in achieving good results quickly, despite the complexity of the task.
The article implies that the methods discussed are accessible and can be implemented within a short time frame, as indicated by the title "Multilabel Classification With PyTorch In 5 Minutes."
The author expresses that the techniques demonstrated are not limited to car classification and can be applied to other domains, such as social network image tagging.

Multilabel Classification With PyTorch In 5 Minutes

A blueprint for your own classification task

When dealing with image classification, one often starts with a single target. For example, if you want to classify cars, you could distinguish whether a car is a convertible or not. This would be an example of binary classification. A more complex task is to distinguish between several categories: is it an Audi, a BMW, a Mercedes or a Ford? Here the target "car brand" has more than two categories. What if we want to combine both examples? We could classify multiple features at once for each image showing a vehicle, e.g. the brand, the vehicle type, and the year of manufacture. One way would be to train three independent classifiers, but it is also possible to integrate everything into one model.

We will do this together with the Stanford Car Dataset which is free to use for educational purposes. Here we go:

First, we create two functions to a) download and extract the images themselves and b) store the corresponding metadata (containing information about the brand and model). In the next step, we create a class that merges both sources of information and extracts a total of three relevant features:

Brand: we keep all brands in the dataset with more than 1000 images and put all other brands into the category "Other".

Vehicle type: we distinguish between Convertible, Coupe, SUV and Van. All models whose name does not reference a vehicle type are grouped into the category "Other".

Year: we divide the cars into two time-related cohorts, namely all cars released in 2009 and earlier and all cars released in 2010 and later.

So we have three targets, each with its own set of classes, and we want to predict all of them at the same time. We can extract all the needed information from the metadata.
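The extraction of the three targets can be sketched as follows. This is a minimal illustration, not the original class: it assumes that the metadata class names look like "Audi TTS Coupe 2012" (brand first, model year last), and the brand dictionary as well as the function name `extract_targets` are hypothetical, since the actual list of brands with more than 1000 images is not shown here.

```python
# Hypothetical category dictionaries; the real brand list is not shown in the article.
BRANDS = {"Audi": 0, "BMW": 1, "Chevrolet": 2, "Other": 3}
TYPES = {"Convertible": 0, "Coupe": 1, "SUV": 2, "Van": 3, "Other": 4}
YEARS = {"2009 and earlier": 0, "2010 and later": 1}


def extract_targets(class_name):
    """Map a class-name string to (brand, vehicle type, year) label indices."""
    tokens = class_name.split()
    # Brand: first token if it is a known brand, else "Other".
    # Simplification: multi-word brands would need extra handling.
    brand = BRANDS.get(tokens[0], BRANDS["Other"])
    # Vehicle type: first known type keyword in the name, else "Other".
    vehicle_type = next(
        (TYPES[t] for t in tokens if t in TYPES and t != "Other"),
        TYPES["Other"],
    )
    # Year cohort: the last token is assumed to be the model year.
    cohort = "2009 and earlier" if int(tokens[-1]) <= 2009 else "2010 and later"
    return brand, vehicle_type, YEARS[cohort]


print(extract_targets("Audi TTS Coupe 2012"))         # (0, 1, 1)
print(extract_targets("AM General Hummer SUV 2000"))  # (3, 2, 0)
```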

As described in the docstrings of the class, we can pass dictionaries that contain the categories for our class:

As expected, we get a list containing three lists of numeric labels, one for each of our three targets (brand, type, year). These are our training labels. We can use the dictionaries to map them back to category names later:
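Mapping the numeric labels back to names only requires inverting the dictionaries. A small sketch with made-up values; the dictionaries, the label lists and the helper `decode` are illustrative assumptions, not the article's data:

```python
# Hypothetical category dictionaries (name -> index).
brand_dict = {"Audi": 0, "BMW": 1, "Other": 2}
type_dict = {"Convertible": 0, "Coupe": 1, "SUV": 2, "Van": 3, "Other": 4}
year_dict = {"2009 and earlier": 0, "2010 and later": 1}

# One list per target (brand, type, year), each with one entry per image.
labels = [[0, 2, 1], [1, 4, 0], [1, 1, 0]]


def decode(label_lists, dicts):
    """Turn index lists back into category names via inverted dictionaries."""
    inverse = [{idx: name for name, idx in d.items()} for d in dicts]
    return [[inv[i] for i in lst] for lst, inv in zip(label_lists, inverse)]


decoded = decode(labels, [brand_dict, type_dict, year_dict])
print(decoded[0])  # ['Audi', 'Other', 'BMW']
print(decoded[1])  # ['Coupe', 'Other', 'Convertible']
```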

At first glance, we have enough cases for each class. We do have skewed distributions, but we could mitigate that with weighting. We leave the classes as they are and create a dictionary for our custom dataset. We assign the corresponding training-labels to each filename:
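The filename-to-labels dictionary and the weighting mentioned above could look roughly like this. The filenames, label values and the helper `inverse_frequency_weights` are assumptions for illustration; inverse-frequency weighting is one common option, not necessarily what one would pick for this dataset. Such weights could later be passed to the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter

# Made-up filenames and labels, one entry per image.
filenames = ["00001.jpg", "00002.jpg", "00003.jpg", "00004.jpg"]
brand_labels = [0, 0, 0, 1]
type_labels = [1, 4, 2, 2]
year_labels = [1, 0, 1, 1]

# Assign the three training labels to each filename.
label_lookup = {
    name: [brand, vehicle_type, year]
    for name, brand, vehicle_type, year in zip(
        filenames, brand_labels, type_labels, year_labels
    )
}
print(label_lookup["00004.jpg"])  # [1, 2, 1]


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by total / (num_classes * count); 0.0 if unseen."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Rare classes receive larger weights than frequent ones.
brand_weights = inverse_frequency_weights(brand_labels, num_classes=2)
print(brand_weights)
```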

Next we will create our custom dataset. For a deeper introduction you can have a look at this article of mine. Basically, there is nothing special yet. The only difference is that we load three training labels for each sample instead of one, and pass all three into our training loop:

We can load a sample with the dataloader and look at it:
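A minimal sketch of such a dataset and a sample batch, under some assumptions: the class name `MultiLabelCarDataset`, the dictionary keys and the `load_image` callable are hypothetical, and a stand-in loader that returns random pixels replaces actual image reading and transforms so that the example runs without the downloaded files.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class MultiLabelCarDataset(Dataset):
    """Returns one image and three labels (brand, type, year) per sample."""

    def __init__(self, label_lookup, load_image):
        # label_lookup: {filename: [brand, vehicle_type, year]}
        # load_image: callable turning a filename into a CxHxW float tensor
        self.filenames = list(label_lookup)
        self.label_lookup = label_lookup
        self.load_image = load_image

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        filename = self.filenames[idx]
        brand, vehicle_type, year = self.label_lookup[filename]
        return {
            "image": self.load_image(filename),
            "labels": {"brand": brand, "vehicle_type": vehicle_type, "year": year},
        }


def fake_loader(filename):
    # Stand-in: random pixels instead of reading and transforming a file.
    return torch.rand(3, 224, 224)


label_lookup = {f"{i:05d}.jpg": [i % 6, i % 5, i % 2] for i in range(32)}
dataset = MultiLabelCarDataset(label_lookup, fake_loader)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# The default collate function stacks the nested dictionary per key.
batch = next(iter(loader))
print(batch["image"].shape)            # torch.Size([16, 3, 224, 224])
print(batch["labels"]["brand"].shape)  # torch.Size([16])
```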

Our custom dataset and the dataloader work as intended. We get one dictionary per batch with the images and 3 target labels. With this we have the prerequisites for our multilabel classifier.

Custom Multilabel Classifier (by the author)

We start by loading a pretrained ResNet34 and displaying its last 3 child modules: a sequential block, then a pooling operation and finally a linear layer. The linear layer takes 512 features as input and gives 1000 as output. We want to remove this last layer and replace it with new layers. We already know that each new layer gets 512 in-features and that we need a) 6 out-features for the brands, b) 5 out-features for the vehicle types and c) 2 out-features for the year cohorts. We can remove the last layer by putting all child modules into a list and dropping the last element:

We can process an output with our ResNet without a classifier head and look at the respective tensor shapes:

As a result we get a tensor of shape [16, 512, 1, 1]. We have 16 samples in our batch and 512 features per image. The 3rd and 4th dimensions each have size 1 and can be removed with torch.flatten(). We can now pass this output to our new classifier layers:
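A short sketch of this step for the brand head, using a random tensor as a stand-in for the backbone output (the variable names are illustrative):

```python
import torch
from torch import nn

# Stand-in for the output of the ResNet34 without its classifier head.
features = torch.rand(16, 512, 1, 1)

# Flatten everything after the batch dimension: [16, 512, 1, 1] -> [16, 512].
flat = torch.flatten(features, start_dim=1)

# New classifier layer for the 6 brand categories.
brand_head = nn.Linear(512, 6)
logits = brand_head(flat)

print(flat.shape)    # torch.Size([16, 512])
print(logits.shape)  # torch.Size([16, 6])
```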

This is exactly what we wanted to have. We get 6 logits per sample in our batch. We can now process these as usual using a loss function in our training loop. Now we add the other two classifier layers and put everything together in a custom model:

We create a flexible training routine that takes into account all outputs of our model. Therefore, it does not matter whether we have 2, 3 or, for example, 5 classifier heads. We simply use the conventional loss function for multi-class classification tasks: we calculate the CrossEntropyLoss for each head and sum the losses. This way we can optimize the weights with a single optimizer step for all three heads:
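The core of such a routine is a single step on the summed loss. The sketch below is one possible shape for it; to stay small and fast it uses a tiny stand-in network (`TinyMultiHead`) and random data instead of the ResNet-based model and real images, and all names are assumptions.

```python
import torch
from torch import nn


class TinyMultiHead(nn.Module):
    """Stand-in for the ResNet-based model: shared features, several heads."""

    def __init__(self, head_sizes):
        super().__init__()
        self.shared = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())
        self.heads = nn.ModuleDict({k: nn.Linear(32, n) for k, n in head_sizes.items()})

    def forward(self, x):
        features = self.shared(x)
        return {k: head(features) for k, head in self.heads.items()}


def train_step(model, batch, optimizer, criterion):
    """One optimizer step on the summed loss of all heads."""
    model.train()
    optimizer.zero_grad()
    outputs = model(batch["image"])
    # Works for any number of heads: one CrossEntropyLoss term per head.
    loss = sum(
        criterion(logits, batch["labels"][name]) for name, logits in outputs.items()
    )
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
head_sizes = {"brand": 6, "vehicle_type": 5, "year": 2}
model = TinyMultiHead(head_sizes)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Random stand-in batch with the same structure as the dataloader output.
batch = {
    "image": torch.rand(16, 3, 8, 8),
    "labels": {k: torch.randint(0, n, (16,)) for k, n in head_sizes.items()},
}
losses = [train_step(model, batch, optimizer, criterion) for _ in range(20)]
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```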

We also write the validation routine so that we can pass a flexible number of targets to be classified. We calculate both the overall performance for each target and the performance for each category within a target:
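The bookkeeping for both levels can be sketched with a small helper. The function `accuracy_report` and the hand-made logits are illustrative assumptions; in a real validation loop the logits and labels would be collected over all validation batches first.

```python
import torch


def accuracy_report(logits, labels):
    """Overall and per-category accuracy for every head.

    logits: {head: float tensor [N, num_categories]}
    labels: {head: long tensor [N]}
    """
    report = {}
    for head, head_logits in logits.items():
        preds = head_logits.argmax(dim=1)
        target = labels[head]
        hits = preds == target
        per_category = {}
        # Accuracy restricted to the samples of each category that occurs.
        for category in target.unique().tolist():
            mask = target == category
            per_category[category] = hits[mask].float().mean().item()
        report[head] = {
            "overall": hits.float().mean().item(),
            "per_category": per_category,
        }
    return report


# Hand-made example: predictions are [0, 1, 0, 0], labels are [0, 1, 1, 0].
logits = {"year": torch.tensor([[2.0, 1.0], [0.0, 3.0], [5.0, 1.0], [1.0, 0.0]])}
labels = {"year": torch.tensor([0, 1, 1, 0])}
report = accuracy_report(logits, labels)
print(report["year"]["overall"])       # 0.75
print(report["year"]["per_category"])  # {0: 1.0, 1: 0.5}
```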

Conclusion

With about 90% accuracy per class, we were able to make good predictions. We saw that we can predict multiple targets with one model without needing multiple models or runs. In our example, we used PyTorch and saw that we can quickly create a custom training routine with a custom dataset and a custom model. Furthermore, we took advantage of transfer learning to get good results quickly despite the complexity of the task. In the real world, there are many such application areas. Imagine you run a used car platform and want to extract suggestions for individual vehicle features directly from the images. We are not that far away from that in our example. There is another form of multilabel classification. Think of image tags in social networks, for example. Here, the classes are also given in advance, but they are not mutually exclusive, and an image does not have to belong to any particular class. We will address this issue in the next chapter. Thanks for reading!

Dataset Credits

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei: 3D Object Representations for Fine-Grained Categorization. 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.

Cars Dataset

Overview The Cars dataset contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images…

ai.stanford.edu
