alpha2phi

Summary

The article compares the performance of CLIP (Contrastive Language–Image Pre-training) and ResNext models in image classification tasks using Google Colab.

Abstract

The article provides a practical comparison between two advanced neural network models, CLIP and ResNext, for image classification. It outlines the process of setting up a Jupyter Notebook on Google Colab to test these models on various images, including the necessary steps to optimize performance by selecting GPU or TPU runtime. The author emphasizes the simplicity of using pre-trained models from OpenAI and PyTorch Hub without delving into the underlying theory. The tests conducted reveal that while both models can accurately classify images, CLIP demonstrates superior performance in predicting unseen object categories, albeit with longer processing times. The article also references additional resources for hosting machine learning models and other practical machine learning applications.

Opinions

  • The author suggests that CLIP's performance is more representative and robust across different datasets, not limited to ImageNet.
  • It is implied that the ease of using Google Colab, with its pre-installed libraries and ability to select GPU or TPU runtime, enhances the efficiency of conducting such tests.
  • The author's opinion on the ResNext model is neutral, acknowledging its modular design but not explicitly favoring it over CLIP.
  • There is an implication that the longer processing time for CLIP might be a trade-off for its better prediction capabilities on unseen data.
  • The author encourages readers to explore further by providing links to articles on serving machine learning models using Streamlit and FastAPI, indicating the practical value of such knowledge.

Image Classification: CLIP vs ResNext on Colab

CLIP and ResNext Classification Test

Overview

CLIP (Contrastive Language–Image Pre-training) is a new neural network introduced by OpenAI. The accompanying paper describes it in detail and is worth reading if you are interested. OpenAI claims that CLIP’s benchmark performance is much more representative of how it will fare on datasets that measure accuracy in different, non-ImageNet settings.

Image from CLIP

ResNext is a simple, highly modularized network architecture for image classification. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.
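
As a rough illustration of the "aggregated transformations" idea, the sketch below sums the outputs of several branches that share one topology (project down, apply a nonlinearity, project back up) and adds a residual connection. It uses plain Python lists and tiny hand-written weights instead of convolutions. The function names and numbers are made up for illustration and do not come from the notebook.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in vector]


def branch(x, w_down, w_up):
    """One transformation: project down, apply ReLU, project back up."""
    return matvec(w_up, relu(matvec(w_down, x)))


def aggregated_block(x, branches):
    """Residual block: x plus the sum of all branch outputs.

    `branches` is a list of (w_down, w_up) pairs with identical shapes.
    Its length plays the role of the cardinality in ResNeXt.
    """
    out = list(x)
    for w_down, w_up in branches:
        out = [o + b for o, b in zip(out, branch(x, w_down, w_up))]
    return out


# Two branches with the same topology (2 -> 1 -> 2) but different weights.
branches = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),
    ([[0.0, 1.0]], [[0.0], [1.0]]),
]
result = aggregated_block([1.0, 2.0], branches)
print(result)
```

Every branch has the same shape, so widening the block is a matter of adding more (w_down, w_up) pairs rather than designing a new layer.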

Without going into the theory, I will use both models to run some image classification tests in a Jupyter Notebook on Google Colab.

Google Colab

From Google Colab, open the notebook available in this repository.

Google Colab — Open Notebook from GitHub

For better performance, ensure that you change the runtime type to GPU or TPU under Runtime -> Change runtime type in Google Colab.

Google Colab — GPU Runtime
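
To confirm that the runtime change took effect, a small check like the one below can be run in a notebook cell. This is a minimal sketch: `pick_device` is a hypothetical helper, not something from the notebook, and it falls back to the CPU when PyTorch is missing or no GPU is visible. It does not cover TPU runtimes, which would likely need extra setup.

```python
import importlib.util


def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints "cpu" on Colab, the runtime type is probably still set to the default.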

Project and Library Setup

Let’s install the Python libraries and clone the repository to download additional Python files and the images that will be used for the testing.

Since the Colab virtual machine comes with PyTorch and cudatoolkit pre-installed, I will not be installing them again.

Project and Library Setup

As you can see from the screenshot above, the current CUDA version is 10.1.
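
The same version information can also be read from Python rather than from the setup output. The sketch below assumes nothing beyond the standard library and, if present, PyTorch. `environment_report` is a hypothetical helper name, and the values it prints depend on the current Colab image, so they will not necessarily match the 10.1 shown above.

```python
import importlib.util
import sys


def environment_report():
    """Collect the Python, PyTorch, and CUDA versions visible to this process."""
    report = {"python": sys.version.split()[0], "torch": None, "cuda": None}
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        # torch.version.cuda is None on CPU-only builds of PyTorch.
        report["cuda"] = torch.version.cuda
    return report


report = environment_report()
for name, version in report.items():
    print(f"{name}: {version}")
```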

Import Libraries and Pre-trained Models

Let’s import the required libraries used by the notebook.

I download the pre-trained CLIP model from the OpenAI site using the code snippet provided with CLIP.

For the ResNext pre-trained model, I use the model from PyTorch Hub.

Import Libraries and Pre-trained Models
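
The loading code itself is only visible in the screenshot, so the following is a sketch of one way to do it today, not a copy of the notebook. It assumes the `clip` package from the openai/CLIP repository is installed. The "ViT-B/32" model name, the "resnext50_32x4d" variant, and the pinned torchvision tag are all assumptions, since the article does not say which variants were used.

```python
def load_clip(device="cpu", name="ViT-B/32"):
    """Load a CLIP model and its image preprocessing transform.

    Assumes the `clip` package from the openai/CLIP repository is installed.
    The model name is an assumption, not taken from the article.
    """
    import clip

    model, preprocess = clip.load(name, device=device)
    return model, preprocess


def load_resnext(device="cpu", variant="resnext50_32x4d"):
    """Load an ImageNet-pretrained ResNeXt from PyTorch Hub.

    The variant and the torchvision tag are assumptions for illustration.
    """
    import torch

    model = torch.hub.load("pytorch/vision:v0.10.0", variant, pretrained=True)
    return model.to(device).eval()
```

The imports sit inside the functions so that defining them costs nothing. Calling either function downloads the weights on first use, which needs network access.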

Prediction using CLIP and ResNext on ImageNet Classes

I implemented two methods, predict_clip and predict_resnext, both using the 1000 ImageNet classes. Each method returns the five most probable classes.

Prediction Methods in CLIP and ResNext
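
Since the two methods are only shown as a screenshot, here is a model-free sketch of the scoring step they would plausibly share: turn per-class scores into probabilities with a softmax and keep the top entries. For the CLIP side, the scores are assumed to be scaled cosine similarities between one image embedding and one text embedding per class, with the factor of 100 mirroring the usage example in the CLIP repository. All helper names and the 2-D embeddings are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def clip_style_logits(image_vec, text_vecs, scale=100.0):
    """Score one image embedding against one text embedding per class."""
    return [scale * cosine(image_vec, t) for t in text_vecs]


labels = ["giant panda", "brown bear", "teapot"]
# Made-up 2-D embeddings standing in for real model outputs.
image_vec = [1.0, 0.1]
text_vecs = [[0.9, 0.1], [0.5, 0.5], [-1.0, 0.2]]
predictions = top_k(clip_style_logits(image_vec, text_vecs), labels, k=2)
print(predictions)
```

For a classifier such as ResNext, `top_k` would be applied directly to the 1000 output logits. For CLIP, the label list can be any set of text prompts, which is what allows it to score categories outside ImageNet.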

Image Classification Testing

I tested both prediction methods on a mix of different images.

Starting with a simple panda image, both models predict the correct class.

Prediction made by CLIP and ResNext

Here are the results for the other images.

CLIP and ResNext Image Classification Test

Based on this quick test, CLIP appears to make better predictions for unseen object categories. However, CLIP takes much longer to produce a prediction.
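
To put a number on the speed difference instead of relying on impressions, each prediction call can be wrapped in a small timer. The sketch below uses only the standard library. `time_call` is a hypothetical helper, and the `sum` workload is a stand-in for predict_clip or predict_resnext called on the same image.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn several times; return (last result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in workload; replace with a real prediction call in the notebook.
result, seconds = time_call(sum, range(100_000))
print(f"result={result}, best of 3: {seconds:.6f}s")
```

On a GPU runtime, CUDA kernels run asynchronously, so calling torch.cuda.synchronize() before reading the clock gives more trustworthy timings.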

Also check out the following articles to see how to host machine learning models, including ResNext, using Streamlit and FastAPI.

More articles on practical uses of machine learning are listed below.
