Image Classification: CLIP and ResNext

Summary

The article compares the performance of CLIP (Contrastive Language–Image Pre-training) and ResNext models in image classification tasks using Google Colab.

Abstract

The article provides a practical comparison between two advanced neural network models, CLIP and ResNext, for image classification. It outlines the process of setting up a Jupyter Notebook on Google Colab to test these models on various images, including the necessary steps to optimize performance by selecting GPU or TPU runtime. The author emphasizes the simplicity of using pre-trained models from OpenAI and PyTorch Hub without delving into the underlying theory. The tests conducted reveal that while both models can accurately classify images, CLIP demonstrates superior performance in predicting unseen object categories, albeit with longer processing times. The article also references additional resources for hosting machine learning models and other practical machine learning applications.

Opinions

The author suggests that CLIP's performance is more representative and robust across different datasets, not limited to ImageNet.
It is implied that the ease of using Google Colab, with its pre-installed libraries and ability to select GPU or TPU runtime, enhances the efficiency of conducting such tests.
The author's opinion on the ResNext model is neutral, acknowledging its modular design but not explicitly favoring it over CLIP.
There is an implication that the longer processing time for CLIP might be a trade-off for its better prediction capabilities on unseen data.
The author encourages readers to explore further by providing links to articles on serving machine learning models using Streamlit and FastAPI, indicating the practical value of such knowledge.

Overview

CLIP (Contrastive Language–Image Pre-training) is a neural network introduced by OpenAI. OpenAI has published a detailed paper on it, which is worth reading if you want the theory. OpenAI claims that CLIP’s benchmark performance is much more representative of how it will fare on datasets that measure accuracy in different, non-ImageNet settings.

Image from CLIP

ResNext is a simple, highly modularized network architecture for image classification. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.

Without going into any theory, I am going to run both models on a handful of images in a Jupyter Notebook on Google Colab and compare their predictions.

Project and Library Setup

Let’s install the Python libraries and clone the repository, which contains the additional Python files and the images used for testing.

The Colab virtual machine comes with PyTorch and cudatoolkit pre-installed, so there is no need to install them again.
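The install cell itself is not reproduced in the text, so here is a minimal sketch of what it might look like. The package names and the Git URL are the ones listed in OpenAI's CLIP README; the helper names `pip_install_command` and `install_clip` are my own, and the author's repository URL is not given, so cloning it is left out.

```python
import subprocess
import sys

# Assumption: these are the packages listed in OpenAI's CLIP README.
# The author's exact install cell and repository URL are not shown in the text.
CLIP_DEPENDENCIES = ["ftfy", "regex", "tqdm"]
CLIP_PACKAGE = "git+https://github.com/openai/CLIP.git"


def pip_install_command(*packages):
    """Build a pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


def install_clip():
    """Install CLIP and its helper packages (needs network access)."""
    subprocess.run(pip_install_command(*CLIP_DEPENDENCIES), check=True)
    subprocess.run(pip_install_command(CLIP_PACKAGE), check=True)
```

Calling `install_clip()` once per Colab session should be enough. PyTorch is deliberately absent from the list because the runtime already provides it.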

Project and Library Setup

As you can see from the screenshot above, the CUDA version on the Colab runtime is 10.1.
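If you want to check the runtime yourself, the following sketch prints the same kind of information from Python. The function name `describe_runtime` is hypothetical. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may not be the exact value shown in the screenshot.

```python
import importlib.util


def describe_runtime():
    """Return the PyTorch version, the CUDA version it was built with, and GPU availability."""
    info = {"torch": None, "cuda_build": None, "cuda_available": False, "gpu_name": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment
    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back as `False` on Colab, the notebook is probably still on a CPU runtime and the runtime type needs to be switched to GPU.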

Image Classification Testing

I ran both prediction methods on a mix of different images.
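The two prediction methods are not shown in the text, so the sketch below is only an approximation of what they could look like. The CLIP part follows the usage example in OpenAI's README (`clip.load`, `clip.tokenize`), and the ResNext part follows the PyTorch Hub example for `pytorch/vision`. The model variants (`ViT-B/32`, `resnext50_32x4d`), the prompt template and the helper names are assumptions; the article does not say which ones were used. The caller has to supply `imagenet_classes`, the list of the 1,000 ImageNet labels.

```python
def build_prompts(class_names):
    """Turn class names into the text prompts that CLIP compares against the image."""
    return [f"a photo of a {name}" for name in class_names]


def top_k(scores, class_names, k=3):
    """Return the k (class name, score) pairs with the highest score."""
    ranked = sorted(zip(class_names, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def predict_clip(image_path, class_names, k=3):
    """Zero-shot classification: score the image against one prompt per class."""
    import clip  # OpenAI's CLIP package
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)  # assumed variant
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize(build_prompts(class_names)).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0].tolist()
    return top_k(probs, class_names, k)


def predict_resnext(image_path, imagenet_classes, k=3):
    """Closed-set classification over the 1,000 ImageNet classes."""
    import torch
    from PIL import Image
    from torchvision import transforms

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumed entry point and variant; the article does not say which one it used.
    model = torch.hub.load("pytorch/vision:v0.10.0", "resnext50_32x4d", pretrained=True)
    model = model.eval().to(device)
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        probs = model(image).softmax(dim=-1)[0].tolist()
    return top_k(probs, imagenet_classes, k)
```

The key difference is visible in the signatures: `predict_clip` accepts any list of class names, because it compares the image with text prompts, while `predict_resnext` can only choose among the classes it was trained on. Both functions load the model on every call to keep the sketch short; in a notebook you would load each model once and reuse it.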

On a simple panda image, both models predict the correct class.

Prediction made by CLIP and ResNext

Here are the results for the other images.

CLIP and ResNext Image Classification Test

Judging from this quick test alone, CLIP seems to make better predictions for unseen object categories. However, CLIP takes much longer to produce a prediction.
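The text does not say how the processing time was measured. One simple way to put a number on it is a small timing helper like the one below, which is a sketch using only the standard library. The name `time_call` and its parameters are my own.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, warmup=1, **kwargs):
    """Return the median wall-clock seconds of fn(*args, **kwargs) over several runs."""
    for _ in range(warmup):
        fn(*args, **kwargs)  # warm-up runs absorb one-off costs such as lazy initialisation
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

To compare the two models fairly, load each model outside the timed function so that only inference is measured. On a GPU, make sure the timed function returns its results to Python (for example via `.tolist()`), otherwise the timer may stop before the GPU has finished.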

Also check out the following articles to see how we can host machine learning models using Streamlit and FastAPI, including ResNext.

More articles on practical uses of machine learning are listed below.

Image Classification: CLIP vs ResNext on Colab

Serving Machine Learning Models (DCGAN, PGAN, ResNext) using FastAPI and Streamlit

Generate Image from Text: Text to image using Jupyter Notebook on Google Colab.

YOLO using FastAPI WebSocket and React

RPA and Web Scraping using Jupyter

Serverless Machine Learning APIs using Lambda and EFS