Summary

The webpage provides a historical overview of the development of Convolutional Neural Networks (CNNs), from their biological inspiration to their current state-of-the-art applications in computer vision.

Abstract

Convolutional Neural Networks (CNNs) have evolved to become a cornerstone of artificial intelligence in image processing and computer vision. This evolution began with the discovery of simple and complex cells in the 1950s, which inspired artificial neural networks such as the Neocognitron, designed by Kunihiko Fukushima in the 1980s. The 1990s saw Yann LeCun's LeNet, a pioneering CNN that set the stage for modern applications. The field has since progressed with larger datasets and more complex models, leading to significant achievements such as AlexNet's performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. The article concludes by noting the advancement of CNNs and their future potential in 3D object classification tasks.

Opinions

The author suggests that CNNs are the most important neural network architecture for computer vision tasks.
The work of David Hubel and Torsten Wiesel on simple and complex cells is highlighted as foundational for understanding visual pattern recognition.
Fukushima's Neocognitron is recognized as an early influential artificial neural network model inspired by biological processes.
LeCun's LeNet is celebrated for its impact on the field, with its paper being one of the most cited in AI history.
The author expresses admiration for the progress in CNNs, noting the significant reduction in error rates in image classification challenges like ILSVRC.
The author is optimistic about the future of CNNs, anticipating further advancements and new developments in the field.
The author encourages readers interested in deep learning to explore their content and resources, offering a guide and inviting them to subscribe for more insights and private Google Colab notebooks.

Artificial Intelligence Essentials

The Brief History of Convolutional Neural Networks

Discover the history of one of the most popular deep learning models, used for almost every computer vision task

Figure 1. Photo by Joel Filipe on Unsplash

Convolutional Neural Networks are the most important artificial neural network architecture today for almost any computer vision or image processing-related AI task. In this post, we will briefly trace the origins of CNNs, from the biological experiments of the 1950s to today’s complex pre-trained computer vision models.

From Simple and Complex to Grandmother Cells

In 1959, David Hubel and Torsten Wiesel discovered simple and complex cells. According to their study, for visual pattern recognition, we use two kinds of cells. A simple cell can recognize edges and bars of particular orientations at a particular part of the image, such as the image below:

Figure 2. The models of simple and complex cells proposed by Movshon, Thompson, and Tolhurst (Figure by Movshon in the paper)

A complex cell also responds to edges and bars of particular orientations. In contrast to simple cells, however, complex cells can respond to these edges and bars at any location in the scene.

For instance, while a simple cell can only respond to a vertical bar located in the upper section of a scene, a complex cell can respond to vertical bars located anywhere in the scene.

Complex cells can achieve this location-agnostic recognition capability by summing information from multiple simple cells. Simple and complex cells are found in the visual cortex, where together they form part of our visual system.
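
The difference between the two cell types can be illustrated with a small toy model. The sketch below is only an illustration, not Hubel and Wiesel's model: the 3x3 filter values, the 8x8 scene, and the function names (make_scene, simple_cell, complex_cell) are all assumptions made for this example. A simple cell is modeled as an oriented filter at one fixed location, and a complex cell as the sum of simple cells at every location.

```python
# Toy illustration only: filter values, scene size, and names are assumptions.

def make_scene(size, bar_col):
    """Return a size x size image (list of rows) with a vertical bar in column bar_col."""
    return [[1.0 if c == bar_col else 0.0 for c in range(size)] for _ in range(size)]

# 3x3 filter that responds to a bright vertical bar in its center column.
VERTICAL_BAR = [
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
]

def simple_cell(image, row, col, kernel=VERTICAL_BAR):
    """Response of one simple cell whose 3x3 receptive field starts at (row, col)."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            total += kernel[i][j] * image[row + i][col + j]
    return max(total, 0.0)  # the cell stays silent for negative input

def complex_cell(image, kernel=VERTICAL_BAR):
    """Sum the responses of simple cells placed at every valid location."""
    n = len(image)
    return sum(
        simple_cell(image, r, c, kernel)
        for r in range(n - 2)
        for c in range(n - 2)
    )

left_bar = make_scene(8, 1)
right_bar = make_scene(8, 5)

# The simple cell at the top-left corner only fires for the nearby bar.
print(simple_cell(left_bar, 0, 0), simple_cell(right_bar, 0, 0))
# The complex cell gives the same response wherever the bar is.
print(complex_cell(left_bar), complex_cell(right_bar))
```

Modern CNNs usually pool with a maximum rather than a sum, but the idea is the same: the pooled response no longer depends on where the pattern appears.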

A grandmother or gnostic cell is a hypothetical neuron that represents a complex but specific concept or object. It activates when a person “sees, hears, or otherwise sensibly discriminates” a specific entity, such as his or her grandmother.

Under this hypothesis, there would be a single high-level neuron that obtains its information from other complex neurons (those needed to detect a specific item) and only activates when we see our grandma.

The Neocognitron by Kunihiko Fukushima

Inspired by Hubel and Wiesel’s work, in the 1980s, Dr. Kunihiko Fukushima designed an artificial neural network that mimics the functioning of simple and complex cells. While S-cells operate as artificial simple cells, C-cells operate as artificial complex cells. They are artificial because they are not biological neurons; instead, they mimic the algorithmic structure of simple and complex cells. The main idea of Fukushima’s Neocognitron was simple: capture complex patterns (e.g., a dog) using complex cells that gather their information from other lower-level complex cells or from simple cells that detect simpler patterns (e.g., a tail).

Figure 3. Schematic diagram illustrating the interconnections between layers in Neocognitron (Figure by Fukushima in the Neocognitron Paper)

Check out the Neocognitron paper: Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position

The LeNet by Yann LeCun

Although the work of Fukushima was very influential in the newly developing field of artificial intelligence, the first modern application of convolutional neural networks was implemented in the 90s by Yann LeCun et al. in their paper Gradient-Based Learning Applied to Document Recognition, which is probably the most popular AI paper from the 90s (cited by 34378 papers).

Figure 4. The architecture of LeNet-5 for digit recognition (Figure by LeCun in the LeNet Paper)
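
The feature map sizes in the LeNet-5 figure can be checked with a little arithmetic. The sketch below assumes the layer settings as commonly reported for LeNet-5 (a 32x32 input, 5x5 convolutions with stride 1, and 2x2 subsampling with stride 2); the table LENET5_LAYERS and the helper names are made up for this example and are not taken from the paper's code.

```python
# Sketch under stated assumptions: layer settings as commonly reported for LeNet-5.

def output_size(size, kernel, stride=1):
    """Width (or height) of a feature map after an unpadded sliding window."""
    return (size - kernel) // stride + 1

# (layer name, kernel size, stride, number of feature maps)
LENET5_LAYERS = [
    ("C1", 5, 1, 6),    # convolution
    ("S2", 2, 2, 6),    # subsampling (pooling)
    ("C3", 5, 1, 16),   # convolution
    ("S4", 2, 2, 16),   # subsampling (pooling)
    ("C5", 5, 1, 120),  # convolution that reduces each map to 1x1
]

def trace_shapes(input_size=32):
    """Return (name, maps, height, width) for each layer in LENET5_LAYERS."""
    size = input_size
    shapes = []
    for name, kernel, stride, maps in LENET5_LAYERS:
        size = output_size(size, kernel, stride)
        shapes.append((name, maps, size, size))
    return shapes

for name, maps, height, width in trace_shapes():
    print(f"{name}: {maps} maps of {height}x{width}")
# After C5 come a fully connected layer with 84 units and a 10-way output,
# one output per digit.
```

Each convolution shrinks the map by four pixels per side and each subsampling step halves it, which is how a 32x32 image ends up as 120 single values before the fully connected layers.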

In the paper, LeCun trained a convolutional neural network with the MNIST dataset of handwritten digits.

The MNIST database contains 60,000 training images and 10,000 testing images taken from American Census Bureau employees and American high school students [Wikipedia]. The MNIST dataset contains the grayscale pixel values of handwritten digits (from 0 to 9), along with labels that indicate which digit each image actually shows:

Figure 5. An Example Grid of Handwritten Digit Examples (Created from MNIST Dataset)

The idea was a follow-up to Fukushima’s Neocognitron: aggregating simpler features into more complicated features using complex artificial cells. LeNet was trained on MNIST through the following steps:

Provide the model with an example image;
Ask the model to predict the label;
Update the model settings by comparing the outcome of the prediction with the real label value;
Repeat this process until reaching the optimal model settings, where the loss is minimized.
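
The four steps above can be sketched as a tiny training loop. This is not LeNet and does not use MNIST: to stay short and runnable, it trains a single artificial neuron (logistic regression) on a made-up four-example dataset. The names TOY_DATA, predict, train, and mean_loss, as well as the learning rate and epoch count, are assumptions chosen for this illustration.

```python
import math
import random

# Hypothetical toy data: the label simply equals the first feature.
TOY_DATA = [
    ([0.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([1.0, 0.0], 1),
    ([1.0, 1.0], 1),
]

def predict(weights, bias, features):
    """Return the predicted probability that the label is 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(weights, bias, examples):
    """Average log loss (cross-entropy) over the examples."""
    eps = 1e-12
    total = 0.0
    for features, label in examples:
        p = min(max(predict(weights, bias, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(examples)

def train(examples, epochs=200, learning_rate=0.5, seed=0):
    """Run stochastic gradient descent and return the learned weights and bias."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    data = list(examples)
    for _ in range(epochs):                         # step 4: repeat
        rng.shuffle(data)
        for features, label in data:                # step 1: provide an example
            p = predict(weights, bias, features)    # step 2: predict the label
            error = p - label                       # step 3: compare with the real label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

loss_before = mean_loss([0.0, 0.0], 0.0, TOY_DATA)
weights, bias = train(TOY_DATA)
loss_after = mean_loss(weights, bias, TOY_DATA)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

A real CNN follows the same loop; the difference is that the prediction step runs through convolution and pooling layers, and the update step adjusts all of their filters through backpropagation.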

LeCun’s implementation set the standards for today’s computer vision and image processing applications.

From the 1990s Onwards

The 90s, 00s, and 10s were the years in which the streamlined process of building convolutional neural networks was applied to more and more complex models trained on larger and larger datasets.

In 2005, the PASCAL VOC challenge, in which participants compete to achieve the lowest loss and highest accuracy with their models, started with approximately 20,000 images and 20 object classes. However, as the field advanced, these numbers were dwarfed by other private studies. Starting in 2010, Fei-Fei Li collaborated with the PASCAL VOC team to make a very large image dataset available under the name ImageNet. Every year, researchers were invited to compete in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Currently, the full ImageNet dataset contains 14,197,122 images, while the ILSVRC classification task uses 1000 different object classes.

Figure 6. An Example Grid of Random Photos (Created from ImageNet Dataset)

In 2012, a deep convolutional neural network architecture called AlexNet achieved a 16% error rate (about 10 percentage points lower than the runner-up) by utilizing GPUs. The use of GPUs for computer vision tasks became standard after AlexNet’s achievement, which was remarkable for its time.

Check out the AlexNet paper: ImageNet Classification with Deep Convolutional Neural Networks

In 2017, 29 of 38 competing teams at the ILSVRC achieved less than 5% error. Since complex 2D classification problems were close to being solved, the organizers of ILSVRC announced that the challenge would move to 3D object classification in the near future.

Final Notes

From the discovery of simple and complex cells in our brain to 3D object detection challenges, convolutional neural network structures have come a long way, and they will only get more advanced from here. It is exciting to think about how many new developments we will see in the near future. If you are interested in being part of this process and learning deep learning, check out the Guide to my Content.

A Guide to My Content on Artificial Intelligence

The guide to help you navigate around my content with ease.

oyalcin.medium.com

More specifically, if you are looking to build your own convolutional neural networks using TensorFlow and classify handwritten digits using the MNIST dataset, check out this article:

Image Classification in 10 Minutes with MNIST Dataset

Subscribe to the Mailing List for My Latest Content

If you liked what I shared so far, consider subscribing to the Newsletter! ✉️

With my subscribers, I also share my private Google Colab notebooks, containing the full code for every post I have published.

If you are reading this article, I am sure that we share similar interests and are/will be in similar industries. So let’s connect via Linkedin! Please do not hesitate to send a contact request! Orhan G. Yalçın — Linkedin