Haar Cascades, Explained

You see facial recognition everywhere, from the security camera in the front porch of your house to your sensor on your iPhone X. But how exactly does facial recognition work to classify faces, considering the large number of features as input and the striking similarities between humans?

Enter Haar classifiers, classifiers that were used in the first real-time face detector. A Haar classifier, or a Haar cascade classifier, is a machine learning object detection program that identifies objects in an image and video.
A detailed description of Haar classifiers can be seen in Paul Viola and Michael Jones’s paper “Rapid Object Detection using a Boosted Cascade of Simple Features”, linked over here. Note that the article goes into some mathematics, and assumes knowledge of machine learning terminology. If you want a summarized, high-level overview, make sure to keep reading!
Making a Haar Cascade Classifier
Note: This discussion will assume basic knowledge of boosting algorithms and weak vs. strong learners with regards to machine learning. Click here for a quick Adaboost tutorial.
The algorithm can be explained in four stages:
- Calculating Haar Features
- Creating Integral Images
- Using Adaboost
- Implementing Cascading Classifiers
It’s important to remember that this algorithm requires a lot of positive images of faces and negative images of non-faces to train the classifier, similar to other machine learning models.
Calculating Haar Features
The first step is to collect the Haar features. A Haar feature is essentially calculations that are performed on adjacent rectangular regions at a specific location in a detection window. The calculation involves summing the pixel intensities in each region and calculating the differences between the sums. Here are some examples of Haar features below.

These features can be difficult to determine for a large image. This is where integral images come into play because the number of operations is reduced using the integral image.
Creating Integral Images
Without going into too much of the mathematics behind it (check out the paper if you’re interested in that), integral images essentially speed up the calculation of these Haar features. Instead of computing at every pixel, it instead creates sub-rectangles and creates array references for each of those sub-rectangles. These are then used to compute the Haar features.

It’s important to note that nearly all of the Haar features will be irrelevant when doing object detection, because the only features that are important are those of the object. However, how do we determine the best features that represent an object from the hundreds of thousands of Haar features? This is where Adaboost comes into play.
Adaboost Training
Adaboost essentially chooses the best features and trains the classifiers to use them. It uses a combination of “weak classifiers” to create a “strong classifier” that the algorithm can use to detect objects.
Weak learners are created by moving a window over the input image, and computing Haar features for each subsection of the image. This difference is compared to a learned threshold that separates non-objects from objects. Because these are “weak classifiers,” a large number of Haar features is needed for accuracy to form a strong classifier.

The last step combines these weak learners into a strong learner using cascading classifiers.
Implementing Cascading Classifiers

The cascade classifier is made up of a series of stages, where each stage is a collection of weak learners. Weak learners are trained using boosting, which allows for a highly accurate classifier from the mean prediction of all weak learners.
Based on this prediction, the classifier either decides to indicate an object was found (positive) or move on to the next region (negative). Stages are designed to reject negative samples as fast as possible, because a majority of the windows do not contain anything of interest.
It’s important to maximize a low false negative rate, because classifying an object as a non-object will severely impair your object detection algorithm. A video below shows Haar cascades in action. The red boxes denote “positives” from the weak learners.








