Summary

The web content discusses the implementation of Selective Search for object detection in images using Python, highlighting its ability to capture all scales and its computational efficiency compared to exhaustive search methods.

Abstract

The article "Understanding Selective Search for Object Detection" delves into the Selective Search algorithm, a method for generating bounding boxes in object detection tasks. It combines the strengths of exhaustive search and segmentation to propose regions of interest by considering various scales and diversification strategies, such as color, texture, size, and fill. The Python implementation uses the selective-search library, demonstrating how to apply the algorithm to an image and filter the resulting boxes. The article also compares different modes of the algorithm, showcasing its application in popular object detection frameworks like R-CNN, Fast R-CNN, and Faster R-CNN.

Opinions

The authors of the original Selective Search paper emphasize the method's ability to address the variability in object size and orientation in images.
The article suggests that there is no single optimal strategy for grouping regions, which is why the Selective Search algorithm employs multiple diversification strategies.
The author of the article praises the selective-search Python library for its ease of use and full implementation of the algorithm.
The article conveys that Selective Search is not only theoretically sound but also computationally efficient, making it a powerful tool in the realm of object detection.

Understanding Selective Search for Object Detection

An implementation with Python

Object Detection is a technique used in Computer Vision that aims at localizing and classifying objects in images or video. The main idea is that of building bounding boxes that envelop a specific object and then apply a classification algorithm (in the field of Computer Vision it will belong to the family of CNN) to the bounded object, in order to classify it with a probabilistic output.

So as you can see, there are two main steps in an object detection task: building the bounding boxes and classifying objects within them. In this article, we are going to focus on the first step, having a look at a very popular method for building boxes and implementing it in Python.

Selective Search

Selective Search was first introduced in 2012 in the paper of Uijlings et Al (you can find it here). According to the authors, this method combines the strength of both exhaustive search and segmentation:

exhaustive search → aims at finding all the possible locations by systematically enumerating all the possible candidates.
segmentation → uses the image structure to cluster pixels into different regions

Selective search captures both these features, yet with some additional benefits:

It captures all scales, addressing the fact that objects in images can present at different sizes and orientations. This step is based on the intuition of the hierarchical structure of images. The creation of initial regions is done via a graph-based greedy algorithm, which starts grouping the most similar regions until the whole image becomes a unique region.
It diversifies the grouping of regions according to different metrics. Indeed, there is no optimal strategy to group regions together: sometimes might be because of the color, sometimes because of the texture, and so forth. Specifically, the authors introduced four different diversification strategies based on color, texture, size and fill. For each feature, a similarity score (between each couple of regions) is computed. The final similarity score is nothing but a linear combination of the above four similarity scores.
It is computationally fast, differently from its predecessor exhaustive search.

Great, now that we have an idea of the reasoning behind it, let’s see an implementation with Python.

Implementation with Python

For this purpose, I’m going to use the Python library selective-search, which allows full implementation of this algorithm. You can easily install it via pip. Plus, in order to easily get some images, I will also use the skimage library, a very useful one that you can find here.

First thing first, let’s import all the packager we are going to need.

! pip install selective-search

import skimage
import selective_search

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

Great, now let’s import an image and apply the selective search algorithm on it:

# Load image

image = skimage.data.astronaut()
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(image)

# Propose boxes using selective search

boxes = selective_search.selective_search(image, mode='fast')

Before having a look at our results, let’s first examine the parameter mode. It refers to the various combinations of diversification strategies proposed in the paper. More specifically, three strategies are available for this purpose:

Source: https://pypi.org/project/selective-search/

The color spaces identify the domain where the algorithm is performed. The naming convention is explained in detail in the original paper at chapter 3.2. The similarity measures indicate which combinations of the different similarity measures are considered. Namely, the combination CTSF means that we are considering all Color, Texture, Size and Fill similarity measures in the final linear combinations. Finally, with Starting Regions the authors refer to the initial grouping algorithm.

Now let’s see the result:

boxes_filter = selective_search.box_filter(boxes, min_size=20, topN=80)

# drawing rectangles on the original image
fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(image)
for x1, y1, x2, y2 in boxes_filter:
    bbox = mpatches.Rectangle(
        (x1, y1), (x2-x1), (y2-y1), fill=False, edgecolor='green', linewidth=1)
    ax.add_patch(bbox)

plt.axis('off')
plt.show()

As you can see, the algorithm created several boxes. Of course, the way it did so can be modified by changing the mode parameter. Namely, let’s see a comparison between the three:

Selective search is a powerful technique that is widely used in popular object detection algorithms, like within the family of Region-Based CNNs (R-CNN, Fast R-CNN and Faster R-CNN).

References