DL Tutorial 6 — Pooling and Padding Techniques in CNNs

Learn how pooling and padding techniques are used in convolutional neural networks.

Table of Contents
1. Introduction
2. What is Pooling?
3. Types of Pooling
4. What is Padding?
5. Types of Padding
6. Benefits and Drawbacks of Pooling and Padding
7. Conclusion


1. Introduction

In this tutorial, you will learn about pooling and padding techniques in convolutional neural networks (CNNs). CNNs are a type of deep learning model that can perform various tasks such as image classification, object detection, face recognition, and more. CNNs consist of multiple layers that process the input data and extract features that are relevant for the task. Two common types of layers in CNNs are convolutional layers and pooling layers.

Convolutional layers apply a set of filters to the input data and produce feature maps that capture the spatial information of the data. Pooling layers reduce the size of the feature maps and make the model more efficient and robust to noise. Padding is a technique that adds extra pixels to the input data or the feature maps to preserve the spatial information and avoid losing information at the edges.

In this tutorial, you will learn:

What is pooling and why it is used in CNNs
What are the types of pooling and how they differ
What is padding and why it is used in CNNs
What are the types of padding and how they differ
What are the benefits and drawbacks of pooling and padding

By the end of this tutorial, you will be able to apply pooling and padding techniques to your own CNN models and understand their effects on the performance and accuracy of the model.

Let’s get started!

2. What is Pooling?

Pooling is a technique that reduces the size of the feature maps produced by the convolutional layers in a CNN. Pooling is also known as downsampling or subsampling, as it reduces the number of pixels or units in each feature map. Pooling is usually applied after one or more convolutional layers, and before the fully connected layers or the output layer of the CNN.

Why do we need pooling? Pooling has several benefits for the CNN model, such as:

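It reduces the computational cost and memory usage of the model. Pooling layers have no learnable parameters of their own, but smaller feature maps mean fewer activations to store and process, and fewer weights in any fully connected layer that follows.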
It makes the model more robust to noise and variations in the input data, as it reduces the sensitivity of the feature maps to small changes in the input.
It enhances the feature extraction capability of the model, as it extracts the most important or dominant features from the feature maps, such as edges, corners, shapes, etc.

How does pooling work? Pooling works by applying a pooling function to a small region or window of the feature map, and outputting a single value for that region. The pooling function can be different depending on the type of pooling, but the most common ones are max pooling and average pooling. We will discuss these types of pooling in the next section.

Pooling is usually done with a fixed size and stride for the pooling window. The size of the pooling window determines how many pixels or units are pooled together, and the stride determines how much the window moves across the feature map. For example, if the pooling window size is 2x2 and the stride is 2, then the window will move 2 pixels horizontally and vertically each time, and cover the entire feature map without overlapping. The output of the pooling layer will have half the height and width of the input feature map.
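The relationship between input size, window size, and stride can be written as a one-line formula. The sketch below assumes no padding and that windows running past the edge are dropped, which is a common default; the function name `pool_output_size` is only illustrative.

```python
def pool_output_size(n, window, stride):
    """Output length along one dimension for pooling without padding.

    Assumes partial windows at the edge are dropped (floor behaviour).
    """
    if n < window:
        raise ValueError("input is smaller than the pooling window")
    return (n - window) // stride + 1


print(pool_output_size(4, 2, 2))   # 4x4 map, 2x2 window, stride 2 -> 2
print(pool_output_size(28, 2, 2))  # 28x28 map -> 14
print(pool_output_size(5, 2, 2))   # last row/column does not fit -> 2
```

With a 2x2 window and a stride of 2, an even input size is exactly halved; an odd size loses its last row and column under this assumption.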

Here is an example of how pooling works on a 4x4 feature map, with a 2x2 pooling window and a stride of 2. The pooling function is max pooling, which outputs the maximum value in each window.

    # Input feature map
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Output feature map after max pooling
    [[6, 8],
     [14, 16]]
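The same result can be reproduced with a few lines of plain Python. This is a minimal sketch for illustration, not how a deep learning framework implements pooling; the function name `max_pool2d` and its defaults are our own choices.

```python
def max_pool2d(fmap, window=2, stride=2):
    """Max pooling over a 2D feature map given as a list of rows (no padding)."""
    height, width = len(fmap), len(fmap[0])
    out = []
    for top in range(0, height - window + 1, stride):
        row = []
        for left in range(0, width - window + 1, stride):
            # Collect the values inside the current window and keep the largest.
            patch = [fmap[top + i][left + j]
                     for i in range(window) for j in range(window)]
            row.append(max(patch))
        out.append(row)
    return out


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [14, 16]]
```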

In the next section, we will learn about the different types of pooling and how they affect the CNN model.

3. Types of Pooling

There are different types of pooling that can be applied to the feature maps in a CNN. The most common ones are max pooling and average pooling, but there are also other types such as min pooling, median pooling, and adaptive pooling. Each type of pooling has its own advantages and disadvantages, and can affect the performance and accuracy of the CNN model in different ways.

Let’s look at each type of pooling in more detail.

3.1 Max Pooling

Max pooling is the most widely used type of pooling in CNNs. Max pooling outputs the maximum value in each pooling window, and discards the rest of the values. Max pooling is useful for extracting the most prominent features from the feature maps, such as edges, corners, shapes, etc. Max pooling also provides a degree of translation invariance, meaning that the output of the pooling layer does not change much if the input is slightly shifted. It does not, by itself, make the model invariant to rotation.
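The "degree of" translation invariance can be seen on a toy example. In the sketch below (the feature maps and the helper `max_pool2d` are made up for illustration), a single strong activation is moved by one and by two pixels. A shift that stays inside the same 2x2 window leaves the pooled output unchanged, while a larger shift does not.

```python
def max_pool2d(fmap, window=2, stride=2):
    """Compact max pooling over a list-of-rows feature map (no padding)."""
    rows = range(0, len(fmap) - window + 1, stride)
    cols = range(0, len(fmap[0]) - window + 1, stride)
    return [[max(fmap[r + i][c + j]
                 for i in range(window) for j in range(window))
             for c in cols] for r in rows]


original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

# The activation moves one pixel right: still inside the first 2x2 window.
shifted_by_one = [[0, 9, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]]

# The activation moves two pixels right: it now falls in the next window.
shifted_by_two = [[0, 0, 9, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]]

print(max_pool2d(original) == max_pool2d(shifted_by_one))  # True
print(max_pool2d(original) == max_pool2d(shifted_by_two))  # False
```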

However, max pooling also has some drawbacks. Max pooling can lose some important information from the feature maps, as it only keeps the maximum value and ignores the rest. Max pooling can also be sensitive to outliers, meaning that a single very high value determines the output of its window (very low values, by contrast, are simply ignored). Max pooling also reduces the resolution of the feature maps, making them less detailed and precise.

Here is an example of how max pooling works on a 4x4 feature map, with a 2x2 pooling window and a stride of 2. The pooling function is max pooling, which outputs the maximum value in each window.

    # Input feature map
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Output feature map after max pooling
    [[6, 8],
     [14, 16]]

3.2 Average Pooling

Average pooling is another common type of pooling in CNNs. Average pooling outputs the average of all values in each pooling window, so every value contributes to the output but none is preserved individually. Average pooling is useful for smoothing the feature maps and reducing the noise and variations in the input data. Like max pooling, average pooling provides a degree of translation invariance, meaning that the output of the pooling layer does not change much if the input is slightly shifted.

However, average pooling also has some drawbacks. Average pooling can lose some important features from the feature maps, as it blends the values together and reduces the contrast. Average pooling can also be affected by the background or irrelevant pixels, meaning that the output of the pooling layer can be influenced by the values that are not related to the object of interest. Average pooling can also reduce the resolution of the feature maps, making them less detailed and precise.

Here is an example of how average pooling works on a 4x4 feature map, with a 2x2 pooling window and a stride of 2. The pooling function is average pooling, which outputs the average value in each window.

    # Input feature map
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Output feature map after average pooling
    [[3.5, 5.5],
     [11.5, 13.5]]
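The averages above can be checked with a short plain-Python sketch. The function name `avg_pool2d` is our own, and the code assumes no padding.

```python
def avg_pool2d(fmap, window=2, stride=2):
    """Average pooling over a 2D feature map given as a list of rows."""
    out = []
    for top in range(0, len(fmap) - window + 1, stride):
        row = []
        for left in range(0, len(fmap[0]) - window + 1, stride):
            patch = [fmap[top + i][left + j]
                     for i in range(window) for j in range(window)]
            # Every value in the window contributes equally to the output.
            row.append(sum(patch) / len(patch))
        out.append(row)
    return out


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]

averaged = avg_pool2d(feature_map)
print(averaged)  # [[3.5, 5.5], [11.5, 13.5]]
```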

3.3 Other Types of Pooling

Besides max pooling and average pooling, there are also other types of pooling that can be used in CNNs. Some examples are:

Min pooling: outputs the minimum value in each pooling window, and discards the rest of the values. Min pooling is useful for extracting the darkest or lowest features from the feature maps, such as shadows, holes, gaps, etc.
Median pooling: outputs the median value in each pooling window, and discards the rest of the values. Median pooling is useful for removing outliers and noise from the feature maps, as it outputs the middle value that is not affected by extreme values.
Adaptive pooling: adjusts the size and stride of the pooling window according to the input size, and outputs a fixed size feature map. Adaptive pooling is useful for handling inputs of different sizes and resolutions, as it outputs a consistent size feature map that can be fed to the next layer of the CNN.
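These variants differ only in the function applied to each window, or in how the windows are chosen. The sketch below uses a generic helper `pool2d` (a name we made up) for min and median pooling, and a simple adaptive average pooling. The way window boundaries are derived in `adaptive_avg_pool2d` is one common convention and is stated here as an assumption; frameworks may differ in detail.

```python
import math
from statistics import median


def pool2d(fmap, reduce_fn, window=2, stride=2):
    """Generic pooling: apply reduce_fn to the values of every window."""
    rows = range(0, len(fmap) - window + 1, stride)
    cols = range(0, len(fmap[0]) - window + 1, stride)
    return [[reduce_fn([fmap[r + i][c + j]
                        for i in range(window) for j in range(window)])
             for c in cols] for r in rows]


def adaptive_avg_pool2d(fmap, out_h, out_w):
    """Average pooling whose windows are derived from the requested output size."""
    in_h, in_w = len(fmap), len(fmap[0])

    def bounds(index, in_size, out_size):
        # Assumed convention: bin i spans floor(i*in/out) up to ceil((i+1)*in/out).
        start = (index * in_size) // out_size
        end = math.ceil((index + 1) * in_size / out_size)
        return start, end

    out = []
    for i in range(out_h):
        r0, r1 = bounds(i, in_h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = bounds(j, in_w, out_w)
            patch = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(patch) / len(patch))
        out.append(row)
    return out


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]

min_pooled = pool2d(feature_map, min)
median_pooled = pool2d(feature_map, median)
global_avg = adaptive_avg_pool2d(feature_map, 1, 1)

print(min_pooled)     # [[1, 3], [9, 11]]
print(median_pooled)  # [[3.5, 5.5], [11.5, 13.5]]
print(global_avg)     # [[8.5]]
```

Requesting a 1x1 output from adaptive average pooling is the same as averaging the whole feature map, which is often called global average pooling.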

Each type of pooling has its own pros and cons, and can be used for different purposes and applications. The choice of pooling depends on the task, the data, and the desired output of the CNN model.

In the next section, we will learn about padding and how it is used in CNNs.

4. What is Padding?

Padding is a technique that adds extra pixels or units around the border of the input data or the feature maps in a CNN. It is applied to the input of a layer, most often a convolutional layer and sometimes a pooling layer, before the filter or window slides over it. The most common form is zero-padding, which fills the border with zeros, but other values (for example, reflected or replicated edge pixels) can also be used.

Why do we need padding? Padding has several benefits for the CNN model, such as:

It preserves the spatial information and avoids losing information at the edges of the input data or the feature maps, as it adds extra pixels or units that can be processed by the convolutional filters.
It controls the output size and resolution of the feature maps, as it adjusts the input size and shape to match the desired output size and shape.
It makes deeper networks practical, as the feature maps no longer shrink with every convolutional layer, and it lets filters be centered on border pixels so that features near the edges are not under-represented.

How does padding work? Padding works by adding extra pixels or units to the input data or the feature maps along the height and width dimensions. The amount of padding can be different depending on the type of padding, but the most common ones are valid padding and same padding. We will discuss these types of padding in the next section.

The amount of padding needed depends on the size of the convolutional filter and its stride. The filter size determines how many pixels or units are processed at a time, and the stride determines how far the filter moves across the input data or the feature map. For example, if the convolutional filter size is 3x3 and the stride is 1, the filter processes 3x3 pixels or units at a time and moves 1 pixel or unit horizontally or vertically each step, so consecutive windows overlap and no position is skipped. Without padding, each output dimension shrinks by 2 (the filter size minus one); adding 1 pixel of padding on every side compensates for this.
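The effect of filter size, stride, and padding on the output size follows the standard formula floor((n + 2p - k) / s) + 1, where n is the input size, k the filter size, s the stride, and p the padding on each side. A minimal sketch, with function names of our own choosing:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one dimension: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding_for_stride_one(kernel):
    """Padding per side that keeps the size unchanged for an odd kernel, stride 1."""
    if kernel % 2 == 0:
        raise ValueError("symmetric same padding needs an odd kernel size")
    return (kernel - 1) // 2


print(conv_output_size(4, 3))                # no padding: 4 -> 2
print(conv_output_size(4, 3, padding=1))     # 1 pixel per side: 4 -> 4
print(same_padding_for_stride_one(5))        # a 5x5 filter needs 2 per side
print(conv_output_size(32, 5, padding=2))    # 32 -> 32
```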

Here is an example of how padding works on a 4x4 input data, with a 3x3 convolutional filter and a stride of 1. The padding type is same padding, which adds enough padding to make the output size equal to the input size.

    # Input data
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Input data after same padding
    [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 3, 4, 0],
     [0, 5, 6, 7, 8, 0],
     [0, 9, 10, 11, 12, 0],
     [0, 13, 14, 15, 16, 0],
     [0, 0, 0, 0, 0, 0]]

    # Output data after convolution
    [[X, X, X, X],
     [X, X, X, X],
     [X, X, X, X],
     [X, X, X, X]]
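The padding step itself is simple to write out. The helper below, `zero_pad2d`, is an illustrative name and works on plain lists of rows; it adds the same number of border pixels on every side.

```python
def zero_pad2d(x, pad, value=0):
    """Return a copy of x with `pad` rows/columns of `value` added on every side."""
    width = len(x[0]) + 2 * pad
    top = [[value] * width for _ in range(pad)]
    middle = [[value] * pad + list(row) + [value] * pad for row in x]
    bottom = [[value] * width for _ in range(pad)]
    return top + middle + bottom


data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

padded = zero_pad2d(data, pad=1)
for row in padded:
    print(row)
print(len(padded), len(padded[0]))  # 6 6
```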

In the next section, we will learn about the different types of padding and how they affect the CNN model.

5. Types of Padding

There are different types of padding that can be applied to the input data or the feature maps in a CNN. The most common ones are valid padding and same padding, but there are also other types such as full padding and causal padding. Each type of padding has its own advantages and disadvantages, and can affect the output size and resolution of the feature maps in different ways.

Let’s look at each type of padding in more detail.

5.1 Valid Padding

Valid padding is the simplest type of padding in CNNs. Valid padding does not add any extra pixels or units to the input data or the feature maps, and only processes the valid pixels or units that fit the convolutional filter size and stride. Valid padding is also known as no padding, as it does not change the input size or shape.

Why do we use valid padding? Valid padding has some benefits for the CNN model, such as:

It slightly reduces the computational cost and memory usage of the model, as the output feature maps are smaller and there is less data to process in later layers. The number of filter weights is unchanged.
It avoids adding unnecessary or irrelevant information to the input data or the feature maps, as it only processes the original pixels or units that contain the actual information.
It is simple to reason about: with a stride of 1, each output dimension is the input dimension minus the filter size plus one.

However, valid padding also has some drawbacks. Valid padding can lose some important information from the input data or the feature maps: pixels near the border are covered by fewer filter positions than interior pixels, and with a stride larger than 1 any trailing rows or columns that do not fit a full filter window are discarded. Valid padding also reduces the output size and resolution of the feature maps, making them smaller and less detailed than the input data.

Here is an example of how valid padding works on a 4x4 input data, with a 3x3 convolutional filter and a stride of 1. The padding type is valid padding, which does not add any padding to the input data.

    # Input data
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Input data after valid padding (unchanged, nothing is added)
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Output data after convolution (4 - 3 + 1 = 2 along each dimension)
    [[X, X],
     [X, X]]
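To make the output size concrete, here is a minimal sketch of a valid convolution in plain Python. As in most deep learning libraries, "convolution" here means sliding the filter without flipping it (cross-correlation). The all-ones filter is an arbitrary choice for illustration, so each output value is simply the sum of a 3x3 window. A 3x3 filter fits into a 4x4 input at 2 positions along each dimension, so the output is 2x2.

```python
def conv2d_valid(x, kernel, stride=1):
    """Slide a square kernel over x with no padding (cross-correlation)."""
    k = len(kernel)
    rows = range(0, len(x) - k + 1, stride)
    cols = range(0, len(x[0]) - k + 1, stride)
    return [[sum(x[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in cols] for r in rows]


data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

ones_kernel = [[1, 1, 1],
               [1, 1, 1],
               [1, 1, 1]]

valid_out = conv2d_valid(data, ones_kernel)
print(valid_out)  # [[54, 63], [90, 99]]
```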

5.2 Same Padding

Same padding is another common type of padding in CNNs. Same padding adds enough extra pixels or units to the input data or the feature maps to make the output size equal to the input size when the stride is 1, whatever the filter size. With a larger stride s, frameworks such as TensorFlow and Keras define same padding so that the output size is the input size divided by s, rounded up. The padded values are usually zeros, which is why same padding is often loosely called zero padding, but other values can also be used.

Why do we use same padding? Same padding has some benefits for the CNN model, such as:

It preserves the spatial information and avoids losing information at the edges of the input data or the feature maps, as it adds extra pixels or units that can be processed by the convolutional filters.
It controls the output size and resolution of the feature maps, as it adjusts the input size and shape to match the desired output size and shape.
It makes deep architectures easier to design, as many convolutional layers can be stacked without the feature maps shrinking at every layer.

However, same padding also has some drawbacks. Same padding slightly increases the computational cost and memory usage of the model, as the output feature maps are larger and there is more data to process; the number of filter weights does not change. Same padding can also add unnecessary or irrelevant information to the input data or the feature maps, as it adds extra pixels or units that do not contain any actual information. These artificial values can produce border artifacts, since output values near the edges are computed partly from padding rather than from real data.

Here is an example of how same padding works on a 4x4 input data, with a 3x3 convolutional filter and a stride of 1. The padding type is same padding, which adds enough padding to make the output size equal to the input size.

    # Input data
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

    # Input data after same padding
    [[0, 0, 0, 0, 0, 0],
     [0, 1, 2, 3, 4, 0],
     [0, 5, 6, 7, 8, 0],
     [0, 9, 10, 11, 12, 0],
     [0, 13, 14, 15, 16, 0],
     [0, 0, 0, 0, 0, 0]]

    # Output data after convolution
    [[X, X, X, X],
     [X, X, X, X],
     [X, X, X, X],
     [X, X, X, X]]
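The sketch below shows one way to compute how much padding same padding needs and then apply it. It follows the convention used by TensorFlow's "SAME" mode, where the output size is ceil(n / stride) and any odd leftover pixel goes to the bottom or right; other frameworks may split the padding differently. The helper names and the all-ones filter are illustrative assumptions.

```python
import math


def same_pad_amounts(n, kernel, stride=1):
    """Padding (before, after) along one dimension so that output = ceil(n / stride)."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel - n, 0)
    before = total // 2
    return before, total - before


def pad2d(x, top, bottom, left, right):
    """Zero-pad a list-of-rows array by the given amount on each side."""
    width = len(x[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    padded += [[0] * left + list(row) + [0] * right for row in x]
    padded += [[0] * width for _ in range(bottom)]
    return padded


def conv2d_same(x, kernel, stride=1):
    """Pad with zeros as computed above, then slide the kernel (no flipping)."""
    k = len(kernel)
    top, bottom = same_pad_amounts(len(x), k, stride)
    left, right = same_pad_amounts(len(x[0]), k, stride)
    p = pad2d(x, top, bottom, left, right)
    rows = range(0, len(p) - k + 1, stride)
    cols = range(0, len(p[0]) - k + 1, stride)
    return [[sum(p[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in cols] for r in rows]


data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

same_out = conv2d_same(data, ones_kernel)            # stride 1
strided_out = conv2d_same(data, ones_kernel, stride=2)

print(same_pad_amounts(4, 3))                 # (1, 1)
print(len(same_out), len(same_out[0]))        # 4 4
print(len(strided_out), len(strided_out[0]))  # 2 2
```

With a stride of 1 the 4x4 input stays 4x4; with a stride of 2 the output is 2x2, which is the input size divided by the stride, rounded up.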

5.3 Other Types of Padding

Besides valid padding and same padding, there are also other types of padding that can be used in CNNs. Some examples are:

Full padding: adds filter size minus one extra pixels or units on every side of the input data or the feature maps, so that with a stride of 1 the output size equals the input size plus the filter size minus one. Full padding is useful for increasing the output size and resolution of the feature maps, and for letting every input pixel or unit be visited by every position of the filter.
Causal padding: adds extra pixels or units only at the start of the input (the left of a sequence, or the top for data ordered by rows), and none at the end. It is mostly used with one-dimensional convolutions over sequences, where it ensures that the output at each step depends only on the current and earlier inputs. Causal padding is useful for preserving the temporal order of the input data and for avoiding future information leakage in sequential or time-series data.
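Both variants are easiest to see in one dimension. The sketch below uses a made-up signal and an all-ones filter of length 3, so each output is a sum over three neighbouring positions. The helper `conv1d` is our own and slides the filter without flipping it, with a stride of 1.

```python
def conv1d(x, kernel, pad_left=0, pad_right=0):
    """1D sliding-window sum of products with explicit zero padding, stride 1."""
    k = len(kernel)
    padded = [0] * pad_left + list(x) + [0] * pad_right
    return [sum(padded[t + i] * kernel[i] for i in range(k))
            for t in range(len(padded) - k + 1)]


signal = [1, 2, 3, 4]
kernel = [1, 1, 1]
k = len(kernel)

# Full padding: k - 1 zeros on both sides, output length n + k - 1.
full = conv1d(signal, kernel, pad_left=k - 1, pad_right=k - 1)

# Causal padding: k - 1 zeros at the start only, output length n.
# Output t is computed from signal[t-2], signal[t-1], signal[t], never from later values.
causal = conv1d(signal, kernel, pad_left=k - 1)

print(full)    # [1, 3, 6, 9, 7, 4]
print(causal)  # [1, 3, 6, 9]
```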

Each type of padding has its own pros and cons, and can be used for different purposes and applications. The choice of padding depends on the task, the data, and the desired output of the CNN model.

In the next section, we will learn about the benefits and drawbacks of pooling and padding in CNNs.

6. Benefits and Drawbacks of Pooling and Padding

In this section, we will summarize the benefits and drawbacks of pooling and padding techniques in CNNs, and compare them with each other. We will also provide some tips and best practices on how to choose and apply these techniques to your own CNN models.

6.1 Benefits of Pooling and Padding

Pooling and padding are two important techniques that can improve the performance and accuracy of CNN models. Here are some of the benefits of pooling and padding:

Pooling reduces the size of the feature maps and makes the model more efficient and robust. Pooling can extract the most important or dominant features from the feature maps, such as edges, corners, shapes, etc. Pooling can also provide some degree of translation invariance, meaning that the output of the pooling layer does not change much if the input is slightly shifted.
Padding preserves the spatial information and avoids losing information at the edges of the input data or the feature maps. Padding can adjust the input size and shape to match the desired output size and shape. Padding can also make deeper models practical, as the feature maps do not have to shrink with every convolutional layer.

6.2 Drawbacks of Pooling and Padding

However, pooling and padding also have some drawbacks that can affect the performance and accuracy of CNN models. Here are some of the drawbacks of pooling and padding:

Pooling can lose some important information from the feature maps, as it only keeps one value per window and discards the rest. Max pooling in particular can be sensitive to outliers, as a single very high value determines the output of its window. Pooling also reduces the resolution of the feature maps, making them less detailed and precise.
Padding can slightly increase the computational cost and memory usage of the model, as the feature maps are larger and there is more data to process; it does not change the number of parameters to learn. Padding can also add unnecessary or irrelevant information to the input data or the feature maps, as it adds extra pixels or units that do not contain any actual information, which can lead to artifacts near the borders.

6.3 Comparison of Pooling and Padding

Pooling and padding are two complementary techniques that can be used together or separately in CNN models. The choice of pooling and padding depends on the task, the data, and the desired output of the model. Here are some general guidelines on how to compare and choose pooling and padding:

If you want to reduce the size of the feature maps and make the model more efficient and robust, you can use pooling. If you want to preserve the spatial information and avoid losing information at the edges of the input data or the feature maps, you can use padding.
If you want to extract the most prominent features from the feature maps, such as edges, corners, shapes, etc., you can use max pooling. If you want to smooth the feature maps and reduce the noise and variations in the input data, you can use average pooling.
If you want to make the output size equal to the input size for any filter size (with a stride of 1), you can use same padding. If you want to process only the valid pixels or units that fit the convolutional filter size and stride, you can use valid padding.
If you want to increase the output size and resolution of the feature maps, and process the entire input data or feature map without discarding any pixels or units, you can use full padding. If you want to preserve the temporal or spatial order of the input data or the feature maps, and avoid future information leakage in sequential or time-series data, you can use causal padding.
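When these choices are combined in one model, it helps to trace the feature map size layer by layer. The layer stack below is hypothetical and chosen only for illustration; the sketch applies the standard output size formula floor((n + 2p - k) / s) + 1 to each layer in turn.

```python
def output_size(n, kernel, stride, padding):
    """Output length along one dimension for a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical layer stack: (name, kernel size, stride, padding per side)
layers = [
    ("conv 3x3, same padding", 3, 1, 1),
    ("max pool 2x2", 2, 2, 0),
    ("conv 3x3, valid padding", 3, 1, 0),
    ("max pool 2x2", 2, 2, 0),
]


def trace_sizes(n, layers):
    """Return the feature map size before the first layer and after every layer."""
    sizes = [n]
    for _name, kernel, stride, padding in layers:
        n = output_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


sizes = trace_sizes(28, layers)
print(sizes)  # [28, 28, 14, 12, 6]
```

For a 28x28 input, the same-padded convolution keeps the size, each pooling layer halves it, and the valid convolution trims one pixel from every side.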

In the next section, we will conclude this tutorial and provide some resources for further learning.

7. Conclusion

In this tutorial, you have learned about pooling and padding techniques in convolutional neural networks (CNNs). You have learned:

What is pooling and why it is used in CNNs
What are the types of pooling and how they differ
What is padding and why it is used in CNNs
What are the types of padding and how they differ
What are the benefits and drawbacks of pooling and padding

By following this tutorial, you have gained a better understanding of how pooling and padding work and how they affect the performance and accuracy of CNN models. You have also learned how to choose and apply these techniques to your own CNN models and understand their effects on the output size and resolution of the feature maps.

We hope you have enjoyed this tutorial and found it useful for your learning. If you want to learn more about CNNs and other deep learning topics, you can check out the following resources:

[A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way]: A beginner-friendly introduction to CNNs and their applications.
[Convolutional Neural Networks (CNNs / ConvNets)]: A detailed and interactive explanation of CNNs and their components.
[CS231n: Convolutional Neural Networks for Visual Recognition]: A popular and advanced course on CNNs and computer vision from Stanford University.

Thank you for reading this tutorial and happy learning!

The complete tutorial list is here:

Deep Learning Tutorial Series: 50 Step-by-Step Lessons [FREE][2024]
