

Dynamic RCNN — Towards High-Quality Object Detection via Dynamic Training

Fixing the inconsistency problems during training

Hi there! Today we will have a look at Dynamic RCNN, a paper by researchers at the Chinese Academy of Sciences.

Overview

Goal: Fixing the inconsistency problems in the training process

Inconsistency? The training settings (hyper-parameters) are fixed, but learning is dynamic. Ex: The quality of the proposals increases as training progresses, whatever IOU threshold is used, yet we keep the same IOU threshold throughout training.

The paper deals with two inconsistency problems in network training: proposal classification and bounding box regression.

Let’s see the inconsistency problems in both these cases.

Proposal Classification

The annotations for the object detection task are the bounding boxes of objects. The RPN stage generates proposals across the image, and each proposal has to be assigned to a ground-truth (GT) box. There is no obvious rule for deciding whether a proposal is positive or negative. The most widely used technique is to compute the IOU between the proposal and the GT box and compare it against a threshold.

Suppose the IOU threshold is 0.5: proposals whose IOU exceeds 0.5 are treated as positives and the rest as negatives. However, keeping a single IOU threshold throughout training has been observed to hurt performance, and different IOU thresholds lead to different performance. Check the graphs below.
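To make the fixed-threshold rule concrete, here is a minimal NumPy sketch. The function names `iou_matrix` and `assign_labels`, the `[x1, y1, x2, y2]` box format, and the use of `>=` at the threshold are assumptions made for illustration, not details from the paper or its code.

```python
import numpy as np

def iou_matrix(proposals, gt_boxes):
    """Pairwise IoU between (N, 4) proposals and (M, 4) GT boxes.

    Boxes are assumed to be [x1, y1, x2, y2] with positive area.
    """
    p = proposals[:, None, :]  # shape (N, 1, 4)
    g = gt_boxes[None, :, :]   # shape (1, M, 4)
    # Width and height of the overlap, clipped at 0 when boxes do not intersect.
    inter_w = np.clip(np.minimum(p[..., 2], g[..., 2]) - np.maximum(p[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(p[..., 3], g[..., 3]) - np.maximum(p[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_p + area_g - inter)

def assign_labels(proposals, gt_boxes, iou_threshold=0.5):
    """Label each proposal 1 (positive) or 0 (negative) by its best IoU."""
    ious = iou_matrix(proposals, gt_boxes)
    best_iou = ious.max(axis=1)       # IoU with the best-matching GT box
    matched_gt = ious.argmax(axis=1)  # index of that GT box
    labels = (best_iou >= iou_threshold).astype(np.int64)
    return labels, matched_gt, best_iou

proposals = np.array([[0, 0, 10, 10], [0, 0, 10, 6], [0, 0, 10, 4], [20, 20, 30, 30]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 10, 10]], dtype=np.float64)
labels_05, _, best_iou = assign_labels(proposals, gt_boxes, iou_threshold=0.5)
labels_07, _, _ = assign_labels(proposals, gt_boxes, iou_threshold=0.7)
```

Raising the threshold from 0.5 to 0.7 turns the second proposal from a positive into a negative, which is why the choice of threshold changes what the classifier is trained on.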

Model performance when trained at different IOU thresholds (Source: Cascade R-CNN)

Positive proposal percentage (Source: Cascade RCNN)

High IOU thresholds give high-quality proposals, but we can’t use a high threshold at the beginning of training because it would leave too few positive proposals for the model to learn from.

Positive proposal count vs training iterations (Source: Dynamic RCNN)

The number of positive proposals increases with the number of training iterations.

To solve this issue, Dynamic Label Assignment (DLA) is proposed.

Dynamic Label Assignment

Idea: Dynamically change the IOU threshold as learning improves during training

Instead of using a fixed IOU threshold throughout training, threshold T keeps changing.

T_now stands for current IOU threshold.

Calculation of T_now:

Calculate the IoUs I between proposals and their target ground-truths
Keep storing the K-th largest value from I for C iterations in set S_k
Take the mean value of S_k as current threshold T_now
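The three steps above can be sketched as a small tracker. This is only an illustration under assumptions: the class name `DynamicIoUThreshold`, its parameter names, and the default values are placeholders chosen here, not taken from the official implementation.

```python
import numpy as np

class DynamicIoUThreshold:
    """Keeps a running IoU threshold T_now, refreshed every C iterations."""

    def __init__(self, initial_threshold=0.4, top_k=75, update_interval=100):
        self.threshold = initial_threshold      # T_now
        self.top_k = top_k                      # K
        self.update_interval = update_interval  # C
        self._history = []                      # S_k

    def step(self, ious):
        """Record this iteration's K-th largest IoU and return the current T_now."""
        ious = np.asarray(ious, dtype=np.float64).ravel()
        if ious.size == 0:
            return self.threshold
        k = min(self.top_k, ious.size)
        # np.partition places the K-th largest value at index -k.
        kth_largest = np.partition(ious, -k)[-k]
        self._history.append(kth_largest)
        if len(self._history) == self.update_interval:
            # Mean of the stored values becomes the new threshold.
            self.threshold = float(np.mean(self._history))
            self._history.clear()
        return self.threshold

tracker = DynamicIoUThreshold(initial_threshold=0.4, top_k=2, update_interval=2)
t_first = tracker.step([0.1, 0.5, 0.9])   # stores 0.5, threshold unchanged
t_second = tracker.step([0.2, 0.7, 0.8])  # stores 0.7, threshold becomes the mean
```

In a training loop, `step` would be called once per iteration with the IoUs between proposals and their matched ground-truths, and the returned value would be used as the positive/negative cut-off.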

Dynamic Label Assignment (Source: Dynamic RCNN)

We can see that there are more high-quality proposals as training progresses. As proposal quality improves, DLA automatically raises the IoU threshold based on the proposal distribution. DLA then assigns positive (green) and negative (red) labels to the proposals, as shown in the right part of the figure.

Bounding box regression

In regression, the task is to regress the positive proposals to their corresponding target ground-truth boxes (offset ∆ = (δx, δy, δw, δh)). These offsets are learned through a regression loss function. The distribution of ∆ changes as training progresses. Check the figure below.

Distribution of regression deltas as training progresses (Source: Dynamic RCNN)

The smooth L1 loss used for regression is as below:
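The formula image is not reproduced in this text. As a reference, the smooth L1 loss with a β parameter is commonly written as follows (a reconstruction of the standard definition, so treat the exact notation as an assumption):

```latex
\mathrm{SmoothL1}(x, \beta) =
\begin{cases}
0.5\,|x|^{2} / \beta, & \text{if } |x| < \beta, \\
|x| - 0.5\,\beta, & \text{otherwise.}
\end{cases}
```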

Here x stands for the regression label. β is a hyper-parameter controlling the range in which a softer loss function like the l1 loss is used instead of the original l2 loss. For robust training, β defaults to 1.0, which prevents the loss from exploding while the network is still poorly trained in the early stages. As you can see in the figure below, smaller β values let the gradient saturate sooner, so samples with small errors contribute more and training is accelerated.
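A quick numeric check shows the effect of β on samples that are already close to their targets. This is a sketch of the standard smooth L1 definition; the helper names `smooth_l1` and `smooth_l1_grad_magnitude` and the sample values are made up for illustration.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Quadratic (l2-like) for |x| < beta, linear (l1-like) otherwise."""
    x = np.abs(np.asarray(x, dtype=np.float64))
    return np.where(x < beta, 0.5 * x ** 2 / beta, x - 0.5 * beta)

def smooth_l1_grad_magnitude(x, beta=1.0):
    """|d loss / d x|: equals |x| / beta in the quadratic zone, saturates at 1 outside."""
    x = np.abs(np.asarray(x, dtype=np.float64))
    return np.where(x < beta, x / beta, 1.0)

small_error = 0.1  # a high-quality sample: prediction already close to the target
grads = {beta: float(smooth_l1_grad_magnitude(small_error, beta)) for beta in (1.0, 0.5, 0.25)}
for beta, grad in grads.items():
    print(f"beta={beta}: gradient magnitude={grad:.2f}")
```

For the same small error, the gradient magnitude grows as β shrinks, so accurate samples push the weights harder when β is small.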

Regression loss with different β values (Source: Dynamic RCNN)

The regression loss function therefore needs to follow this change in distribution, so that the growing share of high-quality samples is properly accounted for.

Dynamic SmoothL1 Loss

Idea: Dynamically change the β value during training so that the loss keeps pace with the improving quality of proposals.

β_now is calculated as follows:

Calculate the regression labels E between proposals and their target ground-truths
Keep storing the K-th smallest value from E for C iterations in set S_k
Take the median value of S_k as current β_now
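The β update mirrors the threshold update and can be sketched the same way. The class name `DynamicSmoothL1Beta`, its defaults, and the choice to take absolute values of the regression labels are assumptions for illustration, not the official implementation.

```python
import numpy as np

class DynamicSmoothL1Beta:
    """Keeps a running beta_now, refreshed every C iterations."""

    def __init__(self, initial_beta=1.0, top_k=10, update_interval=100):
        self.beta = initial_beta                # beta_now
        self.top_k = top_k                      # K
        self.update_interval = update_interval  # C
        self._history = []                      # S_k

    def step(self, regression_labels):
        """Record this iteration's K-th smallest |label| and return the current beta_now."""
        # Assumption: magnitudes are used, since beta bounds |x| in the loss.
        errors = np.abs(np.asarray(regression_labels, dtype=np.float64)).ravel()
        if errors.size == 0:
            return self.beta
        k = min(self.top_k, errors.size)
        # np.partition places the K-th smallest value at index k - 1.
        kth_smallest = np.partition(errors, k - 1)[k - 1]
        self._history.append(kth_smallest)
        if len(self._history) == self.update_interval:
            # Median of the stored values becomes the new beta.
            self.beta = float(np.median(self._history))
            self._history.clear()
        return self.beta

beta_tracker = DynamicSmoothL1Beta(initial_beta=1.0, top_k=2, update_interval=3)
b1 = beta_tracker.step([0.3, -0.1, 0.5])  # stores 0.3
b2 = beta_tracker.step([0.2, 0.4, 0.05])  # stores 0.2
b3 = beta_tracker.step([5.0, 9.0, 7.0])   # stores 7.0, an outlier iteration
```

In this toy run the stored values are 0.3, 0.2 and 7.0. The median gives 0.3, whereas the mean would be pulled up to 2.5 by the single outlier iteration.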

The median is chosen instead of the mean because it is more robust to outliers.

The whole Dynamic RCNN algorithm can be summarized as below

Dynamic RCNN Algorithm (Source: Dynamic RCNN)

By dynamically choosing the values of T and β, the quality of learning can be improved.

Results

Check out the improvements in Average Precision values below.

Several ablation studies show that the results are largely insensitive to the values of K and C. Also, training time does not increase, since the only additional computation is the mean and the median. Check the details below.

Tables showing ablation studies (Source: Dynamic RCNN)

References

https://arxiv.org/abs/2004.06002 (paper)
https://github.com/hkzhang95/DynamicRCNN (PyTorch implementation)
https://arxiv.org/abs/1712.00726 (Cascade RCNN)
