Renu Khandelwal

Summary

The different IoU losses, including Generalized IoU (GIoU), Distance-IoU (DIoU), and Complete IoU (CIoU), are used in state-of-the-art object detection algorithms, each with its own advantages and trade-offs in terms of convergence speed and localization accuracy.

Abstract

Object detection relies on bounding box regression (BBR) to localize objects, and IoU loss is a popular technique for this purpose. However, IoU loss has limitations, such as slow convergence and ineffectiveness in non-overlapping cases. To address these issues, Generalized IoU (GIoU) loss was introduced to maximize the overlap area between the predicted and ground truth bounding boxes. However, GIoU loss still suffers from slow convergence and inaccurate regression in extreme aspect ratios. To improve upon this, Distance-IoU (DIoU) loss was introduced, which uses the normalized distance between the predicted box and ground truth, resulting in faster convergence. Finally, Complete IoU (CIoU) loss was introduced as an aggregation of overlap area, distance, and aspect ratio, providing faster convergence and better performance than IoU and GIoU losses.

Bullet points

  • Object detection includes object classification and localization.
  • Bounding box regression (BBR) is used for object localization in object detection algorithms.
  • IoU loss is a popular technique for bounding box regression.
  • IoU loss only works when predicted and ground truth bounding boxes overlap.
  • Generalized IoU (GIoU) loss handles non-overlapping cases by first enlarging the predicted box until it overlaps the target box, so it moves towards the target slowly. It suffers from slow convergence and inaccurate regression for boxes with extreme aspect ratios.
  • Distance-IoU (DIoU) loss uses the normalized distance between the predicted box and ground truth and converges much faster than IoU and GIoU losses.
  • Complete IoU (CIoU) loss is an aggregation of the overlap area, distance, and aspect ratio, converging faster with fewer iterations compared to IoU and GIoU losses.

Different IoU Losses for Faster and Accurate Object Detection

Learn about the Generalized IoU, Distance IoU, and Complete IoU losses used in state-of-the-art object detection algorithms

Object detection includes two sub-tasks: object classification and object localization.

Object localization relies on a bounding box regression (BBR) module to localize objects.

Bounding Box Regression

Bounding box regression is a popular technique in object detection algorithms, used to predict the location of target objects with rectangular bounding boxes. It aims to refine the location of a predicted bounding box.

Bounding box regression is commonly trained with losses based on Intersection over Union (IoU), the overlap area between the predicted bounding box and the ground truth bounding box.

Intersection over Union

IoU loss only works when the predicted bounding box overlaps with the ground truth box. In non-overlapping cases, IoU loss provides no gradient to move the predicted box towards the target.

The convergence speed of the IoU loss is also slow.

Red is the predicted bounding box, and green is the ground truth bounding box.

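IoU is the area of overlap between the predicted box B and the ground truth box B_gt divided by the area of their union, and the IoU loss is one minus that ratio: IoU = |B ∩ B_gt| / |B ∪ B_gt| and L_IoU = 1 - IoU.

IoU loss fails when the predicted and ground truth boxes do not overlap.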
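The failure case can be seen in a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function names `iou` and `iou_loss` and the sample boxes are illustrative, not taken from any library. Once the boxes stop overlapping, the loss stays at 1 however far apart they are, so it carries no information about which way to move the box.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """IoU loss: 1 - IoU."""
    return 1.0 - iou(pred, target)


target = (0.0, 0.0, 2.0, 2.0)
overlapping = (1.0, 1.0, 3.0, 3.0)   # intersection 1, union 7
near_miss = (3.0, 0.0, 5.0, 2.0)     # disjoint, close to the target
far_miss = (30.0, 0.0, 32.0, 2.0)    # disjoint, far from the target

print(iou_loss(overlapping, target))                              # roughly 0.857
print(iou_loss(near_miss, target), iou_loss(far_miss, target))    # 1.0 1.0
```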

Generalized IoU (GIoU) Loss

GIoU loss maximizes the overlap area of the ground truth and predicted bounding boxes. In non-overlapping cases, it first increases the predicted box's size until it overlaps with the target box, which moves the prediction towards the target only slowly.

Source: Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

The GIoU loss is L_GIoU = 1 - IoU + |C \ (B ∪ B_gt)| / |C|, where C is the smallest box covering both the predicted and ground truth bounding boxes. The last term acts as a penalty that moves the predicted box closer to the ground truth box.
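A minimal sketch of the GIoU loss, assuming boxes are (x1, y1, x2, y2) tuples with positive area; the function name `giou_loss` and the sample boxes are illustrative only. Unlike plain IoU loss, the penalty term grows as a non-overlapping box moves further from the target.

```python
def giou_loss(pred, target):
    """GIoU loss: 1 - IoU + (area of C not covered by the union) / (area of C)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    # C is the smallest axis-aligned box enclosing both boxes.
    c_area = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    return 1.0 - iou + (c_area - union) / c_area


target = (0.0, 0.0, 2.0, 2.0)
near_miss = (3.0, 0.0, 5.0, 2.0)    # enclosing box is 5 x 2
far_miss = (30.0, 0.0, 32.0, 2.0)   # enclosing box is 32 x 2

print(giou_loss(near_miss, target))  # about 1.2
print(giou_loss(far_miss, target))   # about 1.875
```

Both boxes have zero IoU with the target, yet the further box gets the larger loss, which is what gives the optimizer a direction to move in.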

GIoU loss: Blue is the predicted bounding box using GIoU loss. Source: Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

As shown in the figure above, GIoU loss initially increases the predicted bounding box's size and then slowly moves it towards the ground truth. Getting the predicted box to overlap the ground truth box takes several iterations, especially when the boxes are at horizontal or vertical orientations relative to each other.

GIoU loss achieves better precision than MSE loss and IoU loss.

GIoU loss solves the vanishing gradient problem for non-overlapping cases but has slow convergence and inaccurate regression, especially for boxes with extreme aspect ratios.

Distance IoU Loss

Distance-IoU (DIoU) loss adds to the IoU loss a penalty term based on the normalized distance between the center points of the predicted and ground truth boxes. This distance term helps with faster convergence and more accurate regression.

The DIoU loss is L_DIoU = 1 - IoU + d² / C², where d is the Euclidean distance between the center points of the predicted and ground truth boxes, and C is the diagonal length of the smallest enclosing box covering the two boxes.
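A minimal sketch of the DIoU loss under the same assumptions as before: boxes are (x1, y1, x2, y2) tuples with positive area, and the name `diou_loss` and the sample boxes are illustrative. The second pair of examples places a small box fully inside the target, a case where the GIoU penalty term is zero because the enclosing box equals the union, while the DIoU distance term still prefers the centered box.

```python
def diou_loss(pred, target):
    """DIoU loss: 1 - IoU + d^2 / C^2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union
    # d^2: squared Euclidean distance between the two box centers.
    d2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # C^2: squared diagonal of the smallest enclosing box.
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2
    return 1.0 - iou + d2 / c2


target = (0.0, 0.0, 4.0, 4.0)
far_miss = (30.0, 0.0, 34.0, 4.0)   # disjoint: the loss still depends on distance
corner = (0.0, 0.0, 2.0, 2.0)       # inside the target, off-center
centered = (1.0, 1.0, 3.0, 3.0)     # inside the target, same center

print(diou_loss(far_miss, target))
print(diou_loss(corner, target), diou_loss(centered, target))  # 0.8125 0.75
```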

DIoU loss is invariant to the scale of the regression problem, and like GIoU loss, it provides moving directions for predicted bounding boxes in non-overlapping cases.

Unlike GIoU loss, DIoU loss directly minimizes the distance between the predicted and ground truth bounding boxes, so it converges much faster than GIoU loss, even when the boxes are at horizontal or vertical orientations relative to each other.

DIoU loss. Source: Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

When employed as the criterion for non-maximum suppression (NMS), DIoU gives results that are more robust to occlusions.
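A minimal sketch of greedy NMS with DIoU as the suppression criterion, assuming the rule described in the Distance-IoU paper: a lower-scoring box is suppressed when IoU minus the normalized center distance reaches the threshold. The names `diou` and `diou_nms`, the threshold value, and the sample boxes are illustrative assumptions.

```python
def diou(box_a, box_b):
    """IoU minus the normalized squared center distance of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return inter / union - d2 / c2


def diou_nms(boxes, scores, threshold=0.4):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop boxes that overlap the kept box AND have a nearby center.
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [
    (0.0, 0.0, 10.0, 10.0),   # highest score
    (1.0, 0.0, 11.0, 10.0),   # near-duplicate of the first box
    (4.0, 0.0, 14.0, 10.0),   # IoU with the first box is about 0.43, centers are apart
]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))  # [0, 2]
```

With a plain IoU criterion at the same 0.4 threshold, the third box would be suppressed as well. The center-distance term lets it survive, which is the behavior that helps when two distinct objects partly occlude each other.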

Complete IoU Loss

CIoU loss uses three geometric factors for bounding box regression:

  • The overlap area between the predicted box and the ground truth bounding box (IoU loss)
  • The distance between the center points of the predicted box and the ground truth bounding box (DIoU loss)
  • The consistency of aspect ratio between the predicted box and the ground truth box

As CIoU loss uses complete geometric factors, it converges faster than GIoU loss. It improves average precision (AP) and average recall (AR) for object detection and segmentation.

CIoU loss aggregates three terms, for overlap area, distance, and aspect ratio, which is why it is referred to as Complete IoU loss: L_CIoU = S + D + V.

S is the overlap area term, S = 1 - IoU.

D is the normalized distance between the center points of the predicted and ground truth boxes, the same penalty term used in DIoU loss.

V is the consistency of the aspect ratio.

S, V, and D are all invariant to the regression scale and are normalized to values between 0 and 1.
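A minimal sketch of the three terms combined. It assumes the aspect-ratio term as defined in the Distance-IoU paper listed in the references: v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w / h))², weighted by α = v / ((1 - IoU) + v). The name `ciou_loss` and the sample boxes are illustrative, and boxes are assumed to have positive width and height.

```python
import math


def ciou_loss(pred, target):
    """CIoU loss for two (x1, y1, x2, y2) boxes: S + D + V."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # S: overlap term, 1 - IoU.
    inter = max(0.0, min(px2, tx2) - max(px1, tx1)) * max(0.0, min(py2, ty2) - max(py1, ty1))
    union = pw * ph + tw * th - inter
    s_term = 1.0 - inter / union

    # D: squared center distance over the squared diagonal of the enclosing box.
    d2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2
    d_term = d2 / c2

    # V: aspect-ratio consistency v, weighted by the trade-off factor alpha.
    v = (4.0 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / (s_term + v) if v > 0 else 0.0
    return s_term + d_term + alpha * v


target = (0.0, 0.0, 4.0, 4.0)
half = math.sqrt(8.0) / 2.0
square = (2.0 - half, 2.0 - half, 2.0 + half, 2.0 + half)  # area 8, same center, same aspect ratio
wide = (0.0, 1.0, 4.0, 3.0)                                # area 8, same center, 2:1 aspect ratio

print(ciou_loss(square, target))  # about 0.5: only the overlap term is non-zero
print(ciou_loss(wide, target))    # slightly above 0.5 because of the aspect-ratio term
```

Both boxes share the target's center and have IoU 0.5, so IoU, GIoU, and DIoU losses cannot tell them apart; only the aspect-ratio term separates them.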

CIoU loss, like GIoU loss and DIoU loss, moves the predicted bounding box towards the ground truth bounding box for non-overlapping cases.

CIoU loss needs fewer iterations to converge than GIoU loss. It also makes regression much faster for boxes with extreme aspect ratios.

CIoU loss. Source: Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation

CIoU loss is applied in YOLOv3, YOLOv4, SSD, and Faster R-CNN.

The figure below shows the regression error sum curves of the different loss functions over the training iterations.

IoU loss only works in cases where the predicted bounding box overlaps with the target box.

GIoU loss helps with non-overlapping cases by increasing the predicted box's size until it overlaps with the ground truth, which moves it towards the ground truth only slowly. GIoU loss converges very slowly, needing a large number of iterations and a properly chosen learning rate.

GIoU still has large errors for cases with extreme aspect ratios.

CIoU loss uses geometric measures for bounding box regression, which helps with faster convergence and better performance than IoU and GIoU losses.

DIoU loss also converges in fewer iterations and performs better than IoU and GIoU losses.

Summary:

The different IoU losses differ in convergence speed and localization accuracy.

  • Generalized IoU (GIoU) loss increases the size of the predicted box until it overlaps with the target box, moving towards the target only slowly, so it suffers from slow convergence. GIoU loss also gives inaccurate regression for boxes with extreme aspect ratios.
  • Distance-IoU (DIoU) loss uses the normalized distance between the predicted box and ground truth and converges much faster in training than IoU and GIoU losses.
  • CIoU loss is an aggregation of the overlap area (IoU), distance (DIoU), and aspect ratio. It converges in fewer iterations than IoU loss and GIoU loss.

References:

YOLOv4: Optimal Speed and Accuracy of Object Detection

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

https://github.com/Zzh-tju/CIoU

Focal and Efficient IOU Loss for Accurate Bounding Box Regression

Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation
