Renu Khandelwal


A Cheat Sheet For Multi-Object Tracking

Everything about MOT in a nutshell

Multiple Object Tracking(MOT)

MOT takes a continuous video, splits it into discrete frames at a specific frame rate (fps), and outputs

  • Detection: what objects are present in each frame
  • Localization: where the objects are in each frame
  • Association: whether detections in different frames belong to the same object or to different objects

Typical Applications of MOT

Multi-object tracking (MOT) has applications in

  • Video surveillance for traffic control, digital forensics
  • Gesture recognition
  • Robotics
  • Augmented Reality
  • Self-driving vehicles

Challenges with MOT

  • Detection accuracy: the objects of interest must be detected in each frame with high confidence. Typical failures are missing an object of interest, assigning the wrong class label to a detected object, or localizing a detected object incorrectly.
  • ID switching: when two similar objects overlap or blend, their identities can be swapped, which makes it difficult to keep track of each object's ID.
  • Background distortion: a busy background makes it difficult to detect small objects.
  • Occlusion: the object of interest is fully or partially hidden by another object.
  • Multiple spatial spaces, deformation, or object rotation
  • Image illumination
  • Motion blur: visual streaking or smearing captured on camera

Characteristics of a Multi-object tracker(MOT)

A good multi-object tracker

  1. Tracks objects by finding the correct number of objects at precise locations in each frame.
  2. Identifies objects by tracking each individual object consistently over a long period.
  3. Tracks objects despite occlusion, illumination changes, busy backgrounds, motion blur, etc.
  4. Detects and tracks objects fast.

Popular MOT Algorithms

Centroid based Object Tracking

Centroid-based object tracking utilizes the Euclidean distance between the centroids of the objects detected between two consecutive frames in a video.

image by author
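The centroid matching described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the box format `(x1, y1, x2, y2)`, the greedy nearest-first pairing, and the `max_distance` cut-off are all assumptions made for the example.

```python
import math


def centroid(box):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def match_by_centroid(prev_boxes, curr_boxes, max_distance=50.0):
    """Pair objects from the previous frame with boxes in the current frame.

    prev_boxes: dict {object_id: box}; curr_boxes: list of boxes.
    Pairs are accepted closest-first; max_distance is a hypothetical
    cut-off beyond which a pair is not considered the same object.
    Returns dict {object_id: index into curr_boxes}.
    """
    pairs = []
    for obj_id, prev_box in prev_boxes.items():
        for j, curr_box in enumerate(curr_boxes):
            dist = math.dist(centroid(prev_box), centroid(curr_box))
            pairs.append((dist, obj_id, j))
    pairs.sort()  # smallest Euclidean distance first

    matches, used = {}, set()
    for dist, obj_id, j in pairs:
        if dist > max_distance:
            break  # every remaining pair is even farther apart
        if obj_id in matches or j in used:
            continue
        matches[obj_id] = j
        used.add(j)
    return matches


prev_boxes = {1: (10, 10, 50, 50), 2: (200, 200, 260, 260)}
curr_boxes = [(205, 198, 265, 258), (14, 12, 54, 52)]
matches = match_by_centroid(prev_boxes, curr_boxes)
print(matches)
```

In this toy example object 1 is paired with the second box and object 2 with the first, because those centroids moved only a few pixels between frames.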

IOU Object Tracker

Intersection-over-Union is another technique for object tracking that associates detections of subsequent frames solely by their spatial overlap to tracks.
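A small sketch of overlap-based association is shown below. It assumes boxes in `(x1, y1, x2, y2)` format and a hypothetical threshold `iou_threshold`; each track simply takes the detection that overlaps its last box the most, which is one simple way to realize the idea rather than the exact published procedure.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def extend_tracks(tracks, detections, iou_threshold=0.5):
    """Extend each track with its best-overlapping detection.

    tracks: dict {track_id: last box}; detections: list of boxes.
    Returns (updated, remaining): the boxes assigned to tracks and the
    detections that no track claimed.
    """
    remaining = list(detections)
    updated = {}
    for track_id, last_box in tracks.items():
        if not remaining:
            break
        best = max(remaining, key=lambda det: iou(last_box, det))
        if iou(last_box, best) >= iou_threshold:
            updated[track_id] = best
            remaining.remove(best)
    return updated, remaining


tracks = {1: (0, 0, 10, 10), 2: (100, 100, 120, 120)}
detections = [(1, 1, 11, 11), (300, 300, 320, 320)]
updated, remaining = extend_tracks(tracks, detections)
print(updated, remaining)
```

Track 1 is extended because its best detection overlaps by 81/119 ≈ 0.68; track 2 finds no overlapping detection, and the far-away box is left over as a candidate for a new track.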

Visual IOU Object Tracker

The Visual IOU Object Tracker works in two directions: visual tracking of the object both forward and backward in time helps merge discontinued tracks.

Simple Online Realtime Tracking (SORT)

The SORT method assumes tracking quality depends on object detection performance.

SORT starts by detecting objects using Faster Region-CNN (FrRCNN).

Each existing target is propagated to the current frame by predicting its new location. When a detection is associated with a target, the detected bounding box is used to update the target state, which is solved optimally using a Kalman filter framework.
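To make the predict and update steps concrete, here is a deliberately reduced sketch: a one-dimensional constant-velocity Kalman filter that tracks only the x-coordinate of a box center. The state layout `(position, velocity)`, the noise values, and the function names are assumptions for illustration; a real tracker filters a larger box state.

```python
def predict(state, cov, dt=1.0, process_noise=1e-2):
    """Propagate (position, velocity) one frame ahead.

    Motion model F = [[1, dt], [0, 1]]; cov is a 2x2 tuple of tuples.
    """
    p, v = state
    (a, b), (c, d) = cov
    new_state = (p + dt * v, v)
    # F * P * F^T + Q, written out for the 2x2 case with Q = q * I
    new_cov = (
        (a + dt * (b + c) + dt * dt * d + process_noise, b + dt * d),
        (c + dt * d, d + process_noise),
    )
    return new_state, new_cov


def update(state, cov, measurement, measurement_noise=1.0):
    """Correct the prediction with a measured position (H = [1, 0])."""
    p, v = state
    (a, b), (c, d) = cov
    innovation = measurement - p
    s = a + measurement_noise          # innovation variance
    k_p, k_v = a / s, c / s            # Kalman gain for position, velocity
    new_state = (p + k_p * innovation, v + k_v * innovation)
    # (I - K H) * P, written out for the 2x2 case
    new_cov = (
        ((1 - k_p) * a, (1 - k_p) * b),
        (c - k_v * a, d - k_v * b),
    )
    return new_state, new_cov


# An object moving 2 pixels per frame, observed for 20 frames.
state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
for step in range(1, 21):
    state, cov = predict(state, cov)
    state, cov = update(state, cov, measurement=2.0 * step)
position, velocity = state
print(round(position, 2), round(velocity, 2))
```

Although only positions are measured, the velocity estimate converges toward 2 pixels per frame, which is what lets the filter predict where the box should appear in the next frame.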

The assignment cost matrix is computed as the intersection-over-union (IOU) distance between each detection and all predicted bounding boxes from the existing targets. The assignment is solved using the Hungarian algorithm.
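The cost matrix and the assignment step can be sketched as below. Note the substitution: to stay dependency-free, this example finds the minimum-cost assignment by exhaustively trying permutations, which gives the same optimum as the Hungarian algorithm on tiny inputs but does not scale. The `iou_min` rejection threshold and all function names are assumptions for illustration.

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def assign(predicted_boxes, detections, iou_min=0.3):
    """Return [(track_index, detection_index)] minimizing total (1 - IoU).

    Brute-force stand-in for the Hungarian algorithm; only suitable
    for a handful of boxes.
    """
    cost = [[1.0 - iou(p, d) for d in detections] for p in predicted_boxes]
    n_t, n_d = len(predicted_boxes), len(detections)
    if n_t <= n_d:
        candidates = ([(i, j) for i, j in enumerate(perm)]
                      for perm in permutations(range(n_d), n_t))
    else:
        candidates = ([(i, j) for j, i in enumerate(perm)]
                      for perm in permutations(range(n_t), n_d))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    # Discard pairs whose overlap is too small to be a plausible match.
    return [(i, j) for i, j in best if 1.0 - cost[i][j] >= iou_min]


predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (100, 100, 110, 110)]
matches = assign(predicted, detections)
print(matches)
```

For realistic numbers of targets, a library solver such as SciPy's `linear_sum_assignment` would normally replace the permutation search.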

SORT copes with short-term occlusion and keeps ID switches low when object motion between frames is small. SORT may fail in challenging cases such as crowded scenes and fast motion.

Deep SORT

Deep SORT is an extension of SORT incorporating appearance information through a pre-trained association metric.

Deep SORT allows tracking through longer periods of occlusion, is simple to implement, and runs in real time.

Deep SORT adopts a conventional single-hypothesis tracking methodology with recursive Kalman filtering and frame-by-frame data association using the Hungarian algorithm.

The appearance feature describes the visual content of a detected bounding box. Deep SORT also uses a matching cascade, which SORT does not have, to give priority to more frequently seen objects.
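As a rough illustration of how appearance features can be compared, the sketch below scores a detection against a track by the smallest cosine distance to any feature stored for that track. The tiny two-dimensional vectors and the function names are made up for the example; real appearance descriptors come from a trained network and are much longer.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def appearance_cost(track_gallery, detection_feature):
    """Smallest cosine distance between a detection and a track's stored features."""
    return min(cosine_distance(f, detection_feature) for f in track_gallery)


# Hypothetical stored features for one track, and one new detection.
gallery = [(1.0, 0.0), (0.6, 0.8)]
detection = (0.0, 1.0)
cost = appearance_cost(gallery, detection)
print(round(cost, 3))
```

A small cost means the detection looks like something the track has shown before, which helps re-identify an object after it has been occluded for a while.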

Deep SORT reduces ID switches by keeping track of objects through longer occlusions.

FairMOT(Multiple Object Tracking)

FairMOT does not follow the two-step approach of SORT and Deep SORT, which first detect objects and their bounding boxes and then track them. FairMOT argues that a network trained jointly for both tasks is biased toward the primary detection task, which is unfair to the re-ID (object tracking) task.

The object detection and re-ID tasks are treated equally in FairMOT.

FairMOT(Source: FairMOT)

The input image is fed to an encoder-decoder network to extract high-resolution feature maps.

FairMOT then adds two homogeneous branches for detecting objects and extracting re-ID features to obtain a good trade-off between detection and re-ID.

Read this article for a detailed understanding of the different MOT algorithms.

ByteTrack Algorithm

ByteTrack performs MOT on a video using the high-performance detector YOLOX and performs association between the detection boxes and the tracks using BYTE.

BYTE keeps all detection boxes and separates them into high-score ones (Dʰᶦᵍʰ) and low-score ones (Dˡᵒʷ). BYTE uses a Kalman filter to predict the new location, in the current frame, of each track in T.

The first association in BYTE is performed between the high-score detection boxes Dʰᶦᵍʰ and all the tracklets. Similarity for the first association is computed using either IoU or Re-ID feature distance between the detection boxes Dʰᶦᵍʰ and the predicted boxes of the tracks T.

Some tracklets remain unmatched because no appropriate high-score detection box in Dʰᶦᵍʰ matches them, which happens under occlusion, motion blur, or size change.

Inspired by ByteTrack: A Simple Yet Effective Multi-Object Tracking Technique

The second association is then performed between the low-score detection boxes Dˡᵒʷ and the remaining unmatched tracklets (Tʳᵉᵐᵃᶤⁿ) to recover the objects in low-score detection boxes and filter out the background.

The tracks that are still unmatched are kept in Tʳᵉ-ʳᵉᵐᵃᶤⁿ, and all unmatched low-score detection boxes are deleted because they are considered background.
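The two association rounds can be sketched as below. Several simplifications are assumptions of this example: greedy IoU matching stands in for the actual assignment solver, a single `high_thresh` splits the detections, track boxes are taken to be already predicted by the Kalman filter, and unmatched high-score detections are returned as candidates for new tracks.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, det_boxes, iou_min):
    """Greedy best-IoU-first matching (simple stand-in for Hungarian)."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(track_boxes)
         for di, d in enumerate(det_boxes)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_min:
            break
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches


def byte_associate(track_boxes, detections, high_thresh=0.6, iou_min=0.3):
    """detections: list of (box, score). Returns (matched, unmatched_tracks, new_candidates)."""
    high = [i for i, (_, s) in enumerate(detections) if s >= high_thresh]
    low = [i for i, (_, s) in enumerate(detections) if s < high_thresh]

    # First association: all tracks against high-score detections.
    first = greedy_match(track_boxes, [detections[i][0] for i in high], iou_min)
    matched = {t: high[d] for t, d in first}

    # Second association: leftover tracks against low-score detections.
    remaining = [t for t in range(len(track_boxes)) if t not in matched]
    second = greedy_match([track_boxes[t] for t in remaining],
                          [detections[i][0] for i in low], iou_min)
    for t, d in second:
        matched[remaining[t]] = low[d]

    unmatched_tracks = [t for t in range(len(track_boxes)) if t not in matched]
    # Unmatched low-score boxes are dropped as background;
    # unmatched high-score boxes may start new tracks.
    new_candidates = [i for i in high if i not in matched.values()]
    return matched, unmatched_tracks, new_candidates


track_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
detections = [((1, 1, 11, 11), 0.9), ((51, 50, 61, 60), 0.2),
              ((200, 200, 210, 210), 0.15), ((300, 300, 310, 310), 0.8)]
matched, unmatched_tracks, new_candidates = byte_associate(track_boxes, detections)
print(matched, unmatched_tracks, new_candidates)
```

Here the second track is recovered by the low-score detection at index 1 (for instance a partly occluded object), while the low-score box at index 2 overlaps no track and is discarded as background.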

Characteristics of MOT Evaluation Metrics

MOT evaluation metrics need to exhibit two significant properties:

  1. MOT evaluation metrics need to address five error types in MOT. These five error types are False Negatives (FN), False Positives (FP), Fragmentation, Mergers (ID Switch), and Deviation.

Source: MOT16: A Benchmark for Multi-Object Tracking

  2. MOT evaluation metrics should be monotonic, and the error types should be differentiable, so that the metrics reveal the tracker's performance with respect to each of the five basic error types.

Commonly used MOT evaluation metrics

Track-mAP

Track-mAP performs both matching and association at a trajectory level and is biased toward measuring association. It operates on confidence-ranked potential tracking results. Track-mAP is non-monotonic in detection.

Multi-Object Tracking Accuracy- MOTA

MOTA is the most widely used metric and closely represents human visual assessment. In MOTA, matching is done at a detection level. Association is measured in MOTA using Identity Switches (IDSW), which occur when a tracker wrongfully swaps object identities or when a track is lost and reinitialized with a different identity. MOTA measures three types of tracking errors: False Positives, False Negatives, and ID Switches.
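The standard CLEAR MOT definition combines these three error counts as MOTA = 1 − (FN + FP + IDSW) / GT, where GT is the total number of ground-truth detections over all frames. A small sketch with made-up counts (the function and argument names are illustrative):

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical sequence: 100 ground-truth boxes, 10 missed, 5 spurious, 2 ID switches.
score = mota(false_negatives=10, false_positives=5, id_switches=2, num_gt=100)
print(round(score, 2))
```

Because the three errors are simply added, MOTA can become negative when the tracker makes more errors than there are ground-truth boxes, and the ID-switch term is usually small compared with the detection terms.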

The Identification Metrics: IDF1

IDF1 emphasizes association accuracy rather than detection. IDF1 counts IDTPs (Identity True Positives): detections on the overlapping part of trajectories that are matched together, where a prID is matched with a gtID and the similarity satisfies S ≥ α. IDF1 is the ratio of correctly identified detections over the average number of ground-truth and computed detections. The Hungarian algorithm selects which trajectories to match so as to minimize the sum of IDFP and IDFN.

A tracking example displaying the single best trajectory matching performed by IDF1(Source: HOTA: A Higher-Order Metric for Evaluating Multi-Object Tracking)

IDF1 combines IDP(ID Precision) and IDR(ID Recall).
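In terms of the identity counts, IDP = IDTP / (IDTP + IDFP), IDR = IDTP / (IDTP + IDFN), and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), which is the harmonic mean of IDP and IDR. A short sketch with invented counts (function and variable names are illustrative):

```python
def id_scores(idtp, idfp, idfn):
    """Return (IDP, IDR, IDF1) from identity true/false positive/negative counts."""
    idp = idtp / (idtp + idfp) if (idtp + idfp) else 0.0
    idr = idtp / (idtp + idfn) if (idtp + idfn) else 0.0
    denom = 2 * idtp + idfp + idfn
    idf1 = 2 * idtp / denom if denom else 0.0
    return idp, idr, idf1


# Hypothetical counts after the best trajectory-level matching.
idp, idr, idf1 = id_scores(idtp=80, idfp=20, idfn=40)
print(round(idp, 3), round(idr, 3), round(idf1, 3))
```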

Higher-Order Tracking Accuracy-HOTA

HOTA is a single unified metric for ranking trackers. HOTA can be decomposed into components that correspond to five error types: Detection Recall, Detection Precision, Association Recall, Association Precision, and Localisation Accuracy. As a result, HOTA's error types are differentiable and the metric is strictly monotonic, so it provides information about the tracker's performance with respect to each of the basic error types.

HOTA tracking errors are categorized into Detection errors, Association errors, and Localization errors.

  1. Detection error occurs when a tracker predicts detections that don’t exist in the ground truth or fails to predict detections that are in the ground truth. Detection errors are further categorized into errors of detection recall (measured by FNs) and detection precision (measured by FPs).
  2. Association error occurs when trackers assign the same prID to two detections with different gtIDs or assign different prIDs to two detections that should have the same gtID. Association errors are further categorized into errors of association recall (measured by FNAs) and association precision (measured by FPAs).
  3. Localization errors occur when prDets are not perfectly spatially aligned with gtDets.

Source: HOTA: A Higher-Order Metric for Evaluating Multi-Object Tracking

MOTA performs both matching and association scoring at a local detection level and accentuates detection accuracy, whereas IDF1 performs both at a trajectory level and emphasizes the effect of association.

Track-mAP is similar to IDF1 as it performs both matching and association at a trajectory level and is biased toward measuring association.

HOTA balances both: it is an explicit combination of a detection score and an association score, performing matches at the detection level while scoring association globally over trajectories.
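For a single localization threshold α, the HOTA paper defines the detection score as DetA = |TP| / (|TP| + |FN| + |FP|), the association score AssA as the average over all true positives c of |TPA(c)| / (|TPA(c)| + |FNA(c)| + |FPA(c)|), and HOTA_α = sqrt(DetA · AssA); the final HOTA averages this over a range of α values. The sketch below computes the single-threshold value from made-up counts; the input layout and names are assumptions for the example.

```python
import math


def hota_alpha(tp_assoc, num_fn, num_fp):
    """Single-threshold HOTA from per-match association counts.

    tp_assoc: one (TPA, FNA, FPA) tuple per true-positive match; TPA counts
    the match itself, so it is at least 1.
    Returns (DetA, AssA, HOTA_alpha).
    """
    num_tp = len(tp_assoc)
    if num_tp == 0:
        return 0.0, 0.0, 0.0
    det_a = num_tp / (num_tp + num_fn + num_fp)
    ass_a = sum(tpa / (tpa + fna + fpa) for tpa, fna, fpa in tp_assoc) / num_tp
    return det_a, ass_a, math.sqrt(det_a * ass_a)


# Hypothetical frame set: 4 matched detections, 2 misses, 2 false alarms.
tp_assoc = [(3, 1, 0), (3, 1, 0), (3, 1, 0), (1, 3, 0)]
det_a, ass_a, hota = hota_alpha(tp_assoc, num_fn=2, num_fp=2)
print(round(det_a, 3), round(ass_a, 3), round(hota, 3))
```

Because the two scores are multiplied under a square root, a tracker cannot reach a high HOTA by being strong at detection alone or at association alone.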

An overview of different evaluation metrics for MOT(Source: HOTA: A Higher-Order Metric for Evaluating Multi-Object Tracking)

Read this article for a detailed understanding of different MOT evaluation metrics

References:

ByteTrack: Multi-Object Tracking by Associating Every Detection Box, Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

Online Kalman Filter Tutorial, www.kalmanfilter.net

Simple Online and Realtime Tracking, Alex Bewley et al.

Simple Online and Realtime Tracking with a Deep Association Metric

Sort and Deep-SORT Based Multi-Object Tracking for Mobile Robotics: Evaluation with New Data Association Metrics

FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu

HOTA: A Higher-Order Metric for Evaluating Multi-Object Tracking

How to evaluate tracking with the HOTA metrics

MOT16: A Benchmark for Multi-Object Tracking

Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics

Evaluating Multi-Object Tracking

An Introduction to Object Tracking

ByteTrack: A Simple Yet Effective Multi-Object Tracking Technique

Evaluation Metrics for Multiple Object Tracking
