MOTSynth-MOTS-CVPR22 Results

See below for a more detailed description of each measure.



Benchmark Statistics

Tracker: Trcktr_bline (online method)

Measure Value
HOTA 48.8
sMOTSA 50.0 ±3.2
IDF1 61.2
AssA 44.6
DetA 54.9
AssRe 50.1
AssPr 73.2
DetRe 58.9
DetPr 73.3
LocA 78.5
MOTSA 69.0
MOTSP 74.7
MODSA 69.8
MT 117 (35.7)
ML 58 (17.7)
TP 24,222
FP 1,714
FN 8,047
Rcll 75.1
Prcn 93.4
ID Sw. 257 (0.0)
Frag 642 (0.0)
Hz 1.7
P. Bergmann, T. Meinhardt, L. Leal-Taixé. Tracking Without Bells and Whistles. In The IEEE International Conference on Computer Vision (ICCV), 2019.
Sequences Frames Trajectories Boxes
4 3044 328 32269
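
The rate columns can be cross-checked against the count columns. The following is a minimal sketch, not the official evaluation code. It assumes the statistics line reads as 4 sequences, 3044 frames, 328 trajectories and 32269 boxes, that the ground-truth count equals TP + FN, and that MODSA and MOTSA follow the count-based definitions from [2], namely (TP - FP) / GT and (TP - FP - ID switches) / GT. All variable names are illustrative.

```python
# Counts as reported for Trcktr_bline in the results above.
tp, fp, fn = 24222, 1714, 8047
id_switches = 257
mostly_tracked, mostly_lost = 117, 58
trajectories = 328  # assumed reading of the benchmark statistics line

# Assumption: the number of ground-truth masks equals TP + FN.
num_gt = tp + fn


def pct(ratio):
    """Convert a ratio to a percentage rounded to one decimal place."""
    return round(100.0 * ratio, 1)


derived = {
    "Rcll": pct(tp / num_gt),
    "Prcn": pct(tp / (tp + fp)),
    "MODSA": pct((tp - fp) / num_gt),
    "MOTSA": pct((tp - fp - id_switches) / num_gt),
    "MT %": pct(mostly_tracked / trajectories),
    "ML %": pct(mostly_lost / trajectories),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Under these assumptions the derived values agree with the reported Rcll, Prcn, MODSA and MOTSA figures and with the percentages shown in parentheses for MT and ML.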

Difficulty Analysis

Sequence difficulty (from easiest to hardest, measured by average HOTA)

MOTS20-06 (53.4 HOTA)
MOTS20-12 (51.9 HOTA)
MOTS20-07 (44.8 HOTA)
MOTS20-01 (41.8 HOTA)


Evaluation Measures

Measure Better Perfect Description
HOTA higher 100% Higher Order Tracking Accuracy [1]. Geometric mean of detection accuracy and association accuracy, averaged across localization thresholds.
sMOTSA higher 100% Mask-based Soft Multi-Object Tracking Accuracy (+/- denotes standard deviation across all sequences) [2]. Soft version of MOTSA that accumulates the mask overlaps of true positives instead of only counting how many masks reach an IoU of more than 0.5.
IDF1 higher 100% ID F1 Score [3]. The ratio of correctly identified detections over the average number of ground-truth and computed detections.
AssA higher 100% Association Accuracy [1]. Association Jaccard index averaged over all matching detections and then averaged over localization thresholds.
DetA higher 100% Detection Accuracy [1]. Detection Jaccard index averaged over localization thresholds.
AssRe higher 100% Association Recall [1]. TPA / (TPA + FNA) averaged over all matching detections and then averaged over localization thresholds.
AssPr higher 100% Association Precision [1]. TPA / (TPA + FPA) averaged over all matching detections and then averaged over localization thresholds.
DetRe higher 100% Detection Recall [1]. TP / (TP + FN) averaged over localization thresholds.
DetPr higher 100% Detection Precision [1]. TP / (TP + FP) averaged over localization thresholds.
LocA higher 100% Localization Accuracy [1]. Average localization similarity averaged over all matching detections and averaged over localization thresholds.
MOTSA higher 100% Mask-based Multi-Object Tracking Accuracy (+/- denotes standard deviation across all sequences) [2]. Variant of MOTA, evaluated based on mask overlap (mask IoU).
MOTSP higher 100% Mask-overlap based variant of Multi-Object Tracking Precision [2]. Variant of MOTP, evaluated based on mask IoU instead of bounding box IoU.
MODSA higher 100% Mask-overlap based Multi-Object Detection Accuracy [2]. Variant of MODA, evaluated based on mask overlap (mask IoU).
MT higher 100% Mostly tracked targets. The ratio of ground-truth trajectories that are covered by a track hypothesis for at least 80% of their respective life span.
ML lower 0% Mostly lost targets. The ratio of ground-truth trajectories that are covered by a track hypothesis for at most 20% of their respective life span.
TP higher #GT The total number of true positives.
FP lower 0 The total number of false positives.
FN lower 0 The total number of false negatives (missed targets).
Rcll higher 100% Ratio of correct detections to total number of GT boxes.
Prcn higher 100% Ratio of TP / (TP + FP).
ID Sw. lower 0 Number of identity switches (ID switch ratio = #ID switches / recall) [4]. Please note that we follow the stricter definition of identity switches as described in the reference.
Frag lower 0 The total number of times a trajectory is fragmented (i.e. interrupted during tracking).
Hz higher Inf. Processing speed (in frames per second, excluding the detector) on the benchmark. The frequency is provided by the authors and not officially evaluated by the MOTChallenge.
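
The HOTA and IDF1 descriptions above can be written out as a short sketch. This is an illustration of the stated definitions only, not the benchmark's evaluation code. The function names and the per-threshold example values are hypothetical, and the detection Jaccard index is taken to be TP / (TP + FN + FP).

```python
import math


def detection_jaccard(tp, fn, fp):
    """Detection Jaccard index at one localization threshold."""
    return tp / (tp + fn + fp)


def hota_at_threshold(det_a, ass_a):
    """Geometric mean of detection and association accuracy at one threshold."""
    return math.sqrt(det_a * ass_a)


def hota(det_as, ass_as):
    """Average of the per-threshold geometric means."""
    scores = [hota_at_threshold(d, a) for d, a in zip(det_as, ass_as)]
    return sum(scores) / len(scores)


def idf1(id_tp, num_gt, num_computed):
    """Correctly identified detections over the average of GT and computed detections."""
    return id_tp / ((num_gt + num_computed) / 2.0)


# Hypothetical per-threshold accuracies, for illustration only.
det_as = [0.70, 0.55, 0.30]
ass_as = [0.60, 0.45, 0.20]

averaged_hota = hota(det_as, ass_as)
mean_det_a = sum(det_as) / len(det_as)
mean_ass_a = sum(ass_as) / len(ass_as)
geo_mean_of_averages = math.sqrt(mean_det_a * mean_ass_a)

print(averaged_hota, geo_mean_of_averages)
```

Because HOTA averages the per-threshold geometric means, it is never larger than the geometric mean of the averaged DetA and AssA columns. This is consistent with the results above, where the square root of 54.9 × 44.6 is about 49.5 while the reported HOTA is 48.8.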

Legend

Symbol Description
online method This is an online (causal) method, i.e. the solution is immediately available with each incoming frame and cannot be changed at any later time.
using public detections This method used the provided detection set as input.
using private detections This method used a private detection set as input.
new This entry has been submitted or updated less than a week ago.

References:


[1] Luiten, J., Ošep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L. & Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. International Journal of Computer Vision, 2020.
[2] Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B.B.G., Geiger, A. & Leibe, B. MOTS: Multi-Object Tracking and Segmentation. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
[3] Ristani, E., Solera, F., Zou, R., Cucchiara, R. & Tomasi, C. Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. In ECCV workshop on Benchmarking Multi-Target Tracking, 2016.
[4] Li, Y., Huang, C. & Nevatia, R. Learning to associate: HybridBoosted multi-target tracker for crowded scene. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009.