See below for a detailed description of each measure.
Tracker | HOTA | sMOTSA | IDF1 | AssA | DetA | AssRe | AssPr | DetRe | DetPr | LocA | MOTSA | MOTSP | MODSA | MT | ML | TP | FP | FN | Rcll | Prcn | ID Sw. | Frag | Hz |
Trcktr_bline 1. | 48.8 | 50.0 ±3.2 | 61.2 | 44.6 | 54.9 | 50.1 | 73.2 | 58.9 | 73.3 | 78.5 | 69.0 | 74.7 | 69.8 | 117 (35.7) | 58 (17.7) | 24,222 | 1,714 | 8,047 | 75.1 | 93.4 | 257 (0.0) | 642 (0.0) | 1.7 |
P. Bergmann, T. Meinhardt, L. Leal-Taixé. Tracking Without Bells and Whistles. In The IEEE International Conference on Computer Vision (ICCV), 2019. |
Sequences | Frames | Trajectories | Boxes |
4 | 3044 | 328 | 32269 |
Sequence difficulty (from easiest to hardest, measured by average HOTA)
Measure | Better | Perfect | Description |
HOTA | higher | 100% | Higher Order Tracking Accuracy [1]. Geometric mean of detection accuracy and association accuracy. Averaged across localization thresholds. |
sMOTSA | higher | 100% | Mask-based Soft Multi-Object Tracking Accuracy (+/- denotes standard deviation across all sequences) [2]. Soft version of MOTSA, accumulates the mask overlaps of true positives instead of only counting how many masks reach an IoU of more than 0.5. |
IDF1 | higher | 100% | ID F1 Score [3]. The ratio of correctly identified detections over the average number of ground-truth and computed detections. |
AssA | higher | 100% | Association Accuracy [1]. Association Jaccard index averaged over all matching detections and then averaged over localization thresholds. |
DetA | higher | 100% | Detection Accuracy [1]. Detection Jaccard index averaged over localization thresholds. |
AssRe | higher | 100% | Association Recall [1]. TPA / (TPA + FNA) averaged over all matching detections and then averaged over localization thresholds. |
AssPr | higher | 100% | Association Precision [1]. TPA / (TPA + FPA) averaged over all matching detections and then averaged over localization thresholds. |
DetRe | higher | 100% | Detection Recall [1]. TP / (TP + FN) averaged over localization thresholds. |
DetPr | higher | 100% | Detection Precision [1]. TP / (TP + FP) averaged over localization thresholds. |
LocA | higher | 100% | Localization Accuracy [1]. Average localization similarity averaged over all matching detections and averaged over localization thresholds. |
MOTSA | higher | 100% | Mask-based Multi-Object Tracking Accuracy (+/- denotes standard deviation across all sequences) [2]. Variant of MOTA, evaluated based on mask overlap (mask IoU). |
MOTSP | higher | 100% | Mask-overlap based variant of Multi-Object Tracking Precision [2]. Variant of MOTP, evaluated based on mask IoU instead of bounding box IoU. |
MODSA | higher | 100% | Mask-overlap based Multi-Object Detection Accuracy [2]. Variant of MODA, evaluated based on the mask overlap (mask IoU). |
MT | higher | 100% | Mostly tracked targets. The ratio of ground-truth trajectories that are covered by a track hypothesis for at least 80% of their respective life span. |
ML | lower | 0% | Mostly lost targets. The ratio of ground-truth trajectories that are covered by a track hypothesis for at most 20% of their respective life span. |
TP | higher | #GT | The total number of true positives. |
FP | lower | 0 | The total number of false positives. |
FN | lower | 0 | The total number of false negatives (missed targets). |
Rcll | higher | 100% | Ratio of correct detections to total number of GT boxes. |
Prcn | higher | 100% | Ratio of TP / (TP+FP). |
ID Sw. | lower | 0 | Number of Identity Switches (ID switch ratio = #ID switches / recall) [4]. Please note that we follow the stricter definition of identity switches as described in the reference. |
Frag | lower | 0 | The total number of times a trajectory is fragmented (i.e. interrupted during tracking). |
Hz | higher | Inf. | Processing speed (in frames per second excluding the detector) on the benchmark. The frequency is provided by the authors and not officially evaluated by the MOTChallenge. |
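As a sanity check, the count-based measures can be recomputed from the raw counts in the Trcktr_bline row. The sketch below is not the official evaluation code. It assumes the MOTA-style definitions from [2], which the table only names: MODSA = (TP - FP) / #GT, MOTSA = (TP - FP - ID Sw.) / #GT, and sMOTSA = (sum of mask IoUs over true positives - FP - ID Sw.) / #GT, with MOTSP taken as the mean mask IoU over true positives. The helper `pct` and all variable names are illustrative. Because sMOTSA is reconstructed here from the rounded MOTSP value, it is only an estimate.

```python
# Counts copied from the Trcktr_bline row and the dataset statistics above.
TP, FP, FN = 24_222, 1_714, 8_047
ID_SW = 257
GT_BOXES = 32_269          # "Boxes" column; equals TP + FN
GT_TRAJECTORIES = 328      # "Trajectories" column
MT_COUNT, ML_COUNT = 117, 58
MOTSP_PERCENT = 74.7       # listed value, already rounded


def pct(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a percentage with one decimal."""
    return round(100.0 * numerator / denominator, 1)


# Definitions stated in the table.
recall = pct(TP, TP + FN)            # Rcll
precision = pct(TP, TP + FP)         # Prcn
mt = pct(MT_COUNT, GT_TRAJECTORIES)  # mostly tracked, as a ratio
ml = pct(ML_COUNT, GT_TRAJECTORIES)  # mostly lost, as a ratio

# Assumed MOTA-style definitions from the MOTS paper [2].
modsa = pct(TP - FP, GT_BOXES)
motsa = pct(TP - FP - ID_SW, GT_BOXES)

# Estimate only: the soft true-positive mass is rebuilt from rounded MOTSP.
soft_tp = MOTSP_PERCENT / 100.0 * TP
smotsa_estimate = pct(soft_tp - FP - ID_SW, GT_BOXES)

if __name__ == "__main__":
    print(f"Rcll={recall} Prcn={precision} MT={mt} ML={ml}")
    print(f"MODSA={modsa} MOTSA={motsa} sMOTSA~{smotsa_estimate}")
```

Under these assumptions the recomputed values agree with the listed ones: Rcll 75.1, Prcn 93.4, MODSA 69.8, MOTSA 69.0, MT 35.7 and ML 17.7. HOTA, DetA and AssA cannot be reproduced this way, since they are averaged over localization thresholds and the per-threshold counts are not shown here.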
Symbol | Description |
This is an online (causal) method, i.e. the solution is immediately available with each incoming frame and cannot be changed at any later time. | |
This method used the provided detection set as input. | |
This method used a private detection set as input. | |
This entry has been submitted or updated less than a week ago. |
[1] | HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. International Journal of Computer Vision, 2020. |
[2] | MOTS: Multi-Object Tracking and Segmentation. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019. |
[3] | Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. In ECCV workshop on Benchmarking Multi-Target Tracking, 2016. |
[4] | Learning to associate: HybridBoosted multi-target tracker for crowded scene. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2009. |