See below for a more detailed description.
Tracker | AP | MODA | MODP | FAF | TP | FP | FN | Rcll | Prcn | F1 | Hz |
BreseeNet 1. | 0.90 ±0.04 | 89.2 ±14.7 | 82.4 | 1.1 | 108,622 | 6,406 | 5,942 | 94.8 | 94.4 | 94.6 | 14.8 |
SGT_det 2. | 0.90 ±0.08 | 86.6 ±11.4 | 82.4 | 1.3 | 106,849 | 7,688 | 7,715 | 93.3 | 93.3 | 93.3 | 20.8 |
J. Hyun, M. Kang, D. Wee, Y. Yeung. Detection Recovery in Online Multi-Object Tracking with Sparse Graph Tracker. In , 2022. |
SeedDet 3. | 0.90 ±0.07 | 81.8 ±12.1 | 82.8 | 2.3 | 107,291 | 13,631 | 7,273 | 93.7 | 88.7 | 91.1 | 8.2 |
seedland multi-target detection |
GST_det_Test 4. | 0.90 ±0.07 | 81.8 ±12.7 | 82.9 | 2.4 | 107,722 | 13,972 | 6,842 | 94.0 | 88.5 | 91.2 | 29.6 |
GST_det 5. | 0.90 ±0.08 | 79.9 ±10.1 | 82.4 | 2.5 | 106,337 | 14,806 | 8,227 | 92.8 | 87.8 | 90.2 | 408.2 |
MixNet 6. | 0.90 ±0.07 | 74.3 ±15.0 | 78.6 | 3.8 | 107,764 | 22,661 | 6,800 | 94.1 | 82.6 | 88.0 | 197.3 |
AInnoDetV2 7. | 0.89 ±0.05 | -58.4 ±126.7 | 79.1 | 29.5 | 107,733 | 174,608 | 6,831 | 94.0 | 38.2 | 54.3 | 296.0 |
AInnovation: PC Attention Net |
PA_Det_NJ 8. | 0.89 ±0.06 | 82.1 ±20.1 | 79.3 | 2.4 | 108,480 | 14,417 | 6,084 | 94.7 | 88.3 | 91.4 | 59.2 |
PA_TECH_NJ |
YTLAB 9. | 0.89 ±0.07 | 76.7 ±13.1 | 80.2 | 2.8 | 104,555 | 16,685 | 10,009 | 91.3 | 86.2 | 88.7 | 22.3 |
Z. Cai, Q. Fan, R. Feris, N. Vasconcelos. A unified multi-scale deep convolutional neural network for fast object detection. In European Conference on Computer Vision, 2016. |
KDNT 10. | 0.89 ±0.07 | 67.1 ±22.4 | 80.1 | 4.8 | 105,473 | 28,623 | 9,091 | 92.1 | 78.7 | 84.8 | 0.8 |
F. Yu, W. Li, Q. Li, Y. Liu, X. Shi, J. Yan. POI: Multiple Object Tracking with High Performance Detection and Appearance Feature. In BMTT, SenseTime Group Limited, 2016. |
F_ViPeD_B 11. | 0.89 ±0.06 | -14.4 ±115.1 | 77.4 | 20.8 | 106,698 | 123,194 | 7,831 | 93.2 | 46.4 | 62.0 | 14.8 |
L. Ciampi, N. Messina, F. Falchi, C. Gennaro, G. Amato. Virtual to Real Adaptation of Pedestrian Detectors. In Sensors, 2020. |
GNN_SDT 12. | 0.89 ±0.09 | 78.1 ±8.3 | 81.3 | 2.4 | 103,895 | 14,397 | 10,669 | 90.7 | 87.8 | 89.2 | 5,919.0 |
Y. Wang, X. Weng, K. Kitani. Joint Detection and Multi-Object Tracking with Graph Neural Networks. In arXiv, 2020. |
ISE_Detv2 13. | 0.88 ±0.05 | 67.4 ±12.8 | 79.9 | 4.9 | 106,094 | 28,865 | 8,470 | 92.6 | 78.6 | 85.0 | 3.2 |
MIFD |
ZIZOM 14. | 0.81 ±0.05 | 72.0 ±22.0 | 79.8 | 2.2 | 95,414 | 12,990 | 19,139 | 83.3 | 88.0 | 85.6 | 2.4 |
C. Lin, L. Jiwen, G. Wang, J. Zhou. Graininess-Aware Deep Feature Learning for Pedestrian Detection. In ECCV, 2018. |
SDP 15. | 0.81 ±0.12 | 76.9 ±16.2 | 78.0 | 1.3 | 95,699 | 7,599 | 18,865 | 83.5 | 92.6 | 87.9 | 0.6 |
F. Yang, W. Choi, Y. Lin. Exploit All the Layers: Fast and Accurate CNN Object Detector With Scale Dependent Pooling and Cascaded Rejection Classifiers. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. |
FRCNN 16. | 0.72 ±0.13 | 68.5 ±0.0 | 78.0 | 1.7 | 88,601 | 10,081 | 25,963 | 77.3 | 89.8 | 83.1 | 5.1 |
S. Ren, K. He, R. Girshick, J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 2015. |
DPM 17. | 0.61 ±0.14 | 31.2 ±10.8 | 75.8 | 7.1 | 78,007 | 42,308 | 36,557 | 68.1 | 64.8 | 66.4 | 19.7 |
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. In TPAMI, 2010. |
MHD 18. | 0.49 ±0.20 | 11.2 ±42.9 | 69.9 | 8.8 | 64,637 | 51,801 | 49,927 | 56.4 | 55.5 | 56.0 | 3.0 |
Mobilenet-based Human Detection |
YLHDv2 19. | 0.46 ±0.11 | 56.9 ±12.5 | 73.2 | 2.5 | 80,093 | 14,938 | 34,471 | 69.9 | 84.3 | 76.4 | 11.8 |
https://arxiv.org/abs/1612.08242 |
HDGP 20. | 0.45 ±0.20 | 42.1 ±20.1 | 76.4 | 1.3 | 55,680 | 7,436 | 58,884 | 48.6 | 88.2 | 62.7 | 0.6 |
A. Garcia-Martin, R. Sanchez-Matilla, J. Martinez. Hierarchical detection of persons in groups. In Signal, Image and Video Processing, 2017. |
VDet 21. | 0.44 ±0.19 | 44.7 ±19.3 | 75.7 | 1.0 | 56,980 | 5,765 | 57,584 | 49.7 | 90.8 | 64.3 | 5.9 |
Vitrociset Detection Algorithm |
ACF 22. | 0.32 ±0.00 | 18.1 ±0.0 | 72.1 | 2.8 | 37,312 | 16,539 | 77,252 | 32.6 | 69.3 | 44.3 | 74.0 |
P. Dollar, R. Appel, S. Belongie, P. Perona. Fast Feature Pyramids for Object Detection. In TPAMI, 2014. |
SSDMNv3EDD3 23. | 0.22 ±0.16 | -43.7 ±34.4 | 68.0 | 14.9 | 38,288 | 88,370 | 76,276 | 33.4 | 30.2 | 31.7 | 4.2 |
Sequences | Frames | Trajectories | Boxes |
7 | 5919 | 785 | 188076 |
Sequence difficulty (from easiest to hardest, measured by average AP)
...
...
Measure | Better | Perfect | Description |
AP | higher | 1 | Average Precision taken over a set of reference recall values (0, 0.1, ..., 1). |
MODA | higher | 100% | Multi-Object Detection Accuracy [1]. This measure combines false positives and missed targets. |
MODP | higher | 100% | Multi-Object Detection Precision [1]. Measures how well the predicted bounding boxes align with the annotated ones. |
FAF | lower | 0 | The average number of false alarms per frame. |
TP | higher | #GT | The total number of true positives. |
FP | lower | 0 | The total number of false positives. |
FN | lower | 0 | The total number of false negatives (missed targets). |
Rcll | higher | 100% | Ratio of correct detections to total number of GT boxes. |
Prcn | higher | 100% | Ratio of true positives to all detections: TP / (TP+FP). |
F1 | higher | 100% | Harmonic mean of precision and recall. |
Hz | higher | Inf. | Processing speed (in frames per second excluding the detector) on the benchmark. The frequency is provided by the authors and not officially evaluated by the MOTChallenge. |
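The count-based measures in the table above can be recomputed from the TP, FP and FN columns. The following is a minimal Python sketch, not the official evaluation code. It rests on several assumptions: the number of GT boxes is taken as TP + FN, MODA is computed with unit cost weights as 1 - (FN + FP) / #GT, and AP uses the common 11-point interpolated convention (maximum precision at recall at or above each reference value), which this page does not spell out. The function names `detection_measures` and `eleven_point_ap` are hypothetical.

```python
def detection_measures(tp, fp, fn, num_frames):
    """Recompute Rcll, Prcn, F1 and MODA (in percent) and FAF (per frame)."""
    # Assumption: every GT box is either matched (TP) or missed (FN).
    num_gt = tp + fn
    recall = tp / num_gt
    precision = tp / (tp + fp)
    return {
        "Rcll": 100.0 * recall,
        "Prcn": 100.0 * precision,
        # Harmonic mean of precision and recall.
        "F1": 100.0 * 2 * precision * recall / (precision + recall),
        # Average number of false alarms per frame.
        "FAF": fp / num_frames,
        # Assumption: unit cost weights for misses and false positives.
        "MODA": 100.0 * (1.0 - (fn + fp) / num_gt),
    }


def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP over reference recalls 0, 0.1, ..., 1."""
    thresholds = [i / 10 for i in range(11)]
    total = 0.0
    for t in thresholds:
        # Interpolated precision: best precision at recall >= t, else 0.
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates, default=0.0)
    return total / len(thresholds)


# Counts from the BreseeNet row; 5919 frames from the dataset statistics.
row = detection_measures(tp=108_622, fp=6_406, fn=5_942, num_frames=5_919)
print({name: round(value, 1) for name, value in row.items()})
```

Rounded to one decimal, the values for this row agree with the listed Rcll (94.8), Prcn (94.4), F1 (94.6), FAF (1.1) and MODA (89.2). `eleven_point_ap` expects matching lists of recall and precision values from a detector's precision-recall curve, which the leaderboard does not publish, so it is shown only to illustrate the averaging step.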
Symbol | Description |
This is an online (causal) method, i.e. the solution is immediately available with each incoming frame and cannot be changed at any later time. | |
This method used the provided detection set as input. | |
This method used a private detection set as input. | |
This entry has been submitted or updated less than a week ago. | |
[1] | K. Bernardin, R. Stiefelhagen. Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. EURASIP Journal on Image and Video Processing, 2008(1):1-10, 2008. |