This benchmark extends the traditional Multi-Object Tracking benchmark to the pixel level, with targets annotated by precise segmentation masks. We annotated 8 challenging video sequences (4 training, 4 test) of unconstrained environments, filmed with both static and moving cameras. Tracking, segmentation, and evaluation are all done in image coordinates. All sequences were annotated at pixel level with high accuracy, strictly following a well-defined protocol.
Training sequences
Name | FPS | Resolution | Length, frames (mm:ss) | Tracks | Boxes | Density (boxes/frame) | Description | Source | Ref. |
MOTS20-11 | 30 | 1920x1080 | 900 (00:30) | 62 | 8511 | 9.5 | Forward moving camera in a busy shopping mall | link | [1] |
MOTS20-09 | 30 | 1920x1080 | 525 (00:18) | 26 | 4774 | 9.1 | A pedestrian street scene filmed from a low angle | link | [1] |
MOTS20-05 | 14 | 640x480 | 837 (01:00) | 103 | 6570 | 7.8 | Street scene from a moving platform | link | [1] |
MOTS20-02 | 30 | 1920x1080 | 600 (00:20) | 37 | 7039 | 11.7 | People walking around a large square | link | [1] |
Total | | | 2862 (02:08) | 228 | 26894 | 9.4 | | | |
Test sequences
Name | FPS | Resolution | Length, frames (mm:ss) | Tracks | Boxes | Density (boxes/frame) | Description | Source | Ref. |
MOTS20-07 | 30 | 1920x1080 | 500 (00:17) | 58 | 12878 | 25.8 | A busy pedestrian street filmed at eye level by a moving camera | link | [1] |
MOTS20-12 | 30 | 1920x1080 | 900 (00:30) | 68 | 6471 | 7.2 | Forward moving camera in a busy shopping mall | link | [1] |
MOTS20-06 | 14 | 640x480 | 1194 (01:25) | 190 | 9814 | 8.2 | Street scene from a moving platform | link | [1] |
MOTS20-01 | 30 | 1920x1080 | 450 (00:15) | 12 | 3106 | 6.9 | People walking around a large square | link | [1] |
Total | | | 3044 (02:27) | 328 | 32269 | 10.6 | | | |
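The Density column and the Total rows in the two tables above can be reproduced from the per-sequence numbers. The sketch below is only an illustration: it assumes that Density is the number of boxes divided by the number of frames, and that the total duration is the sum of the per-sequence durations after each has been rounded to whole seconds. Neither convention is stated in the text; both merely match the listed values. The names `FIRST_TABLE`, `SECOND_TABLE`, `density`, and `summarize` are hypothetical and not part of any benchmark toolkit.

```python
# Rows copied from the two tables above: (name, fps, frames, tracks, boxes).
FIRST_TABLE = [
    ("MOTS20-11", 30, 900, 62, 8511),
    ("MOTS20-09", 30, 525, 26, 4774),
    ("MOTS20-05", 14, 837, 103, 6570),
    ("MOTS20-02", 30, 600, 37, 7039),
]
SECOND_TABLE = [
    ("MOTS20-07", 30, 500, 58, 12878),
    ("MOTS20-12", 30, 900, 68, 6471),
    ("MOTS20-06", 14, 1194, 190, 9814),
    ("MOTS20-01", 30, 450, 12, 3106),
]


def density(boxes, frames):
    """Average number of annotated boxes per frame (assumed definition)."""
    return boxes / frames


def summarize(rows):
    """Recompute the Total row of one table from its per-sequence rows."""
    frames = sum(r[2] for r in rows)
    tracks = sum(r[3] for r in rows)
    boxes = sum(r[4] for r in rows)
    # Assumption: each sequence duration is rounded to whole seconds
    # before summing, which matches the totals listed in the tables.
    seconds = sum(round(r[2] / r[1]) for r in rows)
    return {
        "frames": frames,
        "tracks": tracks,
        "boxes": boxes,
        "density": round(density(boxes, frames), 1),
        "seconds": seconds,
    }


if __name__ == "__main__":
    for rows in (FIRST_TABLE, SECOND_TABLE):
        for name, fps, frames, tracks, boxes in rows:
            print(f"{name}: density {density(boxes, frames):.1f}")
        print("Total:", summarize(rows))
```

Note that the totals for Density are computed from the summed boxes and frames, not by averaging the per-sequence densities, so longer sequences weigh more.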
[1] MOTS: Multi-Object Tracking and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.