MOTS

This benchmark extends the traditional Multi-Object Tracking benchmark to the pixel level, with objects annotated by precise segmentation masks. We annotated 8 challenging video sequences (4 training, 4 test) in unconstrained environments, filmed with both static and moving cameras. Tracking, segmentation and evaluation are all done in image coordinates. All sequences were annotated with high pixel-level accuracy, strictly following a well-defined protocol.
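Because annotation and evaluation happen at the pixel level, the basic comparison between a predicted object and a ground-truth object is the overlap of their masks. The Python sketch computes mask intersection-over-union (IoU) for two binary masks. Purely for illustration, it represents each mask as a set of (row, col) pixel coordinates. The names `mask_iou` and `mask_from_rows` are hypothetical. The benchmark's actual metrics and annotation encoding are defined in [1] and the development kit, not here.

```python
from typing import FrozenSet, Iterable, Tuple

# A pixel is addressed by (row, col) in image coordinates.
Pixel = Tuple[int, int]


def mask_from_rows(rows: Iterable[str]) -> FrozenSet[Pixel]:
    """Build a binary mask from text rows where '#' marks a foreground pixel."""
    return frozenset(
        (r, c)
        for r, line in enumerate(rows)
        for c, ch in enumerate(line)
        if ch == "#"
    )


def mask_iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection-over-union of two binary masks given as foreground pixel sets."""
    union = len(a | b)
    if union == 0:
        # Both masks are empty; treated here as no overlap (an illustrative choice).
        return 0.0
    return len(a & b) / union


if __name__ == "__main__":
    gt = mask_from_rows(["##..", "##..", "...."])
    pred = mask_from_rows([".##.", ".##.", "...."])
    print(mask_iou(gt, pred))  # 2 shared pixels / 6 covered pixels
```

Pixel sets keep the example dependency-free. Practical pipelines usually store masks as dense arrays or run-length encodings for efficiency.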

Training Set

Sample Name  FPS  Resolution  Length        Tracks  Boxes  Density  Description                                        Ref.
MOTS20-11    30   1920x1080   900 (00:30)   62      8511   9.5      Forward-moving camera in a busy shopping mall      [1]
MOTS20-09    30   1920x1080   525 (00:18)   26      4774   9.1      A pedestrian street scene filmed from a low angle  [1]
MOTS20-05    14   640x480     837 (01:00)   103     6570   7.8      Street scene from a moving platform                [1]
MOTS20-02    30   1920x1080   600 (00:20)   37      7039   11.7     People walking around a large square               [1]
Total                         2862 (128 s)  228     26894  9.4
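The Density column matches boxes divided by frames (for example, 8511 / 900 ≈ 9.5 for MOTS20-11). The Total row pools all sequences rather than averaging the per-sequence densities. The Python sketch recomputes these values from the training table under that assumed definition. `TRAIN`, `density` and `totals` are hypothetical names used only for illustration.

```python
# Per-sequence statistics copied from the training table: (frames, tracks, boxes).
TRAIN = {
    "MOTS20-11": (900, 62, 8511),
    "MOTS20-09": (525, 26, 4774),
    "MOTS20-05": (837, 103, 6570),
    "MOTS20-02": (600, 37, 7039),
}


def density(frames: int, boxes: int) -> float:
    """Average number of annotated objects per frame, rounded to one decimal."""
    return round(boxes / frames, 1)


def totals(stats):
    """Sum frames, tracks and boxes, then compute the pooled density."""
    frames = sum(f for f, _, _ in stats.values())
    tracks = sum(t for _, t, _ in stats.values())
    boxes = sum(b for _, _, b in stats.values())
    # Pooled density: total boxes over total frames, not the mean of per-sequence densities.
    return frames, tracks, boxes, density(frames, boxes)


if __name__ == "__main__":
    for name, (frames, _, boxes) in TRAIN.items():
        print(name, density(frames, boxes))
    print("Total", totals(TRAIN))
```

Pooling weights each sequence by its length. Averaging the four per-sequence values (9.5, 9.1, 7.8, 11.7) would instead give about 9.5, not the 9.4 shown in the Total row.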

Test Set

Sample Name  FPS  Resolution  Length        Tracks  Boxes  Density  Description                                                      Ref.
MOTS20-07    30   1920x1080   500 (00:17)   58      12878  25.8     A busy pedestrian street filmed at eye level by a moving camera  [1]
MOTS20-12    30   1920x1080   900 (00:30)   68      6471   7.2      Forward-moving camera in a busy shopping mall                    [1]
MOTS20-06    14   640x480     1194 (01:25)  190     9814   8.2      Street scene from a moving platform                              [1]
MOTS20-01    30   1920x1080   450 (00:15)   12      3106   6.9      People walking around a large square                             [1]
Total                         3044 (147 s)  328     32269  10.6


Download

Get all data (783.5 MB)
Get detections and labels only (11.4 MB)
Get development kit (0.5 MB)

References:


[1] Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B.B.G., Geiger, A. & Leibe, B. MOTS: Multi-Object Tracking and Segmentation. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.