MOT15

This benchmark contains video sequences in unconstrained environments filmed with both static and moving cameras. Tracking and evaluation are done in image coordinates.

Training Set

Name | FPS | Resolution | Length (frames, mm:ss) | Tracks | Boxes | Density (boxes/frame) | Description | Source | Ref.
Venice-2 | 30 | 1920x1080 | 600 (00:20) | 26 | 7141 | 11.9 | People walking around a large square. | link | [1]
KITTI-17 | 10 | 1224x370 | 145 (00:15) | 9 | 683 | 4.7 | Walking pedestrians on a sunny day, static camera | link | [2]
KITTI-13 | 10 | 1242x375 | 340 (00:34) | 42 | 762 | 2.2 | Busy urban environment filmed from a moving car | link | [2]
ADL-Rundle-8 | 30 | 1920x1080 | 654 (00:22) | 28 | 6783 | 10.4 | A pedestrian scene filmed at night by a moving camera | link | [1]
ADL-Rundle-6 | 30 | 1920x1080 | 525 (00:18) | 24 | 5009 | 9.5 | A pedestrian street scene filmed from a low angle. | link | [1]
ETH-Pedcross2 | 14 | 640x480 | 837 (01:00) | 133 | 6263 | 7.5 | Street scene from a moving platform | link | [3]
ETH-Sunnyday | 14 | 640x480 | 354 (00:25) | 30 | 1858 | 5.2 | Street scene on a sunny day from a moving platform | link | [3]
ETH-Bahnhof | 14 | 640x480 | 1000 (01:11) | 171 | 5415 | 5.4 | Street scene from a moving platform | link | [3]
PETS09-S2L1 | 7 | 768x576 | 795 (01:54) | 19 | 4476 | 5.6 | A widely used sequence showing up to 8 walking pedestrians, partly in unusual patterns. | link | [4]
TUD-Campus | 25 | 640x480 | 71 (00:03) | 8 | 359 | 5.1 | A short sequence with side-view pedestrians | link | [5]
TUD-Stadtmitte | 25 | 640x480 | 179 (00:07) | 10 | 1156 | 6.5 | A static camera at about 2 meters height shows walking people on the street. | link | [6]
Total | | | 5500 frm. (389 s.) | 500 | 39905 | 7.3 | | |
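
The Density column appears to be the average number of annotated boxes per frame (Boxes divided by Length in frames). The following minimal Python sketch checks that reading against the training-set figures above; the names `TRAIN`, `density`, and `totals` are illustrative and not part of any benchmark tooling.

```python
# (name, frames, tracks, boxes) copied from the training-set table.
TRAIN = [
    ("Venice-2", 600, 26, 7141),
    ("KITTI-17", 145, 9, 683),
    ("KITTI-13", 340, 42, 762),
    ("ADL-Rundle-8", 654, 28, 6783),
    ("ADL-Rundle-6", 525, 24, 5009),
    ("ETH-Pedcross2", 837, 133, 6263),
    ("ETH-Sunnyday", 354, 30, 1858),
    ("ETH-Bahnhof", 1000, 171, 5415),
    ("PETS09-S2L1", 795, 19, 4476),
    ("TUD-Campus", 71, 8, 359),
    ("TUD-Stadtmitte", 179, 10, 1156),
]


def density(boxes, frames):
    """Average number of boxes per frame (assumed meaning of 'Density')."""
    return boxes / frames


def totals(rows):
    """Sum frames, tracks, and boxes, and compute the overall density."""
    frames = sum(r[1] for r in rows)
    tracks = sum(r[2] for r in rows)
    boxes = sum(r[3] for r in rows)
    return frames, tracks, boxes, round(density(boxes, frames), 1)


if __name__ == "__main__":
    for name, frames, _tracks, boxes in TRAIN:
        print(f"{name:15s} {density(boxes, frames):5.1f}")
    print(totals(TRAIN))
```

Under this assumption the per-sequence values and the Total row (5500 frames, 500 tracks, 39905 boxes, density 7.3) are reproduced from the listed frame and box counts.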

Test Set

Name | FPS | Resolution | Length (frames, mm:ss) | Tracks | Boxes | Density (boxes/frame) | Description | Source | Ref.
Venice-1 | 30 | 1920x1080 | 450 (00:15) | 17 | 4563 | 10.1 | People walking around a large square. | link | [1]
KITTI-19 | 10 | 1238x374 | 1059 (01:46) | 62 | 5343 | 5.0 | A street scene from a moving vehicle | link | [2]
KITTI-16 | 10 | 1224x370 | 209 (00:21) | 17 | 1701 | 8.1 | Pedestrians crossing a street filmed from a car | link | [2]
ADL-Rundle-3 | 30 | 1920x1080 | 625 (00:21) | 44 | 10166 | 16.3 | A crowded pedestrian street, stationary camera | link | [1]
ADL-Rundle-1 | 30 | 1920x1080 | 500 (00:17) | 32 | 9306 | 18.6 | A busy pedestrian street filmed at eye level by a moving camera | link | [1]
AVG-TownCentre | 2.5 | 1920x1080 | 450 (03:45) | 226 | 7148 | 15.9 | A pedestrian street filmed from an elevated point | link | [7]
ETH-Crossing | 14 | 640x480 | 219 (00:16) | 26 | 1003 | 4.6 | Street scene from a moving platform | link | [3]
ETH-Linthescher | 14 | 640x480 | 1194 (01:25) | 197 | 8930 | 7.5 | Street scene from a moving platform | link | [3]
ETH-Jelmoli | 14 | 640x480 | 440 (00:31) | 45 | 2537 | 5.8 | Street scene from a moving platform | link | [3]
PETS09-S2L2 | 7 | 768x576 | 436 (01:02) | 42 | 9641 | 22.1 | A crowded scene shown from an elevated viewpoint. | link | [4]
TUD-Crossing | 25 | 640x480 | 201 (00:08) | 13 | 1102 | 5.5 | A road crossing from a side view | link | [5]
Total | | | 5783 frm. (607 s.) | 721 | 61440 | 10.6 | | |
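
The mm:ss value in the Length column is, for most sequences, consistent with the frame count divided by the FPS, rounded to the nearest second. The sketch below illustrates that relationship on a few test-set rows; the helper `mmss` and the list `ROWS` are hypothetical names introduced only for this example.

```python
import math

# (name, frames, fps) copied from a few rows of the test-set table.
ROWS = [
    ("Venice-1", 450, 30),
    ("KITTI-19", 1059, 10),
    ("ETH-Linthescher", 1194, 14),
    ("PETS09-S2L2", 436, 7),
]


def mmss(frames, fps):
    """Duration as mm:ss, assuming length = frames / fps rounded half up."""
    seconds = math.floor(frames / fps + 0.5)
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    for name, frames, fps in ROWS:
        print(f"{name:16s} {frames:5d} frames at {fps} FPS -> {mmss(frames, fps)}")
```

This is only an approximation of how the listed lengths were obtained: for AVG-TownCentre (450 frames at 2.5 FPS) the same formula gives 03:00, while the table lists 03:45.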


Download

Get all data (1.3 GB)
Get files only, without images (3.7 MB)
Development Kit

References:


[1] Leal-Taixé, L., Milan, A., Reid, I., Roth, S. & Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. arXiv:1504.01942 [cs], 2015.
[2] Geiger, A., Lenz, P. & Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2012.
[3] Ess, A., Leibe, B. & Van Gool, L. Depth and Appearance for Mobile Scene Analysis. In Proceedings of the Eleventh IEEE International Conference on Computer Vision, 2007.
[4] Ferryman, J. & Shahrokni, A. PETS2009: Dataset and challenge. In 11th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), 2009.
[5] Andriluka, M., Roth, S. & Schiele, B. People-Tracking-by-Detection and People-Detection-by-Tracking. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.
[6] Andriluka, M., Roth, S. & Schiele, B. Monocular 3D Pose Estimation and Tracking by Detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
[7] Benfold, B. & Reid, I. Guiding Visual Surveillance by Tracking Human Attention. In Proceedings of the British Machine Vision Conference, 2009.