CroHD provides tracking annotations of pedestrian heads in densely populated video sequences. It consists of 2,276,838 human heads in 11,463 frames across 9 sequences of Full-HD resolution. We built CroHD upon 5 sequences from the publicly available MOTChallenge CVPR19 benchmark so that trackers can be compared in the same scenes under two paradigms: head tracking and pedestrian tracking. We further annotated 4 new sequences with higher crowd densities in two new scenarios: the Shibuya Train Station and Shibuya Crossing, one of the busiest pedestrian crossings in the world. All sequences in CroHD have a frame rate of 25 fps and are captured from an elevated viewpoint. They cover crowded indoor and outdoor scenes recorded under varied lighting and environmental conditions.

| Name | FPS | Resolution | Length (frames, mm:ss) | Tracks | Boxes | Density (heads/frame) | Description | Ref. |
|---|---|---|---|---|---|---|---|---|
| HT21-01 | 25 | 1920x1080 | 429 (00:17) | 85 | 21456 | 50.0 | Crowded indoor train station. | [1] |
| HT21-04 | 25 | 1920x1080 | 997 (00:40) | 580 | 175479 | 176.0 | Crowded outdoor train station. | [1] |
| HT21-03 | 25 | 1920x1080 | 1000 (00:40) | 811 | 257939 | 257.9 | Crowded pedestrian crossing. | [1] |
| HT21-02 | 25 | 1920x1080 | 3315 (02:13) | 1276 | 733622 | 221.3 | People leaving a stadium entrance at night; elevated viewpoint. | [1] |
| **Total** | | | 5741 (230 s) | 2752 | 1188496 | 207.0 | | |
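
The Density column appears to equal Boxes divided by Length in frames, and the mm:ss figure appears to be the frame count divided by the 25 fps frame rate, rounded to whole seconds. The short Python sketch below checks that reading against the rows of the table above. The interpretation of Density as average heads per frame is inferred from the numbers rather than stated by the source, and the helpers `density` and `duration` are hypothetical names used only for illustration.

```python
# Hypothetical check of how the Density and mm:ss columns relate to frames/boxes.
# Values copied from the table above: (name, frames, boxes).
TRAIN = [
    ("HT21-01", 429, 21456),
    ("HT21-04", 997, 175479),
    ("HT21-03", 1000, 257939),
    ("HT21-02", 3315, 733622),
]

FPS = 25  # all CroHD sequences are recorded at 25 fps


def density(frames: int, boxes: int) -> float:
    """Assumed definition: average annotated heads per frame, 1 decimal place."""
    return round(boxes / frames, 1)


def duration(frames: int, fps: int = FPS) -> str:
    """Clip length as mm:ss, rounded to the nearest whole second."""
    seconds = round(frames / fps)
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    for name, frames, boxes in TRAIN:
        print(name, duration(frames), density(frames, boxes))
```

Under this reading, every per-sequence Density and mm:ss value in the table is reproduced exactly.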

| Name | FPS | Resolution | Length (frames, mm:ss) | Tracks | Boxes | Density (heads/frame) | Description | Ref. |
|---|---|---|---|---|---|---|---|---|
| HT21-11 | 25 | 1920x1080 | 585 (00:23) | 133 | 38492 | 65.8 | Crowded indoor train station. | [1] |
| HT21-13 | 25 | 1920x1080 | 1000 (00:40) | 734 | 259603 | 259.6 | Crowded pedestrian crossing. | [1] |
| HT21-15 | 25 | 1920x734 | 1008 (00:40) | 321 | 149821 | 148.6 | A pedestrian street scene. | [1] |
| HT21-14 | 25 | 1920x1080 | 1050 (00:42) | 1040 | 258227 | 245.9 | Crowded outdoor train station. | [1] |
| HT21-12 | 25 | 1920x1080 | 2080 (01:23) | 737 | 380647 | 183.0 | People leaving a stadium entrance at night; elevated viewpoint. | [1] |
| **Total** | | | 5723 (228 s) | 2965 | 1086790 | 189.9 | | |
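
The Total rows can be reproduced by summing the per-sequence columns. A minimal sketch for the table above follows, assuming (from the arithmetic, not from any stated definition) that the total seconds are the sum of each sequence's rounded duration and that the total density is total boxes over total frames. `SeqStats` and `split_totals` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List, NamedTuple


class SeqStats(NamedTuple):
    """Hypothetical record holding one row of the table above."""
    name: str
    frames: int
    tracks: int
    boxes: int


# Values copied from the table above.
TEST: List[SeqStats] = [
    SeqStats("HT21-11", 585, 133, 38492),
    SeqStats("HT21-13", 1000, 734, 259603),
    SeqStats("HT21-15", 1008, 321, 149821),
    SeqStats("HT21-14", 1050, 1040, 258227),
    SeqStats("HT21-12", 2080, 737, 380647),
]

FPS = 25  # frame rate shared by all CroHD sequences


def split_totals(seqs: List[SeqStats]) -> Dict[str, float]:
    """Aggregate per-sequence rows into a Total row (assumed conventions)."""
    frames = sum(s.frames for s in seqs)
    tracks = sum(s.tracks for s in seqs)
    boxes = sum(s.boxes for s in seqs)
    # Assumed: total seconds = sum of per-sequence durations rounded to whole seconds.
    seconds = sum(round(s.frames / FPS) for s in seqs)
    # Pooled density: total boxes over total frames, so longer clips weigh more.
    density = round(boxes / frames, 1)
    return {
        "frames": frames,
        "seconds": seconds,
        "tracks": tracks,
        "boxes": boxes,
        "density": density,
    }


if __name__ == "__main__":
    print(split_totals(TEST))
```

Pooling boxes and frames weights each sequence by its length; a plain average of the five per-sequence densities would instead give roughly 180.6, which does not match the 189.9 reported in the table.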

[1] Tracking Pedestrian Heads in Dense Crowd. In Conference on Computer Vision and Pattern Recognition (CVPR), 2021.