TAO VOS Benchmark

TAO-VOS extends the TAO Benchmark with segmentation mask annotations. It contains 626 high-resolution videos captured in diverse environments; they are about half a minute long on average and cover a large variety of categories. The masks for the validation set of 126 sequences were annotated fully manually. The training set of 500 sequences was annotated semi-automatically, yielding high-quality masks with minor errors (for details see [1]). As in the original TAO Benchmark, the videos are annotated at 1 FPS, while the raw videos run at 30 FPS. Here we provide the masks of the annotated frames together with the corresponding images. If you need the images of the intermediate frames (full 30 FPS), please download them from the original TAO Benchmark.
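To illustrate the relation between the 1 FPS annotations and the 30 FPS raw videos, here is a minimal Python sketch that lists which raw-frame indices would carry a mask. It assumes that annotated frames are evenly spaced and that the first annotated frame sits at index 0; neither is stated above, so the function name `annotated_frame_indices` and the `offset` parameter are hypothetical and should be checked against the actual annotation files.

```python
def annotated_frame_indices(num_raw_frames, raw_fps=30, annotated_fps=1, offset=0):
    """Return the raw-frame indices assumed to carry annotations.

    Assumption (not confirmed by the dataset description): annotated frames
    are evenly spaced, starting at raw-frame index `offset`.
    """
    if raw_fps % annotated_fps != 0:
        raise ValueError("raw_fps must be an integer multiple of annotated_fps")
    step = raw_fps // annotated_fps  # 30 raw frames per annotated frame
    return list(range(offset, num_raw_frames, step))


# A half-minute clip at 30 FPS has 900 raw frames.
indices = annotated_frame_indices(900)
print(len(indices))   # number of annotated frames in the clip
print(indices[:3])    # first few annotated raw-frame indices
```

Under these assumptions a half-minute video yields about 30 annotated frames, which is why the intermediate images have to be fetched from the original TAO Benchmark if the full frame rate is needed.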

Training Set

Sample Name    FPS  Resolution  Length         Tracks  Boxes  Density  Description                            Source  Ref.
TAO_VOS_val    30   1280x720    0 (00:00)      835     14987  0.0      Validation set, for training trackers  link    [1]
TAO_VOS_train  30   1280x720    0 (00:00)      2833    59104  0.0      Training set, for training trackers    link    [1]
Total                           0 frm. (0 s.)  3668    74091  inf

Test Set

Sample Name    FPS  Resolution  Length         Tracks  Boxes  Density  Description  Source  Ref.
Total                           0 frm. (0 s.)                 nan


Download

Get all data (728MB)
Get files only, without images (113MB)
Development Kit

References:


[1] Voigtlaender, P., Luo, L., Yuan, C., Jiang, Y. & Leibe, B. Reducing the Annotation Effort for Video Object Segmentation Datasets. In WACV, 2021.