The "Tracking Any Object in Open-World CVPR 2023 Challenge" consists of two sub-challenges: (i) the long-tail challenge and (ii) the open-world challenge. Both are based on the BURST benchmark [1], which extends the Tracking Any Object (TAO) dataset [2] with pixel-precise segmentation masks for all objects.

Submissions for both challenges will be evaluated on three sets of object classes:
1) A "common" set of 78 classes, which roughly corresponds to the 80 standard COCO classes [3].
2) An "uncommon" set of 404 classes, for which there are often very few samples in the dataset.
3) The union of the above two, i.e. the "all" set of 482 classes.

Note that these 482 classes are a subset of the much larger class set of the LVIS dataset for image-level instance segmentation [6].

For the long-tail tracking benchmark, models can be trained using annotations for all 482 classes. Submissions will be evaluated using the HOTA metrics [4]. These are computed separately for each of the three class sets above and are denoted by HOTA_com, HOTA_unc and HOTA_all, respectively. For more details about the metrics, please refer to the HOTA metrics paper [4] and Sec. 6 of the BURST benchmark paper [1].

This challenge is part of the CVPR 2023 workshop "Tracking and Its Many Guises: Tracking Any Object in Open-World".
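To make the relationship between the three class sets more concrete, here is a minimal Python sketch. It relies on the fact from the HOTA paper [4] that, at a single localization threshold, HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA). Everything else is an assumption for illustration only: the class names, the score values, the membership of the `COMMON` set, and in particular the simple per-class averaging within each set. The official aggregation is defined in Sec. 6 of the BURST paper [1] and may differ.

```python
import math
from statistics import mean


def hota_at_threshold(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: sqrt(DetA * AssA)."""
    return math.sqrt(det_a * ass_a)


# Hypothetical per-class (DetA, AssA) pairs; NOT real benchmark results.
per_class = {
    "person": (0.64, 0.49),
    "car": (0.36, 0.25),
    "unicycle": (0.16, 0.09),
}

# Hypothetical membership of the "common" set; every other class
# is treated as "uncommon" in this sketch.
COMMON = {"person", "car"}


def class_set_scores(per_class, common):
    """Average per-class HOTA within each class set (assumed aggregation)."""
    scores = {name: hota_at_threshold(d, a) for name, (d, a) in per_class.items()}
    com = [s for name, s in scores.items() if name in common]
    unc = [s for name, s in scores.items() if name not in common]
    return {
        "HOTA_com": mean(com),
        "HOTA_unc": mean(unc),
        "HOTA_all": mean(com + unc),
    }


result = class_set_scores(per_class, COMMON)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

The full HOTA score additionally averages over a range of localization thresholds, which this single-threshold sketch leaves out.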
Name | FPS | Resolution | Length | Tracks | Boxes | Density | Description | Source | Ref. |
BURST_test | 30 | 1280x720 | 52194 frames (29:00) | 7963 | 167132 | 3.2 | The test set of BURST. | link | [1] |
Total | | | 52194 frames (1740 s) | 7963 | 167132 | 3.2 | | | |
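The figures in the table can be cross-checked with a few lines of Python. This sketch assumes that "Density" means the average number of boxes per frame, which is not stated explicitly here but is consistent with the listed numbers. The variable names are illustrative.

```python
# Values taken from the BURST_test row of the table.
FPS = 30
frames = 52194
boxes = 167132

# Sequence duration in seconds and as minutes:seconds.
duration_s = frames / FPS
minutes, seconds = divmod(round(duration_s), 60)

# Assumed definition: density = annotated boxes per frame.
density = boxes / frames

print(f"{duration_s:.0f} s = {minutes}:{seconds:02d}, density = {density:.1f}")
# prints "1740 s = 29:00, density = 3.2"
```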
[1] | BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video. In WACV, 2023. |
[2] | TAO: A Large-Scale Benchmark for Tracking Any Object. In European Conference on Computer Vision, 2020. |
[3] | Microsoft COCO: Common Objects in Context. In Computer Vision -- ECCV 2014, 2014. |
[4] | HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking. International Journal of Computer Vision, 2020. |
[5] | Opening up Open-World Tracking. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022. |
[6] | LVIS: A Dataset for Large Vocabulary Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019. |