Time October 11 - October 17, 2021; Details TBA
Venue ICCV, virtual
Challenge opens June 1st, 2021
Challenge deadline September 10th, 2021
Method abstract deadline September 25th, 2021
Recordings Will be available after the workshop!



Time Title Speaker
10:00-10:20 Workshop introduction Organizers
10:20-10:50 Talk 1 TBD
10:50-11:20 Talk 2 TBD
11:20-11:50 Video challenge session TBD
11:50-12:20 Video + depth challenge session TBD
12:20-12:50 LiDAR challenge session TBD
12:50-2:00 Break -
2:00-2:30 Talk 3 TBD
2:30-3:00 Talk 4 TBD
3:00-3:30 Talk 5 TBD
3:30-3:50 Break -
3:50-4:20 Talk 6 TBD
4:20-4:50 Talk 7 TBD
4:50-5:10 Talk 8 TBD
5:10-5:30 Break -
5:30-6:30 Round table discussion All speakers
6:30-6:40 Closing remarks Organizers


For the 6th edition of our Benchmarking Multi-Target Tracking workshop, we are taking multi-object tracking and segmentation to the next level. In this edition, we will organize three challenging competitions, all of which require assigning semantic classes and track identities to all pixels of a monocular video or to all points of a LiDAR stream.

Video track

For this track, we extended two existing datasets (KITTI and MOTChallenge) with dense, pixel-precise labels in both the spatial and the temporal domain: KITTI-STEP and MOTChallenge-STEP. KITTI-STEP extends the instance-level annotations of 21 training and 29 test sequences of KITTI-MOTS, and MOTChallenge-STEP extends those of two training and two test sequences of MOTChallenge-MOTS. For more information, we refer to our recent paper. The task is to assign a semantic class and a unique instance label to every pixel of the video.
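
For illustration, per-pixel class and track labels in such benchmarks are often packed into a single panoptic ID per pixel. The encoding below is a hypothetical sketch (the multiplier 1000 and the class IDs are assumptions, not the official STEP submission format):

```python
import numpy as np

# Assumed encoding: panoptic_id = semantic_class * OFFSET + instance_id.
# OFFSET caps the number of instances per class; 1000 is an assumption here.
OFFSET = 1000

def encode(semantic, instance):
    """Pack per-pixel semantic class and track ID into one label map."""
    return semantic.astype(np.int64) * OFFSET + instance

def decode(panoptic):
    """Recover (semantic, instance) from a packed label map."""
    return panoptic // OFFSET, panoptic % OFFSET

sem = np.array([[11, 11], [13, 13]])  # hypothetical class IDs
ins = np.array([[1, 1], [2, 3]])      # track IDs, unique within each class
pan = encode(sem, ins)                # [[11001, 11001], [13002, 13003]]
```

Keeping both labels in one integer map makes it cheap to compare predictions and ground truth with plain array operations.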

LiDAR track

This challenge will be based on the SemanticKITTI dataset, introduced in the context of LiDAR semantic segmentation and panoptic segmentation. The dataset is densely labeled in both the spatial and the temporal domain, which makes it a perfect test bed for our 4D panoptic LiDAR segmentation challenge, introduced in our recent paper on 4D panoptic LiDAR segmentation. The task is to assign a semantic class and a unique instance label to every 3D LiDAR point.

Video + depth track

This track will be based on the recently introduced SemanticKITTI-DVPS dataset, which augments the LiDAR-based SemanticKITTI dataset with pixel-precise semantic and instance image labels derived from the LiDAR labels in a semi-automated manner. It thus provides the semantic and depth labels needed to evaluate joint video panoptic segmentation and monocular depth estimation. In addition to assigning semantic and instance labels, this track requires a depth estimate for every pixel.
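
To make the depth part of the task concrete, here is a sketch of mean absolute relative error, a standard monocular-depth metric; whether this exact metric is used for this track is an assumption, not stated in the source:

```python
import numpy as np

def abs_rel(pred_depth, gt_depth, valid_min=1e-3):
    """Mean absolute relative depth error over pixels with valid ground truth.

    A standard monocular-depth metric, shown for illustration only; the
    official evaluation protocol for this track may differ.
    """
    mask = gt_depth > valid_min  # ignore pixels without a depth label
    err = np.abs(pred_depth[mask] - gt_depth[mask]) / gt_depth[mask]
    return float(err.mean())
```

A prediction that is uniformly 10% off, for instance, scores an abs-rel error of 0.1 regardless of the absolute depth scale.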

For each challenge, we will award the challenge winner and the most innovative entry, which our committee will select based on the submitted 4-page abstracts, and invite both to give a short talk at our workshop. All methods will be ranked with respect to the recently introduced Segmentation and Tracking Quality (STQ) metric.
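
As a rough illustration of how the metric combines segmentation and association quality, here is a simplified NumPy sketch following the published definition STQ = sqrt(AQ x SQ). The official evaluation code contains additional bookkeeping (crowd/ignore regions, per-sequence accumulation) that is omitted here:

```python
import numpy as np

def stq(pred_sem, gt_sem, pred_id, gt_id, num_classes, ignore=255):
    """Simplified sketch of Segmentation and Tracking Quality (STQ).

    pred_sem / gt_sem: (T, H, W) int arrays of per-frame semantic classes.
    pred_id / gt_id:   (T, H, W) int arrays of track IDs (> 0 on tracked pixels).
    Follows STQ = sqrt(AQ * SQ); not the official evaluation implementation.
    """
    valid = gt_sem != ignore
    # Segmentation Quality: class-averaged IoU over the whole clip.
    ious = []
    for c in range(num_classes):
        gt_c = (gt_sem == c) & valid
        pr_c = (pred_sem == c) & valid
        union = np.logical_or(gt_c, pr_c).sum()
        if union > 0:
            ious.append(np.logical_and(gt_c, pr_c).sum() / union)
    sq = float(np.mean(ious)) if ious else 0.0
    # Association Quality: track IDs are compared across all frames at once,
    # so identity switches between frames directly lower the score.
    aq_terms = []
    for g in np.unique(gt_id[valid]):
        if g <= 0:
            continue
        g_mask = (gt_id == g) & valid
        inner = 0.0
        for p in np.unique(pred_id[g_mask]):
            if p <= 0:
                continue
            p_mask = pred_id == p
            tpa = np.logical_and(g_mask, p_mask).sum()       # true positive assoc.
            iou = tpa / np.logical_or(g_mask, p_mask).sum()  # track-level IoU
            inner += tpa * iou
        aq_terms.append(inner / g_mask.sum())
    aq = float(np.mean(aq_terms)) if aq_terms else 0.0
    return (aq * sq) ** 0.5, aq, sq
```

Because tracks are matched over the full clip rather than frame by frame, splitting one ground-truth track across two predicted IDs halves the association term even when every frame is segmented perfectly.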

More details will be announced soon!


Aljoša Ošep (TUM)

Mark Weber (TUM)

Patrick Dendorfer (TUM)

Jens Behley (Uni Bonn)

Cyrill Stachniss (Uni Bonn)

Andreas Geiger (MPI/Tübingen)

Jun Xie (Google)

Siyuan Qiao (JHU)

Daniel Cremers (TUM)

Liang-Chieh Chen (Google)

Laura Leal-Taixé (TUM)