Submission Policy

We strongly encourage all participants to use only the sequences from the training set for finding parameters and report results on the provided detections to enable a meaningful comparison of tracking methods.

Tracking results will be evaluated automatically and made visible only to you. You will be able to make them public at any time.

Important: The evaluation server is not to be used for training. To discourage this, you will have to wait 72 hours (3 days) before you can re-submit your results. Moreover, you cannot submit results for the same method more than four times. If you want to present results of your method with various settings (e.g. different features or inference methods), please use the training set for this purpose and submit only one result to the test server! Note that only the latest submission will be considered.

File Format

Please submit your results as a single .zip file. The results for each sequence must be stored in a separate .txt file in the archive's root folder. The file name must exactly match the sequence name (case-sensitive).

The file format should be the same as that of the ground truth file: a CSV text file containing one object instance per line. Each line must contain 10 values:

<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>, <z>

The conf value contains the detection confidence in the det.txt files. For the ground truth, it acts as a flag indicating whether the entry is to be considered: a value of 0 means that this particular instance is ignored in the evaluation, while any other value marks it as active. For submitted results, all lines in the .txt file are considered. The world coordinates x, y, z are ignored for the 2D challenge and can be filled with -1. Similarly, the bounding boxes are ignored for the 3D challenge. However, each line is still required to contain 10 values.

All frame numbers, target IDs and bounding boxes are 1-based. Here is an example:

Tracking with bounding boxes

(2D MOT 2015, MOT16, MOT17, MOT20, HT21)

  1, 3, 794.27, 247.59, 71.245, 174.88, -1, -1, -1, -1
  1, 6, 1648.1, 119.61, 66.504, 163.24, -1, -1, -1, -1
  1, 8, 875.49, 399.98, 95.303, 233.93, -1, -1, -1, -1

Tracking in world coordinates

(3D MOT)

  1, 3, -1, -1, -1, -1, -1, 123.32, 2342.3, 0
  1, 4, -1, -1, -1, -1, -1, 153.12, 2478.2, 0
  2, 3, -1, -1, -1, -1, -1, 125.23, 2213.7, 0

Detection with bounding boxes

  1, -1, 794.27, 247.59, 71.245, 174.88, 4.56
  1, -1, 1648.1, 119.61, 66.504, 163.24, 0.32
  1, -1, 875.49, 399.98, 95.303, 233.93, -1.34
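As a sketch (the sequence name, track data, and helper below are illustrative, not part of the official tooling), rows in this 10-value format can be written from Python like so:

```python
# Write tracking results in the 10-value CSV format described above.
# Sequence name and track values are made up for illustration.
tracks = [
    # (frame, id, bb_left, bb_top, bb_width, bb_height)
    (1, 3, 794.27, 247.59, 71.245, 174.88),
    (1, 6, 1648.1, 119.61, 66.504, 163.24),
]

def format_row(frame, tid, left, top, width, height, conf=-1):
    # For the 2D challenge, the world coordinates x, y, z are filled with -1.
    return f"{frame}, {tid}, {left}, {top}, {width}, {height}, {conf}, -1, -1, -1"

with open("MOT17-01.txt", "w") as f:
    for row in tracks:
        f.write(format_row(*row) + "\n")
```

The file name matches the sequence name exactly, as required above.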

Multi Object Tracking & Segmentation

(MOTS Challenge)

Each line of an annotation txt file is structured like this (where rle means run-length encoding from COCO):

time_frame id class_id img_height img_width rle

An example line from a txt file:

52 1005 1 375 1242 WSV:2d;1O10000O10000O1O100O100O1O100O1000000000000000O100O102N5K00O1O1N2O110OO2O001O1NTga3

time frame: 52
object id: 1005 (meaning class id is 1, i.e. car, and instance id is 5)
class id: 1
image height: 375
image width: 1242
rle: WSV:2d;1O10000O10000O1O100O100O1O100O1000000000000000O100O...1O1N

image height, image width, and rle can be used together to decode a mask using cocotools.
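A minimal parser for such a line might look like this (pure Python; the function name is illustrative, and the class/instance split of the object id follows the 1005 → class 1, instance 5 convention described above):

```python
def parse_mots_line(line):
    """Split a MOTS annotation line into its six fields.

    Only the first five fields are whitespace-separated numbers,
    so a split with maxsplit=5 leaves the rle string intact.
    """
    time_frame, obj_id, class_id, height, width, rle = line.split(maxsplit=5)
    obj_id = int(obj_id)
    return {
        "time_frame": int(time_frame),
        "id": obj_id,
        # By convention, id = class_id * 1000 + instance_id.
        "instance_id": obj_id % 1000,
        "class_id": int(class_id),
        "img_height": int(height),
        "img_width": int(width),
        "rle": rle,
    }

example = ("52 1005 1 375 1242 WSV:2d;1O10000O10000O1O100O100O1O100O"
           "1000000000000000O100O102N5K00O1O1N2O110OO2O001O1NTga3")
fields = parse_mots_line(example)
```

Decoding the rle field into an actual binary mask is then a job for the COCO mask tools, using img_height and img_width.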

3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset


Submit your tracking result as a file in which each row contains the following values. The values are defined as in the annotation file; any other values will be ignored.

Each line of an annotation txt file is structured as follows:

frame: The video frame which the annotation is associated with 
id: Identity of the fish
3d_x: x coordinate of 3D head position in world coordinates
3d_y: y coordinate of 3D head position in world coordinates
3d_z: z coordinate of 3D head position in world coordinates

Four example lines of a submission txt file:

1, 1, 19.61, 28.313, 7.93
1, 2, 18.317, 28.636, 8.911
2, 1, 19.685, 28.348, 7.886
2, 2, 18.197, 28.625, 8.868
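Loading such a file into per-fish trajectories is straightforward; here is a sketch (the grouping helper is illustrative):

```python
from collections import defaultdict

def load_trajectories(lines):
    # Group 3D head positions by fish id: id -> list of (frame, x, y, z).
    tracks = defaultdict(list)
    for line in lines:
        frame, fish_id, x, y, z = (float(v) for v in line.split(","))
        tracks[int(fish_id)].append((int(frame), x, y, z))
    return dict(tracks)

# The four example rows from above.
rows = [
    "1, 1, 19.61, 28.313, 7.93",
    "1, 2, 18.317, 28.636, 8.911",
    "2, 1, 19.685, 28.348, 7.886",
    "2, 2, 18.197, 28.625, 8.868",
]
tracks = load_trajectories(rows)
```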

TAO: A Large-Scale Benchmark for Tracking Any Object

Submit your tracking result as a zipped-up .json file, which contains a list of elements in the following format:

[{
    "image_id" : int,
    "category_id" : int,
    "bbox" : [x, y, width, height],
    "score" : float,
    "track_id": int,
    "video_id": int
}, ...]
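Writing and zipping such a result list needs only the standard library; a sketch (file names, ids, and boxes are illustrative):

```python
import json
import zipfile

# Illustrative result list; ids and boxes are made up.
results = [
    {"image_id": 1, "category_id": 3, "bbox": [794.3, 247.6, 71.2, 174.9],
     "score": 0.91, "track_id": 1, "video_id": 1},
]

# Store the JSON list inside a zip archive for upload.
with zipfile.ZipFile("tao_results.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("results.json", json.dumps(results))
```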


STEP-ICCV21

Submit your result as a zip file that contains one folder per sequence in its root. Each folder must contain one png file per frame. Please follow the exact naming convention so that your results can be evaluated. Each png file must be RGB and the same size as the input. The channels encode the following: (R) contains the semantic class, (G) contains trackID // 256, and (B) contains trackID % 256. A list of semantic classes can be found here.
|- STEP-ICCV21-01
|--- 000001.png
|--- 000002.png
|--- ...
|- STEP-ICCV21-07
|--- ...
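The per-pixel channel encoding described above can be sketched in pure Python (actually writing the png, e.g. with an imaging library, is omitted; the function names are illustrative):

```python
def encode_pixel(semantic_class, track_id):
    # Pack one label into (R, G, B): R holds the class,
    # G and B split the track id into high and low bytes.
    assert 0 <= track_id < 256 * 256
    return (semantic_class, track_id // 256, track_id % 256)

def decode_pixel(r, g, b):
    # Recover (semantic_class, track_id) from an RGB pixel.
    return r, g * 256 + b

# Illustrative values: some class 11 with track id 1000.
rgb = encode_pixel(11, 1000)
```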
For any questions regarding the dataset or evaluation, please contact Mark Weber.

TAO Open-/Long-Tail World

Please only upload predictions for frames for which ground truth exists (annotated at 1 FPS). The detailed instructions can be found here. Please contact the benchmark author (Ali Athar) in case the instructions are unclear.

Archive Structure

The content of the .zip archive should contain one results file per sequence in its root, e.g. (MOT17 test-sequence names shown for illustration):

|- MOT17-01.txt
|- MOT17-03.txt
|- ...