Short name:
GSDT_V2
Detector:
Private
Description:
Object detection and data association are critical components of multi-object tracking (MOT) systems. Although the two components depend on each other, prior work often designs the detection and data association modules separately and trains them with different objectives. As a result, gradients cannot be back-propagated through the entire MOT system to optimize it end to end, which leads to sub-optimal performance. To address this issue, recent work optimizes the detection and data association modules simultaneously under a joint MOT framework, which has improved performance in both modules. In this work, we propose a new instance of the joint MOT approach based on Graph Neural Networks (GNNs). The key idea is that GNNs can model relations between variable-sized objects in both the spatial and temporal domains, which is essential for learning discriminative features for detection and data association. Through extensive experiments on the MOT15/16/17/20 datasets, we demonstrate the effectiveness of our GNN-based joint MOT approach and show state-of-the-art performance on both the detection and MOT tasks.
Reference:
Y. Wang, K. Kitani, X. Weng. Joint Object Detection and Multi-Object Tracking with Graph Neural Networks. In ICRA, 2021.
Last submitted:
March 24, 2021
Published:
March 24, 2021 at 07:55:48 CET
Submissions:
2
Project page / code:
Open source:
Yes
Hardware:
Titan XP
Runtime:
1.5 Hz
Benchmark performance:
Sequence | MOTA | IDF1 | HOTA | MT | ML | FP | FN | Rcll | Prcn | AssA | DetA | AssRe | AssPr | DetRe | DetPr | LocA | FAF | ID Sw. | Frag |
MOT20 | 67.1 | 67.5 | 53.6 | 660 (53.1) | 164 (13.2) | 31,507 | 135,395 | 73.8 | 92.4 | 52.7 | 54.7 | 58.5 | 71.8 | 59.8 | 74.9 | 81.7 | 7.0 | 3,230 (0.0) | 9,878 (0.0) |
Detailed performance:
Sequence | MOTA | IDF1 | HOTA | MT | ML | FP | FN | Rcll | Prcn | AssA | DetA | AssRe | AssPr | DetRe | DetPr | LocA | FAF | ID Sw. | Frag |
MOT20-04 | 82.3 | 78.6 | 62.1 | 455 | 33 | 10,113 | 37,464 | 86.3 | 95.9 | 59.1 | 65.4 | 65.1 | 75.7 | 70.2 | 78.0 | 82.2 | 4.9 | 1,015 | 3,889 |
MOT20-06 | 49.9 | 52.0 | 40.9 | 91 | 67 | 9,796 | 55,409 | 58.3 | 88.8 | 39.4 | 42.6 | 44.7 | 63.0 | 46.8 | 71.4 | 80.7 | 9.7 | 1,262 | 3,566 |
MOT20-07 | 75.0 | 68.1 | 55.6 | 71 | 2 | 1,870 | 6,116 | 81.5 | 93.5 | 51.7 | 60.6 | 57.3 | 72.7 | 66.4 | 76.1 | 82.1 | 3.2 | 280 | 608 |
MOT20-08 | 39.6 | 48.9 | 39.0 | 43 | 62 | 9,728 | 36,406 | 53.0 | 80.9 | 41.3 | 37.0 | 46.8 | 64.7 | 42.6 | 64.9 | 80.8 | 12.1 | 673 | 1,815 |
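As a quick sanity check on the tables above, the count-valued columns (FP, FN, ID Sw., Frag, MT, ML) of the four sequences should add up to the overall MOT20 row. The sketch below verifies that, and also illustrates the standard CLEAR MOT definition MOTA = 1 - (FP + FN + ID Sw.) / GT. The number of ground-truth boxes GT is not listed on this page, so the helper `estimate_mota` (a hypothetical name used only for illustration) approximates it from FN and the rounded recall; the result is therefore only expected to agree with the reported MOTA to within rounding error.

```python
# Count-valued columns copied from the tables above,
# in the order (FP, FN, ID Sw., Frag, MT, ML).
per_sequence = {
    "MOT20-04": (10113, 37464, 1015, 3889, 455, 33),
    "MOT20-06": (9796, 55409, 1262, 3566, 91, 67),
    "MOT20-07": (1870, 6116, 280, 608, 71, 2),
    "MOT20-08": (9728, 36406, 673, 1815, 43, 62),
}
overall = (31507, 135395, 3230, 9878, 660, 164)

# Sum each column across the four sequences.
totals = tuple(sum(column) for column in zip(*per_sequence.values()))
print("per-sequence totals match overall row:", totals == overall)


def estimate_mota(fp, fn, id_switches, recall_pct):
    """Approximate MOTA (in percent) from the reported counts.

    Assumption: GT is not given on this page, so it is estimated from
    recall = (GT - FN) / GT, using the recall value rounded to one decimal.
    """
    gt_estimate = fn / (1.0 - recall_pct / 100.0)
    return 100.0 * (1.0 - (fp + fn + id_switches) / gt_estimate)


# Overall MOT20 row: FP = 31,507, FN = 135,395, ID Sw. = 3,230, Rcll = 73.8.
print("estimated MOTA:", round(estimate_mota(31507, 135395, 3230, 73.8), 1))
```

Because recall is only reported to one decimal place, the estimated GT (and hence the estimated MOTA) can drift by a few tenths of a point, especially for sequences with high recall.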