In online multi-target tracking, it is crucial to model the appearance and geometric similarity between pedestrians that are already being tracked and those appearing in a new frame. The feature vectors underlying the appearance model are of much higher dimension than those of the geometric model, which has traditionally made appearance modeling difficult. However, the recent success of deep learning-based methods makes it possible to handle such high-dimensional appearance information effectively. Among many deep architectures, the Siamese network with triplet loss is a popular choice for appearance feature extraction. Since a Siamese network extracts the features of each input independently, it can update and maintain target-specific features; however, it is not well suited to multi-target settings, which require comparison against other inputs. In this paper, to address this issue, we propose a novel track appearance model based on a joint-inference network. The proposed method jointly processes two inputs for adaptive appearance modeling, which disambiguates target-observation matching and consolidates identity consistency. Diverse experimental results support the effectiveness of our method. Our tracker ranked 3rd on MOTChallenge19, held at CVPR 2019. The code is available at https://github.com/yyc9268/Deep-TAMA.
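The distinction the abstract draws can be illustrated with a minimal NumPy sketch (not the authors' implementation): the triplet loss used to train a Siamese feature extractor scores independently extracted feature vectors, while a joint-inference head (here stood in for by a hypothetical pairwise similarity function) consumes both inputs together.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss used to train Siamese feature extractors:
    pull the anchor toward the positive, push it away from the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def joint_similarity(feat_a, feat_b):
    """Hypothetical stand-in for a joint-inference head: the pair is scored
    together (here via cosine similarity) rather than each input being
    embedded in isolation and compared afterwards."""
    num = float(np.dot(feat_a, feat_b))
    den = float(np.linalg.norm(feat_a) * np.linalg.norm(feat_b)) + 1e-8
    return num / den

# Toy appearance features for one tracked target and two detections.
rng = np.random.default_rng(0)
a = rng.normal(size=128)
p = a + 0.05 * rng.normal(size=128)   # same identity, slight appearance change
n = rng.normal(size=128)              # different identity
print(triplet_loss(a, p, n))          # near 0: this triplet is already satisfied
print(joint_similarity(a, p) > joint_similarity(a, n))
```

The joint scoring is what lets the matching step adapt to the specific pair being compared, rather than relying on fixed per-target embeddings.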
Y. Yoon, D. Kim, K. Yoon, Y. Song, M. Jeon. Online Multiple Pedestrian Tracking using Deep Temporal Appearance Matching Association. arXiv:1907.00831, 2019.
February 10, 2019
April 28, 2019 at 13:42:34 CET
Project page / code: https://github.com/yyc9268/Deep-TAMA
3.7 GHz, 1 core, no GPU
| Benchmark | MOTA | IDF1 | MOTP | MT (%) | ML (%) | FP | FN | Rcll | Prcn | FAF | ID Sw. (%) | Frag. (%) |
|-----------|------|------|------|--------|--------|----|----|------|------|-----|------------|-----------|
| MOT16 | 46.2 | 49.4 | 75.4 | 107 (14.1) | 334 (44.0) | 5,126 | 92,367 | 49.3 | 94.6 | 0.9 | 598 (12.1) | 1,127 (22.8) |