DEEP_TAMA: Online Multiple Pedestrian Tracking with Deep Temporal Appearance Matching Association



Benchmark:

MOT17

Short name:

DEEP_TAMA

Detector:

Public

Description:

In online multi-target tracking, it is crucial to model both the appearance and the geometric similarity between pedestrians that are already being tracked and those appearing in a new frame. The feature vector underlying the appearance model has a much higher dimension than that of the geometric model, which generally makes it harder to handle. The recent success of deep learning-based methods, however, makes it possible to exploit this high-dimensional appearance information effectively. Among deep networks, the Siamese network with triplet loss is widely adopted as an appearance feature extractor. Since a Siamese network extracts the features of each input independently, target-specific features can be updated and maintained. However, it is not well suited to multi-target settings, which require comparing each input against the others. In this paper, we address this issue by proposing a novel track appearance model based on a joint-inference network. The proposed method compares two inputs jointly for adaptive appearance modeling, which helps disambiguate target-observation matching and preserve identity consistency. Diverse experimental results support the effectiveness of our method. Our work was ranked as the 3rd-highest tracker in MOTChallenge19, held at CVPR 2019. The code is available at https://github.com/yyc9268/Deep-TAMA.
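
To make the distinction concrete, the sketch below contrasts an independent Siamese embedding with a joint-inference matcher that looks at a track crop and a detection crop together, and then fuses the resulting similarity with a geometric cue for assignment. This is only a minimal illustration under assumptions, not the DEEP_TAMA implementation: the layer sizes, the `SiameseEmbedding`, `JointInferenceNet` and `associate` names, and the weighted appearance/IoU fusion are all made up for the example (PyTorch and SciPy are assumed to be available).

```python
# Minimal sketch only -- NOT the DEEP_TAMA code. Architectures, names, weights
# and thresholds below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


class SiameseEmbedding(nn.Module):
    """Encodes each crop independently; similarity is a fixed distance on embeddings.
    Shown only for contrast with the joint-inference matcher below."""

    def __init__(self, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, crop):                       # crop: (B, 3, H, W)
        return F.normalize(self.backbone(crop), dim=1)


class JointInferenceNet(nn.Module):
    """Looks at a (track, detection) pair jointly and predicts a match probability,
    so the comparison itself is learned rather than fixed in advance."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),  # 6 = two RGB crops stacked
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, track_crop, det_crop):       # both: (B, 3, H, W)
        pair = torch.cat([track_crop, det_crop], dim=1)
        return self.backbone(pair).squeeze(1)      # (B,) match probabilities


def associate(appearance_sim, iou, w=0.7, min_sim=0.4):
    """Hungarian assignment on a fused (appearance + geometric) cost matrix.
    Both inputs are (num_tracks, num_dets) NumPy arrays with values in [0, 1]."""
    fused = w * appearance_sim + (1.0 - w) * iou
    rows, cols = linear_sum_assignment(1.0 - fused)
    return [(r, c) for r, c in zip(rows, cols) if fused[r, c] >= min_sim]


# Toy usage: 3 existing tracks, 4 new detections.
tracks = torch.rand(3, 3, 64, 32)                  # cropped track templates
dets = torch.rand(4, 3, 64, 32)                    # cropped detections
net = JointInferenceNet()
with torch.no_grad():
    sim = torch.stack(
        [net(tracks, d.unsqueeze(0).expand_as(tracks)) for d in dets], dim=1
    )                                              # (3, 4) pairwise match scores
iou = torch.rand(3, 4)                             # placeholder geometric similarity
matches = associate(sim.numpy(), iou.numpy())      # list of (track_idx, det_idx) pairs
```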

Reference:

Y. Yoon, D. Kim, Y. Song, K. Yoon, M. Jeon. Online Multiple Pedestrian Tracking using Deep Temporal Appearance Matching Association. In Information Sciences, 2020.

Last submitted:

February 07, 2019

Published:

February 07, 2019 at 02:56:08 CET

Submissions:

1

Open source:

No

Hardware:

3.7 GHz, 1 core (no GPU)

Runtime:

1.5 Hz

Benchmark performance:

Sequence  MOTA  IDF1  HOTA  MT  ML  FP  FN  Rcll  Prcn  AssA  DetA  AssRe  AssPr  DetRe  DetPr  LocA  FAF  ID Sw.  Frag
MOT17  50.3  53.5  42.0  453 (19.2)  883 (37.5)  25,479  252,996  55.2  92.4  43.3  41.0  46.9  73.1  43.8  73.4  79.7  1.4  2,192 (39.7)  3,978 (72.1)
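
As a quick sanity check on how these columns relate, MOTA follows the standard CLEAR-MOT definition MOTA = 1 - (FP + FN + IDSW) / GT. The snippet below is an informal check, not official MOTChallenge evaluation code; it recovers the approximate ground-truth box count from the reported Recall and FN (approximate because Recall is rounded) and then recomputes the overall MOTA of the row above.

```python
# Informal sanity check, not official MOTChallenge evaluation code.
fp, fn, idsw, recall = 25_479, 252_996, 2_192, 0.552

# Rcll = (GT - FN) / GT  =>  GT = FN / (1 - Rcll); approximate because Rcll is rounded.
gt = fn / (1.0 - recall)

# CLEAR-MOT: MOTA = 1 - (FP + FN + IDSW) / GT
mota = 1.0 - (fp + fn + idsw) / gt
print(f"GT ~ {gt:,.0f} boxes, MOTA ~ {100 * mota:.1f}")   # prints MOTA ~ 50.3, matching the table
```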

Detailed performance:

Sequence  MOTA  IDF1  HOTA  MT  ML  FP  FN  Rcll  Prcn  AssA  DetA  AssRe  AssPr  DetRe  DetPr  LocA  FAF  ID Sw.  Frag
MOT17-01-DPM  38.7  43.9  32.9  4  11  112  3,823  40.7  95.9  37.1  29.3  39.9  68.9  30.3  71.3  76.6  0.2  20  48
MOT17-01-FRCNN  28.3  40.9  36.5  8  7  1,549  3,051  52.7  68.7  39.1  34.5  42.8  74.4  42.9  55.9  78.4  3.4  26  45
MOT17-01-SDP  44.3  49.5  39.9  9  4  1,202  2,354  63.5  77.3  39.7  40.5  42.3  72.1  49.1  59.8  78.2  2.7  35  65
MOT17-03-DPM  54.2  53.5  40.3  29  29  2,331  45,407  56.6  96.2  38.7  42.1  40.9  75.6  44.2  75.0  79.2  1.6  248  520
MOT17-03-FRCNN  59.6  57.8  45.0  40  25  2,311  39,826  62.0  96.6  43.4  47.1  46.3  75.0  49.5  77.2  80.5  1.5  186  329
MOT17-03-SDP  75.9  70.6  55.5  76  14  1,486  23,515  77.5  98.2  52.5  58.9  56.4  74.7  62.0  78.6  81.2  1.0  183  488
MOT17-06-DPM  43.2  55.1  41.0  34  109  758  5,859  50.3  88.7  47.7  35.7  54.0  66.4  38.4  67.8  76.6  0.6  82  164
MOT17-06-FRCNN  47.3  54.9  43.2  52  69  1,369  4,732  59.8  83.7  43.5  43.3  53.5  61.3  48.6  68.1  79.0  1.1  115  208
MOT17-06-SDP  52.2  58.8  45.3  74  69  1,314  4,206  64.3  85.2  46.2  44.8  56.8  61.4  50.6  67.0  78.4  1.1  108  191
MOT17-07-DPM  38.5  43.6  32.1  5  26  538  9,777  42.1  93.0  33.6  30.9  35.6  69.4  32.3  71.3  77.0  1.1  79  166
MOT17-07-FRCNN  31.7  39.4  31.3  3  15  2,011  9,373  44.5  78.9  32.3  31.5  35.5  63.2  35.3  62.5  76.7  4.0  150  290
MOT17-07-SDP  46.4  49.2  37.6  14  18  1,097  7,859  53.5  89.2  36.7  38.7  40.3  67.0  42.0  70.1  78.7  2.2  96  192
MOT17-08-DPM  26.5  30.4  26.0  9  39  475  14,978  29.1  92.8  29.2  23.2  31.1  75.2  24.0  76.7  82.3  0.8  70  71
MOT17-08-FRCNN  23.1  30.3  27.3  7  39  892  15,280  27.7  86.8  34.6  21.7  37.0  77.1  22.9  71.7  81.8  1.4  65  83
MOT17-08-SDP  32.6  36.6  32.4  11  34  511  13,626  35.5  93.6  37.8  28.0  39.9  76.4  29.1  76.9  81.8  0.8  103  136
MOT17-12-DPM  41.0  50.2  38.8  19  45  263  4,816  44.4  93.6  45.3  33.3  47.7  76.3  35.0  73.8  79.8  0.3  37  48
MOT17-12-FRCNN  36.5  50.0  39.0  15  45  635  4,838  44.2  85.8  46.3  33.1  50.1  75.9  36.0  69.9  80.4  0.7  28  37
MOT17-12-SDP  40.8  54.4  43.4  21  42  826  4,271  50.7  84.2  51.2  36.9  55.0  77.4  41.2  68.4  81.3  0.9  35  45
MOT17-14-DPM  24.9  35.8  25.9  6  104  482  13,330  27.9  91.4  32.4  20.7  35.1  67.2  21.5  70.7  78.1  0.6  64  107
MOT17-14-FRCNN  16.9  34.3  27.3  7  74  3,308  11,803  36.1  66.9  30.6  24.9  35.7  56.2  28.8  53.3  74.0  4.4  243  410
MOT17-14-SDP  32.4  44.9  32.3  10  65  2,009  10,272  44.4  80.3  34.6  30.4  39.1  63.5  34.0  61.5  76.2  2.7  219  335

Raw data:

DEEP_TAMA