MOTS

This benchmark extends the traditional Multi-Object Tracking benchmark to the pixel level, using precise segmentation masks. We annotated 8 challenging video sequences (4 training, 4 test) in unconstrained environments, filmed with both static and moving cameras. Tracking, segmentation, and evaluation are performed in image coordinates. All sequences were annotated at the pixel level with high accuracy, strictly following a well-defined protocol.

Training Set

Sample	Name	FPS	Resolution	Length	Tracks	Boxes	Density	Description	Source	Ref.
	MOTS20-11	30	1920x1080	900 (00:30)	62	8511	9.5	Forward moving camera in a busy shopping mall	link	[1]
	MOTS20-09	30	1920x1080	525 (00:18)	26	4774	9.1	A pedestrian street scene filmed from a low angle	link	[1]
	MOTS20-05	14	640x480	837 (01:00)	103	6570	7.8	Street scene from a moving platform	link	[1]
	MOTS20-02	30	1920x1080	600 (00:20)	37	7039	11.7	People walking around a large square	link	[1]
	Total			2862 frm. (128 s.)	228	26894	9.4
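
The Density and Length columns can be cross-checked against the other columns. The sketch below assumes that Density means the average number of annotated boxes per frame (Boxes divided by Length in frames) and that the duration in parentheses is Length divided by FPS; neither definition is stated on this page, but both agree with the figures in the table. The function names are illustrative only.

```python
# Training-set rows copied from the table above: (name, fps, frames, boxes).
sequences = [
    ("MOTS20-11", 30, 900, 8511),
    ("MOTS20-09", 30, 525, 4774),
    ("MOTS20-05", 14, 837, 6570),
    ("MOTS20-02", 30, 600, 7039),
]


def density(boxes, frames):
    """Assumed definition: mean number of annotated boxes per frame."""
    return boxes / frames


def duration_seconds(frames, fps):
    """Assumed definition: sequence length in seconds."""
    return frames / fps


for name, fps, frames, boxes in sequences:
    print(
        f"{name}: density={density(boxes, frames):.1f}, "
        f"duration={duration_seconds(frames, fps):.1f} s"
    )

# Totals over the whole training set.
total_frames = sum(frames for _, _, frames, _ in sequences)
total_boxes = sum(boxes for _, _, _, boxes in sequences)
print(f"Total: {total_frames} frames, {total_boxes} boxes, "
      f"density={density(total_boxes, total_frames):.1f}")
```

Under this assumption the overall density is computed from the summed boxes and frames, not as the mean of the per-sequence densities, so longer sequences carry more weight.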

Test Set

Sample	Name	FPS	Resolution	Length	Tracks	Boxes	Density	Description	Source	Ref.
	MOTS20-07	30	1920x1080	500 (00:17)	58	12878	25.8	A busy pedestrian street filmed at eye level by a moving camera	link	[1]
	MOTS20-12	30	1920x1080	900 (00:30)	68	6471	7.2	Forward moving camera in a busy shopping mall	link	[1]
	MOTS20-06	14	640x480	1194 (01:25)	190	9814	8.2	Street scene from a moving platform	link	[1]
	MOTS20-01	30	1920x1080	450 (00:15)	12	3106	6.9	People walking around a large square	link	[1]
	Total			3044 frm. (147 s.)	328	32269	10.6

Download

Get all data (783.5 MB)
Get files only, without images (11.4 MB)
Development Kit

References: