A simple baseline for one-shot multi-object tracking:
A Simple Baseline for Multi-Object Tracking,
Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu,
arXiv technical report (arXiv 2004.01888)
There has been remarkable progress on object detection and re-identification in recent years, which are the core components of multi-object tracking. However, little attention has been paid to accomplishing the two tasks in a single network to improve inference speed. The initial attempts along this path ended up with degraded results, mainly because the re-identification branch is not appropriately learned. In this work, we study the essential reasons behind the failure and accordingly present a simple baseline to address the problems. It remarkably outperforms the state of the art on the MOT challenge datasets at 30 FPS. We hope this baseline can inspire and help evaluate new ideas in this field.
Dataset | MOTA | IDF1 | IDS | MT | ML | FPS |
---|---|---|---|---|---|---|
2DMOT15 | 59.0 | 62.2 | 582 | 45.6% | 11.5% | 30.5 |
MOT16 | 68.7 | 70.4 | 953 | 39.5% | 19.0% | 25.9 |
MOT17 | 67.5 | 69.8 | 2868 | 37.7% | 20.8% | 25.9 |
MOT20 | 58.7 | 63.7 | 6013 | 66.3% | 8.5% | 13.2 |
All of the results are obtained on the MOT challenge evaluation server under the “private detector” protocol. We rank first among all the trackers on 2DMOT15, MOT17 and the recently released (2020.02.29) MOT20. Note that our IDF1 score remarkably outperforms other one-shot MOT trackers by more than 10 points. The tracking speed of the entire system can reach up to 30 FPS.
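For readers unfamiliar with the headline metric in the table, MOTA under the standard CLEAR MOT definition combines false negatives, false positives and identity switches relative to the number of ground-truth boxes. The sketch below is illustrative only: the function name `mota` and the counts are hypothetical, and the official numbers above come from the MOT challenge evaluation server, not from this snippet.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts, chosen only to show the arithmetic.
print(f"MOTA = {mota(10, 5, 5, 100):.3f}")
```

Because the error terms are summed before dividing, MOTA can be negative when a tracker makes more errors than there are ground-truth boxes.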
- Clone this repo; we'll refer to the cloned directory as ${FAIRMOT_ROOT}
- Install dependencies. We use python 3.7 and pytorch >= 1.2.0
conda create -n FairMOT
conda activate FairMOT
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
cd ${FAIRMOT_ROOT}
pip install -r requirements.txt
cd src/lib/models/networks/DCNv2
sh make.sh
- We use DCNv2 in our backbone network and more details can be found in their repo.
- In order to run the code for demos, you also need to install ffmpeg.
We use the same training data as JDE. Please refer to their DATA ZOO to download and prepare all the training data including Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT17 and MOT16.
2DMOT15 and MOT20 can be downloaded from the official webpage of MOT challenge. After downloading, you should prepare the data in the following structure:
MOT15
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
MOT20
   |——————images
   |        └——————train
   |        └——————test
   └——————labels_with_ids
            └——————train(empty)
Then, you can change the seq_root and label_root in src/gen_labels_15.py and src/gen_labels_20.py and run:
cd src
python gen_labels_15.py
python gen_labels_20.py
to generate the labels of 2DMOT15 and MOT20. The seqinfo.ini files of 2DMOT15 can be downloaded here: [Google], [Baidu] (code: 8o0w).
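Before running the label-generation scripts, it can help to confirm that the MOT15 and MOT20 folders match the structure shown earlier. The following is a minimal sketch, not part of the repository: `prepare_layout` and `EXPECTED_SUBDIRS` are hypothetical names, and the demo runs in a temporary directory rather than your real data root.

```python
import tempfile
from pathlib import Path

# Sub-directories taken from the layout shown earlier, relative to each dataset folder.
EXPECTED_SUBDIRS = ("images/train", "images/test", "labels_with_ids/train")


def prepare_layout(data_root, datasets=("MOT15", "MOT20")):
    """Create any missing folders and return the list of paths that were created."""
    created = []
    for name in datasets:
        for sub in EXPECTED_SUBDIRS:
            path = Path(data_root) / name / sub
            if not path.is_dir():
                path.mkdir(parents=True)
                created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory; pass your own data root in practice.
    with tempfile.TemporaryDirectory() as tmp:
        for p in prepare_layout(tmp):
            print(p.relative_to(tmp))
```

An empty return value means every expected folder was already present.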
- Pretrained models
DLA-34 COCO pretrained model: DLA-34 official. HRNetV2 ImageNet pretrained model: HRNetV2-W18 official, HRNetV2-W32 official. After downloading, you should put the pretrained models in the following structure:
${FAIRMOT_ROOT}
   └——————models
           └——————ctdet_coco_dla_2x.pth
           └——————hrnetv2_w32_imagenet_pretrained.pth
           └——————hrnetv2_w18_imagenet_pretrained.pth
- Baseline model
Our baseline FairMOT model can be downloaded here: DLA-34: [Google] [Baidu, code: 88yn]. HRNetV2_W18: [Google] [Baidu, code: z4ft]. After downloading, you should put the baseline model in the following structure:
${FAIRMOT_ROOT}
   └——————models
           └——————all_dla34.pth
           └——————all_hrnet_v2_w18.pth
           └——————...
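A quick way to check that the downloaded weights ended up in the right place is to look for the file names listed in the two model directory listings. This is a hedged sketch: `missing_models` and `MODEL_FILES` are hypothetical helpers, not part of the repository, and you only need the files for the backbone you intend to use.

```python
from pathlib import Path

# File names copied from the pretrained-model and baseline-model listings.
MODEL_FILES = (
    "ctdet_coco_dla_2x.pth",
    "hrnetv2_w32_imagenet_pretrained.pth",
    "hrnetv2_w18_imagenet_pretrained.pth",
    "all_dla34.pth",
    "all_hrnet_v2_w18.pth",
)


def missing_models(fairmot_root, names=MODEL_FILES):
    """Return the names from `names` that are not present in <fairmot_root>/models."""
    models_dir = Path(fairmot_root) / "models"
    return [name for name in names if not (models_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from ${FAIRMOT_ROOT}.
    for name in missing_models("."):
        print(f"missing: models/{name}")
```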
- Download the training data
- Change the dataset root directory 'root' in src/lib/cfg/data.json and 'data_dir' in src/lib/opts.py
- Run:
sh experiments/all_dla34.sh
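The 'root' entry in src/lib/cfg/data.json can also be updated from a script rather than by hand. The sketch below assumes that data.json is a JSON object with a top-level "root" key, as the step above implies; `set_dataset_root` is a hypothetical helper, and 'data_dir' in src/lib/opts.py still has to be edited separately.

```python
import json
from pathlib import Path


def set_dataset_root(cfg_path, new_root):
    """Rewrite the top-level "root" key of a JSON config and return the updated dict."""
    cfg_path = Path(cfg_path)
    cfg = json.loads(cfg_path.read_text())
    cfg["root"] = str(new_root)  # every other key is left untouched
    cfg_path.write_text(json.dumps(cfg, indent=4) + "\n")
    return cfg
```

For example, `set_dataset_root("src/lib/cfg/data.json", "/path/to/your/datasets")` run from ${FAIRMOT_ROOT}, where the dataset path is a placeholder for your own location.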
- The default settings run tracking on the validation dataset from 2DMOT15. Using the DLA-34 baseline model, you can run:
cd src
python track.py mot --load_model ../models/all_dla34.pth --conf_thres 0.6
to see the tracking results (76.1 MOTA using the DLA-34 baseline model). You can also set save_images=True in src/track.py to save the visualization results of each frame.
Using the HRNetV2-W18 baseline model, you can run:
cd src
python track.py mot --load_model ../models/all_hrnet_v2_w18.pth --conf_thres 0.6 --arch hrnet_18 --reid_dim 128
to see the tracking results (76.6 MOTA using the HRNetV2-W18 baseline model).
- To get the txt results of the test set of MOT16 or MOT17, you can run:
cd src
python track.py mot --test_mot17 True --load_model ../models/all_dla34.pth --conf_thres 0.4
python track.py mot --test_mot16 True --load_model ../models/all_dla34.pth --conf_thres 0.4
and send the txt files to the MOT challenge evaluation server to get the results.
- To get the SOTA results on 2DMOT15 and MOT20, you need to finetune the baseline model on the specific dataset because our training set does not contain them. You can run:
sh experiments/ft_mot15_dla34.sh
sh experiments/ft_mot20_dla34.sh
and then run the tracking code:
cd src
python track.py mot --test_mot15 True --load_model your_mot15_model.pth --conf_thres 0.3
python track.py mot --test_mot20 True --load_model your_mot20_model.pth --conf_thres 0.3 --K 500
Results on the test sets all need to be evaluated on the MOT challenge server. You can see the tracking results on the training set by setting --val_motxx True and running the tracking code. We set 'conf_thres' to 0.4 for MOT16 and MOT17, and to 0.3 for 2DMOT15 and MOT20. You can also use the SOTA MOT20 pretrained model ([Google], [Baidu], code: mqnz):
python track.py mot --test_mot20 True --load_model ../models/mot20_dla34.pth --reid_dim 128 --conf_thres 0.3 --K 500
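Since the confidence threshold differs per benchmark, it may be convenient to assemble the test commands from a small lookup table. This is only a sketch: `CONF_THRES` and `build_test_command` are hypothetical names, while the flags and values are the ones quoted in the commands and text above.

```python
import shlex

# Thresholds quoted in the text: 0.4 for MOT16/MOT17, 0.3 for 2DMOT15/MOT20.
CONF_THRES = {"mot15": 0.3, "mot16": 0.4, "mot17": 0.4, "mot20": 0.3}


def build_test_command(dataset, model_path):
    """Return the track.py argument list for one test set, e.g. dataset='mot17'."""
    if dataset not in CONF_THRES:
        raise ValueError(f"unknown dataset: {dataset}")
    cmd = ["python", "track.py", "mot", f"--test_{dataset}", "True",
           "--load_model", model_path, "--conf_thres", str(CONF_THRES[dataset])]
    if dataset == "mot20":
        cmd += ["--K", "500"]  # the MOT20 commands above also pass --K 500
    return cmd


# Print a shell-ready command line; run it from the src directory.
print(" ".join(shlex.quote(part) for part in build_test_command("mot20", "your_mot20_model.pth")))
```

The returned list can be passed directly to `subprocess.run(..., cwd="src")` if you prefer to launch the runs from Python.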
You can input a raw video and get the demo video in mp4 format by running src/demo.py:
cd src
python demo.py mot --load_model ../models/all_dla34.pth --conf_thres 0.4
You can change --input-video and --output-root to get the demos of your own videos.
If you have difficulty building DCNv2 and thus cannot use the DLA-34 baseline model, you can run the demo with the HRNetV2-W18 baseline model:
cd src
python demo.py mot --load_model ../models/all_hrnet_v2_w18.pth --arch hrnet_18 --reid_dim 128 --conf_thres 0.4
--conf_thres can be set from 0.3 to 0.7 depending on your own videos.
A large part of the code is borrowed from Zhongdao/Towards-Realtime-MOT and xingyizhou/CenterNet. Thanks for their wonderful work.
@article{zhang2020simple,
title={A Simple Baseline for Multi-Object Tracking},
author={Zhang, Yifu and Wang, Chunyu and Wang, Xinggang and Zeng, Wenjun and Liu, Wenyu},
journal={arXiv preprint arXiv:2004.01888},
year={2020}
}