In visual object tracking (VOT), the ability to predict a target's trajectory hinges heavily on context, particularly in scenes crowded with similar object instances or where the target's appearance changes rapidly. Traditional VOT methods falter in such scenarios because they rely solely on the target region cropped from the initial frame. ProContEXT, by contrast, leverages both temporal context (appearance changes over time) and spatial context (the layout of surrounding objects), significantly enhancing tracking precision.
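The idea can be sketched in a few lines. The following is a minimal, illustrative PyTorch sketch, not the official ProContEXT implementation: the class name, token counts, and the single-pass attention module are assumptions chosen for clarity. It shows how tokens from the static initial-frame template, a set of progressively updated dynamic templates, and the current search region can be concatenated and jointly attended, so search features are refined by both spatial and temporal context.

```python
import torch
import torch.nn as nn

class ContextFusionSketch(nn.Module):
    """Minimal sketch of joint context encoding (NOT the official ProContEXT code):
    tokens from the static template, the dynamic context templates, and the search
    region are concatenated and refined in a single self-attention pass."""

    def __init__(self, dim: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, static_tok, dynamic_tok, search_tok):
        # static_tok:  (B, Ns, C) tokens from the initial-frame template (spatial anchor)
        # dynamic_tok: (B, Nd, C) tokens from progressively updated templates (temporal context)
        # search_tok:  (B, Nq, C) tokens from the current search region
        tokens = torch.cat([static_tok, dynamic_tok, search_tok], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # every token attends to all contexts
        fused = self.norm(tokens + fused)
        return fused[:, -search_tok.shape[1]:, :]      # refined search tokens for the head

# Toy usage with ViT-Base-like dimensions:
B, C = 2, 768
block = ContextFusionSketch()
out = block(torch.randn(B, 64, C), torch.randn(B, 128, C), torch.randn(B, 256, C))
print(out.shape)  # torch.Size([2, 256, 768])
```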
Fig. 1: Fast-changing and crowded scenes are ubiquitous in visual object tracking, so harnessing the temporal and spatial context of the video sequence is fundamental to precise tracking.
Fig. 2: A comprehensive visualization of the Progressive Context Encoding Transformer Tracker (ProContEXT) framework.
ProContEXT showcases state-of-the-art (SOTA) performance across a range of benchmarks.
| Tracker | GOT-10K (AO) | TrackingNet (AUC) |
| --- | --- | --- |
| ProContEXT | 74.6 | 84.6 |
For environment setup and data preparation, follow the instructions from OSTrack.
Run the command below to set the default paths for this project:

```
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
```
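In OSTrack-style codebases this script generates local settings files (e.g. `lib/train/admin/local.py`) that hold dataset and output paths. The excerpt below is a hypothetical illustration of what such a file looks like, following the pytracking convention; the attribute names in your generated file may differ, so edit the file the script actually writes rather than copying this:

```python
# Hypothetical excerpt of a generated local settings file (pytracking convention);
# attribute names are illustrative -- check the file the script actually writes.
class EnvironmentSettings:
    def __init__(self):
        self.workspace_dir = '.'                     # checkpoints and logs land here
        self.got10k_dir = './data/got10k/train'      # GOT-10k training split
        self.trackingnet_dir = './data/trackingnet'  # TrackingNet root
```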
Use models from MAE as your pre-trained models. Arrange them as:

```
${PROJECT_ROOT}
  -- pretrained_models
     |-- mae_pretrain_vit_base.pth
```
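Before training, you can optionally confirm the checkpoint is readable. This generic snippet is not part of the repository; it only assumes the file is a standard torch checkpoint (official MAE releases nest the weights under a `model` key):

```python
import torch

# Optional sanity check: load on CPU so this works without a GPU.
ckpt = torch.load('pretrained_models/mae_pretrain_vit_base.pth', map_location='cpu')
state = ckpt.get('model', ckpt)  # official MAE releases nest weights under 'model'
print(f'{len(state)} tensors loaded, e.g. {next(iter(state))}')
```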
For model training, run:

```
python tracking/train.py --script procontext --config procontext_got10k --save_dir ./output --mode multiple --nproc_per_node 4  # GOT-10k model
python tracking/train.py --script procontext --config procontext --save_dir ./output --mode multiple --nproc_per_node 4  # TrackingNet model
```
Download the trained models and organize them as:

```
${PROJECT_ROOT}/output/checkpoints/train/procontext/procontext_got10k/ProContEXT_ep0100.pth.tar  # GOT-10k model
${PROJECT_ROOT}/output/checkpoints/train/procontext/procontext/ProContEXT_ep0300.pth.tar  # TrackingNet model
```
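Before testing, an optional quick check that both checkpoints sit where the test script expects them; this snippet is illustrative and simply mirrors the paths listed above:

```python
from pathlib import Path

# Paths follow the layout above; ep0100 = GOT-10k model, ep0300 = TrackingNet model.
for ckpt in [
    'output/checkpoints/train/procontext/procontext_got10k/ProContEXT_ep0100.pth.tar',
    'output/checkpoints/train/procontext/procontext/ProContEXT_ep0300.pth.tar',
]:
    status = 'found' if Path(ckpt).is_file() else 'MISSING'
    print(f'{status}: {ckpt}')
```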
For testing, execute:

```
python tracking/test.py procontext procontext_got10k --dataset got10k_test --threads 16 --num_gpus 4
python tracking/test.py procontext procontext --dataset trackingnet --threads 16 --num_gpus 4
```
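Note that GOT-10k test-set scores are computed by the official evaluation server, so the raw result files must be uploaded as a zip archive. The snippet below is a generic, hypothetical helper (not part of the repository), and the results directory is an assumption based on the pytracking layout; point it at wherever `test.py` wrote your results:

```python
import shutil

# Assumed output location (pytracking-style layout) -- verify against your run.
results_dir = 'output/test/tracking_results/procontext/procontext_got10k'

# Create got10k_submission.zip for upload to the GOT-10k evaluation server.
shutil.make_archive('got10k_submission', 'zip', results_dir)
```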
Our implementation builds on foundational works including OSTrack, Stark, pytracking, and timm; we sincerely thank their authors for their invaluable contributions. We are also grateful to Alibaba Group's DAMO Academy for its support.
If our repository aids your research, please cite our work:
```
@inproceedings{lan2023procontext,
  title={{ProContEXT}: Exploring Progressive Context Transformer for Tracking},
  author={Lan, Jin-Peng and Cheng, Zhi-Qi and He, Jun-Yan and Li, Chenyang and Luo, Bin and Bao, Xu and Xiang, Wangmeng and Geng, Yifeng and Xie, Xuansong},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```
The codebase is available under the MIT license. More details can be found in the LICENSE file.