Coupled Knowledge Distillation (CKD) is a simple and effective training framework that pursues common styles across different modalities to break the modality gap for high-performance RGBT tracking. Importantly, CKD introduces no additional computational cost at inference.
Our training process is divided into two steps; if you do not need the Multi-modal Candidate token Elimination (MCE) module, you can skip the second step.
Step 1: We adopt OSTrack-RGBT as our base tracker. For our best result, first load the pretrained parameters from DropMAE, then train with CKD as illustrated in the figure above.
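As a concrete illustration of the distillation step, below is a minimal PyTorch sketch of a style-alignment loss in the spirit of CKD. Treating channel-wise feature statistics as "style" is an AdaIN-like assumption made here for illustration; the paper's exact objective may be defined differently.

```python
import torch
import torch.nn.functional as F

def style_stats(feat):
    """Channel-wise mean/std over tokens; feat has shape (B, N, C).

    Using these first- and second-order statistics as the feature
    "style" is an assumption for illustration, not the paper's
    exact definition.
    """
    return feat.mean(dim=1), feat.std(dim=1)

def style_alignment_loss(feat_rgb, feat_tir):
    """Penalize the style discrepancy between RGB and TIR features,
    pushing the two modalities toward a common style."""
    mu_r, sd_r = style_stats(feat_rgb)
    mu_t, sd_t = style_stats(feat_tir)
    return F.mse_loss(mu_r, mu_t) + F.mse_loss(sd_r, sd_t)
```

In stage 1, a term like this would be added to the usual tracking loss so that the two branches converge toward a common style while still learning to localize the target.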
Step 2: The MCE module can accelerate inference, but the tracking head needs to be further fine-tuned to maintain performance. Note that this stage does not adopt CKD and requires the backbone to be frozen.
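The sketch below shows how this fine-tuning stage could be wired up; `tracker.backbone` and `tracker.box_head` are hypothetical attribute names used for illustration, so check the repo's training scripts for the real ones.

```python
import torch

def setup_mce_finetune(tracker, lr=1e-4):
    """Stage 2 setup: freeze the distilled backbone and fine-tune
    only the tracking head (no CKD loss in this stage)."""
    for p in tracker.backbone.parameters():
        p.requires_grad = False          # backbone stays frozen
    tracker.backbone.eval()              # freeze BN/dropout statistics too
    head_params = list(tracker.box_head.parameters())
    return torch.optim.AdamW(head_params, lr=lr)
```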
At inference, we only need the two student branches and their tracking heads, so CKD is just as fast as our baseline.
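Conceptually, one inference step looks like the sketch below; the branch names, the concatenation-based fusion, and the argument layout are illustrative assumptions rather than the repo's exact API.

```python
import torch

@torch.no_grad()
def track_step(student_rgb, student_tir, head, z_rgb, x_rgb, z_tir, x_tir):
    """One tracking step using only the two student branches and the
    tracking head; the teacher branches are discarded after training."""
    feat_rgb = student_rgb(z_rgb, x_rgb)   # (B, N, C) search tokens, RGB
    feat_tir = student_tir(z_tir, x_tir)   # (B, N, C) search tokens, TIR
    fused = torch.cat([feat_rgb, feat_tir], dim=-1)
    return head(fused)                     # box prediction / score map
```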
| Model | Checkpoint and raw result | PR/SR (%) | MACs (G) | FPS |
|---|---|---|---|---|
| CKD w/o CE | download | 72.3/57.4 | 57.802 | 84.8 |
| CKD w/ CE (DropMAE) | download | 73.0/58.0 | 42.735 | 96.4 |
| CKD w/ MCE (DropMAE) | download | 73.2/58.1 | 42.735 | 96.4 |
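To reproduce the MACs (G) and FPS columns on your own hardware, a measurement routine along these lines can be used; `thop` is one common MAC counter, and the dummy input `x` is an assumption that should match the tracker's expected shape.

```python
import time
import torch
from thop import profile  # pip install thop

@torch.no_grad()
def benchmark(model, x, warmup=20, iters=200):
    """Count MACs with thop and measure FPS with a simple timing loop."""
    macs, _ = profile(model, inputs=(x,), verbose=False)
    for _ in range(warmup):                # warm up CUDA kernels
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    fps = iters / (time.time() - start)
    print(f"MACs: {macs / 1e9:.3f} G | FPS: {fps:.1f}")
```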
Please kindly cite this paper in your publications if it helps your research:
@inproceedings{lu2024breaking,
  title={Breaking Modality Gap in {RGBT} Tracking: Coupled Knowledge Distillation},
  author={Andong Lu and Jiacong Zhao and Chenglong Li and Yun Xiao and Bin Luo},
  booktitle={ACM Multimedia 2024},
  year={2024},
  url={https://openreview.net/forum?id=2jzyYyRqX0}
}
Contact: adlu_ah@foxmail.com