This repo contains the code for the papers: MAMBA, STPN, TDViT, and EOVOD.
Additionally, we provide archive files of two widely used datasets, ImageNetVID and GOT-10k. The official download links for these datasets are either inaccessible or have been removed. We hope these resources can help future research.
Model | Backbone | AP50 | AP (fast) | AP (med) | AP (slow) | Link |
---|---|---|---|---|---|---|
FasterRCNN | ResNet-101 | 76.7 | 52.3 | 74.1 | 84.9 | model, reference |
SELSA | ResNet-101 | 81.5 | -- | -- | -- | model, reference |
MEGA | ResNet-101 | 82.9 | 62.7 | 81.6 | 89.4 | model, reference |
MAMBA | ResNet-101 | 83.8 | 65.3 | 83.8 | 89.5 | config, model, paper |
STPN | Swin-T | 85.2 | 64.1 | 84.1 | 91.4 | config, model, paper |
The code is tested with the following environment:
- python 3.8
- pytorch 1.10.1
- cuda 11.3
- mmcv-full 1.3.17
```shell
# create and activate a conda environment
conda create --name vfe -y python=3.8
conda activate vfe

# install PyTorch with CUDA support
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

# install mmcv-full 1.3.17
pip install mmcv-full==1.3.17 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html

# install other requirements
pip install -r requirements.txt

# install mmpycocotools
pip install mmpycocotools
```
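After installation, a quick sanity check (a minimal sketch; the expected version strings follow the environment listed above) can confirm that PyTorch sees CUDA and that mmcv-full is importable:

```shell
# check the installed PyTorch version and CUDA availability (expect: 1.10.1 True)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# check the installed mmcv-full version (expect: 1.3.17)
python -c "import mmcv; print(mmcv.__version__)"
```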
See here for the versions of MMCV compatible with different PyTorch and CUDA versions.
The original links of the ImageNetVID dataset are either broken or unavailable. Here, we provide a new link to download the files for future reference of the community. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from this LINK.
After that, we recommend symlinking the dataset root to `./data/`. The path structure should be as follows:
```
./data/ILSVRC/
./data/ILSVRC/Annotations/DET
./data/ILSVRC/Annotations/VID
./data/ILSVRC/Data/DET
./data/ILSVRC/Data/VID
./data/ILSVRC/ImageSets
```
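For example, assuming the archives were extracted to a placeholder location `/path/to/ILSVRC`, the symlink can be created as follows:

```shell
# /path/to/ILSVRC is a placeholder for wherever the dataset was extracted
mkdir -p ./data
ln -s /path/to/ILSVRC ./data/ILSVRC
```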
Note: The list txt files under the `ImageSets` folder can be obtained from here.
We use CocoVID to maintain the datasets.
Option 1: Download and uncompress the json files generated by us from here.
Option 2: Use the following commands to generate the annotation files:
```shell
# ImageNet DET
python ./tools/convert_datasets/ilsvrc/imagenet2coco_det.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations

# ImageNet VID
python ./tools/convert_datasets/ilsvrc/imagenet2coco_vid.py -i ./data/ILSVRC -o ./data/ILSVRC/annotations
```
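Either way, the annotation files should end up under `./data/ILSVRC/annotations`. The file names listed below are an assumption based on mmtracking's default converter outputs; verify against what the scripts actually produce:

```shell
# list the generated annotation files
# (names assumed from mmtracking defaults, e.g. imagenet_det_30plus1cls.json,
#  imagenet_vid_train.json, imagenet_vid_val.json)
ls ./data/ILSVRC/annotations
```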
This section shows how to test existing models on the supported datasets. Testing works in the following environments:
- single GPU
- single node multiple GPU
During testing, different tasks share the same API, and we only support `samples_per_gpu = 1`.
You can use the following commands for testing:
```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${GPU_NUM} [--checkpoint ${CHECKPOINT_FILE}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```
Optional arguments:
- `CHECKPOINT_FILE`: Filename of the checkpoint. You do not need to define it when applying some MOT methods; instead, specify the checkpoints in the config.
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `bbox` is available for ImageNet VID, `track` is available for LaSOT, and both `bbox` and `track` are suitable for MOT17.
- `--cfg-options`: If specified, the key-value pair optional cfg will be merged into the config file (see the example after this list).
- `--eval-options`: If specified, the key-value pair optional eval cfg will be passed as kwargs to the `dataset.evaluate()` function; it is only for evaluation.
- `--format-only`: If specified, the results will be formatted to the official format.
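As an illustration of `--cfg-options`, the command below overrides the test annotation file from the command line; the key path `data.test.ann_file` is an assumption and must match the structure of your actual config:

```shell
# override a config value at test time without editing the config file
# (the key path data.test.ann_file is illustrative, not guaranteed by this repo)
python tools/test.py configs/vid/mamba/mamba_r101_dc5_6x.py \
    --checkpoint work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth \
    --eval bbox \
    --cfg-options data.test.ann_file=./data/ILSVRC/annotations/imagenet_vid_val.json
```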
Assume that you have already downloaded the checkpoints to the directory `work_dirs/`.
- Test MAMBA on ImageNet VID, and evaluate the bbox mAP:

```shell
python tools/test.py configs/vid/mamba/mamba_r101_dc5_6x.py \
    --checkpoint work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth \
    --out results.pkl \
    --eval bbox
```
- Test MAMBA with 8 GPUs on ImageNet VID, and evaluate the bbox mAP:

```shell
./tools/dist_test.sh configs/vid/mamba/mamba_r101_dc5_6x.py 8 \
    --checkpoint work_dirs/mamba_r101_dc5_6x/epoch_6_model.pth \
    --out results.pkl \
    --eval bbox
```
You can use the following command to train a model on a single GPU:

```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```
During training, log files and checkpoints will be saved to the working directory, which is specified by `work_dir` in the config file or via the CLI argument `--work-dir`.
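For example, to direct logs and checkpoints to a custom location (the directory name below is a placeholder):

```shell
# save logs and checkpoints under a custom working directory
python tools/train.py configs/vid/mamba/mamba_r101_dc5_6x.py \
    --work-dir ./work_dirs/my_mamba_run
```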
We provide `tools/dist_train.sh` to launch training on multiple GPUs.
The basic usage is as follows.
```shell
bash ./tools/dist_train.sh \
    ${CONFIG_FILE} \
    ${GPU_NUM} \
    [optional arguments]
```
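When launching several distributed jobs on the same machine, give each a distinct communication port. The sketch below assumes `dist_train.sh` reads a `PORT` environment variable, as OpenMMLab launch scripts conventionally do:

```shell
# run two distributed training jobs side by side on one machine
# (PORT handling is assumed from the OpenMMLab convention)
PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
```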
- Train MAMBA on ImageNet VID and ImageNet DET with a single GPU, then evaluate the bbox mAP at the last epoch:

```shell
python tools/train.py configs/vid/mamba/mamba_r101_dc5_6x.py
```
- Train MAMBA on ImageNet VID and ImageNet DET with 8 GPUs, then evaluate the bbox mAP at the last epoch:

```shell
./tools/dist_train.sh configs/vid/mamba/mamba_r101_dc5_6x.py 8
```
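If a run is interrupted, it can typically be resumed from the latest checkpoint; `--resume-from` is the standard flag in mmdetection-style `tools/train.py`, and the checkpoint path below is a placeholder:

```shell
# resume training from the most recent checkpoint of a previous run
python tools/train.py configs/vid/mamba/mamba_r101_dc5_6x.py \
    --resume-from work_dirs/mamba_r101_dc5_6x/latest.pth
```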
The codebase is built on two popular open-source PyTorch repos: mmdetection and mmtracking.