[PROJECT PAGE] [PAPER Download] [PPT]
PyTorch implementation of the CVPR 2022 paper "Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline".
We address a rarely explored task, Insubstantial Object Detection (IOD), which aims to localize objects with the following characteristics:
(1) amorphous shape with indistinct boundary;
(2) similarity to surroundings;
(3) absence of color.
Accordingly, distinguishing insubstantial objects in a single static frame is far more challenging, and a collaborative representation of spatial and temporal information is crucial. We therefore construct the IOD-Video dataset, comprising 600 videos (141,017 frames) that cover various distances, sizes, visibility levels, and scenes captured in different spectral ranges. You can preview the IOD-Video dataset on NJU Box and download it from the dataset pages.
For basic usage, install the following dependencies:
conda create -n IOD python=3.9
conda activate IOD
conda install pytorch=1.8 torchvision cudatoolkit -c pytorch
pip install -r pip-list.txt
Please refer to Installation.md for more information.
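After installation, a quick sanity check can confirm that the environment is usable. The sketch below is not part of the repository: `describe_environment` is a hypothetical helper that relies only on `torch.__version__` and `torch.cuda.is_available()`, and it reports a missing PyTorch install instead of failing.

```python
import importlib.util
import sys


def describe_environment():
    """Collect basic facts about the current Python/PyTorch setup.

    Hypothetical helper for illustration; not part of the IOD-Video code.
    """
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,            # filled in only if PyTorch is importable
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

With the conda commands above, one would expect the output to show Python 3.9 and a 1.8.x PyTorch build; the exact strings depend on the packages conda resolves.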
Download Frames.zip and PKL_Annotations.zip to the data folder, organized as follows:
IOD-Video
|------data
| |------TLGDM
| |------TrueLeakedGas.pkl
| |------TrueLeakedGas_ACT1.pkl
| |------TrueLeakedGas_c1_290.pkl
| |------TrueLeakedGas_v1_310.pkl
| |------Frames
| |------TrueLeakedGas
| |------001_wild_dynamic_vague
| |------...
Please refer to Dataset.md for more information.
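Before running anything, it can help to confirm that the annotation files named in the tree above are actually present. The sketch below is an illustration rather than repository code: `find_missing_annotations` is a hypothetical helper, and it searches the data folder recursively so that it does not depend on the exact nesting of the subfolders.

```python
from pathlib import Path

# Annotation files listed in the directory tree above.
EXPECTED_PKLS = (
    "TrueLeakedGas.pkl",
    "TrueLeakedGas_ACT1.pkl",
    "TrueLeakedGas_c1_290.pkl",
    "TrueLeakedGas_v1_310.pkl",
)


def find_missing_annotations(data_root):
    """Return the expected .pkl names that are not found under data_root."""
    root = Path(data_root)
    # A missing folder is treated as "nothing found" rather than an error.
    present = {p.name for p in root.rglob("*.pkl")} if root.is_dir() else set()
    return [name for name in EXPECTED_PKLS if name not in present]


if __name__ == "__main__":
    # Assumes the script is run from the IOD-Video root folder.
    missing = find_missing_annotations("data")
    print("missing annotation files:", missing or "none")
```

An empty result means all four annotation files were found somewhere under the data folder; it does not verify the frame images themselves.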
Download TEA_STA_K8S3_model_last.pth to ./src_IOD/experiment/result_model/TEA_STA_K8S3/TEA_STA_K8S3_model_last.pth, then run:
cd src_IOD/vis
python3 vis_det.py --vname 001_wild_static_vague.avi
Please refer to Visualization.md for more information.
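The checkpoint location used above follows a `<name>/<name>_model_last.pth` pattern under ./src_IOD/experiment/result_model. The sketch below builds that path for a given model name and reports whether the file exists; `checkpoint_path` is a hypothetical helper, not a function from the repository, and the relative root assumes the script is run from the IOD-Video folder.

```python
from pathlib import Path

# Checkpoint directory used in the commands of this README
# (relative to the repository root, which is an assumption here).
MODEL_ROOT = Path("src_IOD/experiment/result_model")


def checkpoint_path(name, root=MODEL_ROOT):
    """Build <root>/<name>/<name>_model_last.pth for a model name."""
    return Path(root) / name / f"{name}_model_last.pth"


if __name__ == "__main__":
    path = checkpoint_path("TEA_STA_K8S3")
    status = "found" if path.is_file() else "missing"
    print(f"{path}: {status}")
```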
Download the models from Google Drive, Baidu Cloud (code: buac), or NJU Box and put them in the corresponding directories. For example, put TEA_STA_K8S1_model_last.pth at ./src_IOD/experiment/result_model/TEA_STA_K8S1/TEA_STA_K8S1_model_last.pth and run:
#inference
python3 det.py --task normal --K 8 --gpus 0,1 --batch_size 20 --master_batch 10 --num_workers 2 --rgb_model ../experiment/result_model/TEA_STA_K8S1/TEA_STA_K8S1_model_last.pth --inference_dir ../result/inference_TLGDM_pkl1 --dataset IODVideo --split 1 --arch TEAresnet_50
# frame mAP@0.5
python3 ACT.py --pkl_ACT 1 --task frameAP --K 8 --th 0.5 --inference_dir ../result/inference_TLGDM_pkl1 --dataset IODVideo --split 1
# mAP@0.5, mAP@0.75, mAP on the vague and clear subsets, and mAP@0.5:0.95
bash ACT_total1.sh 8 1
Please refer to Evaluation.md for more information.
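The thresholds in the comments above (0.5, 0.75) are, under the usual detection-metric convention, IoU thresholds: a predicted box counts as a match only if its intersection-over-union with a ground-truth box reaches the threshold. The sketch below illustrates that test for axis-aligned boxes given as (x1, y1, x2, y2). It is a simplified illustration with hypothetical function names, not the evaluation code in ACT.py.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, th=0.5):
    """True if the predicted box overlaps the ground truth with IoU >= th."""
    return box_iou(pred, gt) >= th


if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)
    print(f"IoU = {box_iou(pred, gt):.3f}")
    print("match at 0.5: ", is_match(pred, gt, th=0.5))
    print("match at 0.75:", is_match(pred, gt, th=0.75))
```

In this example the boxes overlap on an 8 x 10 region out of a union of 120, so the prediction passes the 0.5 threshold but fails the stricter 0.75 one.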
To train with K=8 using the TEA backbone and STAloss, run:
python3 train.py --K 8 --exp_id Train_K8_Imagenet_TLGDM_STA_S1 --rgb_model ../experiment/result_model/TEA_STA_K8S1/ --batch_size 16 --master_batch 8 --lr 5e-4 --gpus 0,1 --num_workers 4 --num_epochs 12 --lr_step 6,8 --dataset IODVideo --split 1 --arch TEAresnet_50 --pretrain_model imagenet
Please refer to Train.md for more information.
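The flags `--lr 5e-4 --lr_step 6,8 --num_epochs 12` describe a step learning-rate schedule. The sketch below shows what such a schedule looks like under two assumptions that this README does not confirm: the rate is multiplied by 0.1 at each step, and a step listed as epoch 6 takes effect from epoch 7 onward. `lr_for_epoch` is a hypothetical helper; check train.py for the actual behavior.

```python
def lr_for_epoch(epoch, base_lr=5e-4, lr_steps=(6, 8), gamma=0.1):
    """Learning rate used during a 1-indexed epoch under a step schedule.

    gamma=0.1 and "decay once the listed epoch has finished" are
    assumptions made for this illustration.
    """
    drops = sum(1 for step in lr_steps if epoch > step)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    for epoch in range(1, 13):  # mirrors --num_epochs 12
        print(f"epoch {epoch:2d}: lr = {lr_for_epoch(epoch):.1e}")
```

Under these assumptions, the first six epochs use the base rate and each listed step lowers it by a factor of ten for the epochs that follow.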
Parts of the code are adapted from previous works: MOC and CenterNet (code base), ACT (evaluation), and I3D, S3D, TAM, MSNet, TSM, TEA, TIN, and TDN (backbones). Thanks to their authors for these awesome repos.
@inproceedings{IOD-Video-CVPR2022,
title = {Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline},
author = {Zhou, Kailai and Wang, Yibo and Lv, Tao and Li, Yunqian and Chen, Linsen and Shen, Qiu and Cao, Xun},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2022}
}