This repo contains the supported code and configuration files for SegDistill. It is based on mmsegmentation.
conda create -n mmcv python=3.8 -y
conda activate mmcv
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.2.2 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html
pip install future tensorboard
pip install IPython
pip install attr
pip install timm
git clone https://github.com/wzpscott/SegDistill.git -b main
cd SegDistill
pip install -e .
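After installation, a quick sanity check can confirm that PyTorch and mmcv-full import correctly and that CUDA is visible (a minimal sketch, assuming the versions installed above):

```python
# Quick environment check (illustrative; version numbers should match the install steps above).
import torch
import mmcv

print('torch:', torch.__version__)        # expected 1.7.1+cu110
print('mmcv-full:', mmcv.__version__)     # expected 1.2.2
print('CUDA available:', torch.cuda.is_available())
```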
We conducted experiments on the ADE20K dataset. The training and validation sets can be downloaded from this link, and the test set can be downloaded from here. After downloading the dataset, arrange it with the following structure:
mmsegmentation
├── mmseg
├── tools
├── configs
├── data
│ ├── ade
│ │ ├── ADEChallengeData2016
│ │ │ ├── annotations
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ │ │ ├── images
│ │ │ │ ├── training
│ │ │ │ ├── validation
│ ├── ...
See here for more instructions on data preparation.
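If your copy of ADE20K is stored outside `data/ade`, the dataset location can be overridden in the config. The snippet below is a sketch following mmsegmentation's standard ADE20K dataset config; adjust `data_root` to your own path:

```python
# Sketch of pointing an mmsegmentation-style config at a custom ADE20K location.
dataset_type = 'ADE20KDataset'
data_root = '/path/to/ADEChallengeData2016'  # change to where you extracted the dataset

data = dict(
    train=dict(type=dataset_type, data_root=data_root,
               img_dir='images/training', ann_dir='annotations/training'),
    val=dict(type=dataset_type, data_root=data_root,
             img_dir='images/validation', ann_dir='annotations/validation'))
```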
We provide links to the pretrained weights of the models used in the paper.
| Model | Pretrained on ImageNet-1K | Trained on ADE20K |
| --- | --- | --- |
| Segformer | link | link |
| Swin-Transformer | link | link |
| PSPNet | link | link |
We use mmcv-style configs to control the KD process.
Run an example config with the following command:
bash tools/dist_train.sh distillation_configs/example_config.py {num_gpu}
See here for detailed instructions on customizing the KD process for various network architectures.
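For orientation only, the sketch below shows what an mmcv-style distillation config might contain. The field names (`teacher_cfg`, `teacher_checkpoint`, `student_cfg`, `distillation`) are illustrative assumptions, not the actual SegDistill schema; see `distillation_configs/example_config.py` for the real format:

```python
# Hypothetical distillation config sketch (illustrative field names only).
_base_ = ['../configs/_base_/datasets/ade20k.py', '../configs/_base_/default_runtime.py']

teacher_cfg = 'configs/segformer/segformer_b4_ade20k.py'    # assumed path to a teacher config
teacher_checkpoint = 'checkpoints/segformer_b4_ade20k.pth'  # assumed pretrained teacher weights
student_cfg = 'configs/segformer/segformer_b0_ade20k.py'    # assumed path to a student config

distillation = dict(
    losses=[dict(type='ChannelGroupDistillLoss',  # assumed loss name for CGD
                 num_groups=8, tau=4.0, loss_weight=1.0)])
```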
Our Channel Group Distillation (CGD) considers a more extensive range of correlations in the activation map and works better on transformer architectures than previous KD methods.
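As an illustration of the channel-group idea (a sketch, not the paper's exact formulation), one way to write such a loss in PyTorch is to split the channels into groups, normalize each group's activations jointly over its channels and spatial positions, and match the student's distribution to the teacher's with a KL divergence:

```python
import torch
import torch.nn.functional as F

def channel_group_distillation_loss(student_feat, teacher_feat, num_groups=8, tau=4.0):
    """Illustrative channel-group distillation loss (not the official CGD implementation).

    Both feature maps are (N, C, H, W) with the same channel count; if the student's
    channels differ from the teacher's, a 1x1 conv would typically align them first.
    """
    n, c, h, w = teacher_feat.shape
    assert c % num_groups == 0, 'C must be divisible by num_groups'
    # Group consecutive channels together and flatten each group with its spatial map:
    # (N, C, H, W) -> (N, num_groups, (C // num_groups) * H * W)
    s = student_feat.reshape(n, num_groups, -1)
    t = teacher_feat.reshape(n, num_groups, -1)
    # Soft distributions over each channel group (temperature tau)
    log_p_s = F.log_softmax(s / tau, dim=-1)
    p_t = F.softmax(t / tau, dim=-1)
    # KL divergence between teacher and student group distributions
    return F.kl_div(log_p_s, p_t, reduction='batchmean') * (tau ** 2)
```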
Comparison to Other KD methods
Results on ADE20k
Qualitative segmentation results on ADE20K produced by Segformer B0: (a) raw images, (b) ground truth (GT), (c) output of the original student model, (d) Channel-wise Distillation (CD), and (e) Channel Group Distillation (CGD).