Transformer Knowledge Distillation for Efficient Semantic Segmentation [arxiv]
We propose a structural framework, TransKD, to distill knowledge from the feature maps and patch embeddings of vision transformers. TransKD enables non-pretrained vision transformers to perform on par with pretrained ones.
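As a rough, self-contained illustration of what feature-map distillation means (the actual TransKD losses and projector design live in the training scripts; the function name, shapes, and projection below are hypothetical), a student's feature map can be linearly projected to the teacher's channel dimension and regressed against it:

```python
import numpy as np

def feature_distillation_loss(student_feat, teacher_feat, proj=None):
    """Mean-squared error between student and teacher feature maps.

    If the channel dimensions differ, `proj` (a C_s x C_t matrix)
    linearly projects the student features onto the teacher's
    channel dimension before comparison.
    """
    if proj is not None:
        # (N, H, W, C_s) @ (C_s, C_t) -> (N, H, W, C_t)
        student_feat = student_feat @ proj
    return np.mean((student_feat - teacher_feat) ** 2)

# Toy example: 1 image, 4x4 spatial grid, student 8 channels, teacher 16.
rng = np.random.default_rng(0)
s = rng.standard_normal((1, 4, 4, 8))
t = rng.standard_normal((1, 4, 4, 16))
w = rng.standard_normal((8, 16)) * 0.1
loss = feature_distillation_loss(s, t, proj=w)
```

In practice the projection is a learned layer trained jointly with the student, and TransKD additionally distills patch-embedding knowledge at each stage.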
(a)-(c) Knowledge distillation in computer vision falls into three categories: response-based, feature-based, and relation-based knowledge distillation. (d) TransKD extracts relation-based knowledge from feature maps and transformer-specific patch embedding knowledge at each stage.

Environment: create a conda environment and activate it
conda create -n TransKD python=3.6
conda activate TransKD
Additional Python packages: a poly learning-rate scheduler and
pytorch == 1.7.1+cu92
torchvision == 0.8.2+cu92
mmsegmentation == 0.15.0
mmcv-full == 1.3.10
numpy
visdom
Or simply:
pip install -r requirements.txt
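The poly scheduler mentioned above follows the polynomial decay rule standard in semantic segmentation training; a minimal sketch (the power of 0.9 is the common default, not necessarily what the training scripts use):

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Polynomial learning-rate decay used widely in segmentation:
    lr = base_lr * (1 - cur_iter / max_iter) ** power
    """
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# The rate starts at base_lr and decays smoothly to 0 at max_iter.
lr = poly_lr(0.01, cur_iter=500, max_iter=1000)  # roughly half-decayed
```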
Datasets:
- Cityscapes: download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip from the Cityscapes official website, then prepare the 19-class labels with createTrainIdLabelImgs.py from cityscapesscripts.
- ACDC: download gt_trainval.zip and rgb_anon_trainvaltest.zip from the ACDC official website.
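After extraction, Cityscapes is expected to follow the standard layout (leftImg8bit/ and gtFine/, each split into train/val/test). A small sanity-check sketch, assuming that standard layout (the helper name is hypothetical):

```python
import os
import tempfile

def check_cityscapes_layout(root):
    """Return the missing split directories under a Cityscapes root.

    The standard layout after unzipping the two archives is:
        root/leftImg8bit/{train,val,test}/<city>/*_leftImg8bit.png
        root/gtFine/{train,val,test}/<city>/*_gtFine_*.png
    """
    missing = []
    for sub in ("leftImg8bit", "gtFine"):
        for split in ("train", "val", "test"):
            path = os.path.join(root, sub, split)
            if not os.path.isdir(path):
                missing.append(path)
    return missing

# Example: an empty directory is missing all six split folders.
with tempfile.TemporaryDirectory() as tmp:
    assert len(check_cityscapes_layout(tmp)) == 6
```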
Download the following weights to the folder \outputs.
Cityscapes:
Network | #Params(M) | GFLOPs | mIoU(%) | weight |
---|---|---|---|---|
Teacher(B2) | 27.36 | 113.84 | 76.49 | Google Drive |
Student(B0) | 3.72 | 13.67 | 55.86 | Google Drive |
+TransKD-Base | 4.56 | 16.47 | 68.58 | Google Drive |
+TransKD-GL | 5.22 | 16.80 | 68.87 | Google Drive |
+TransKD-EA | 5.53 | 17.84 | 68.98 | Google Drive |
ACDC:
Network | mIoU(%) | weight |
---|---|---|
Teacher(B2) | 69.34 | Google Drive |
Student(B0) | 46.26 | Google Drive |
+TransKD-Base | 58.56 | Google Drive |
+TransKD-GL | 58.13 | Google Drive |
+TransKD-EA | 59.09 | Google Drive |
Training: download pretrained weights (SegFormer and PVTv2) to the folder \train\ckpt_pretained\, then run:
cd train
CUDA_VISIBLE_DEVICES=0 python TransKDBase.py --dataset cityscapes --datadir /path/to/cityscapes #--dataset ACDC --datadir /path/to/ACDC
CUDA_VISIBLE_DEVICES=0 python TransKD_GLMixer.py --dataset cityscapes --datadir /path/to/cityscapes #--dataset ACDC --datadir /path/to/ACDC
CUDA_VISIBLE_DEVICES=0 python TransKD_EA.py --dataset cityscapes --datadir /path/to/cityscapes #--dataset ACDC --datadir /path/to/ACDC
Evaluation (mIoU): download the trained weights (Google Drive) to the folder \outputs, then run:
cd eval
CUDA_VISIBLE_DEVICES=0 python eval_cityscapes_iou.py --distillation-type TransKDBase
CUDA_VISIBLE_DEVICES=0 python eval_ACDC_iou.py --distillation-type TransKDBase
# distillation-type can be choices=['teacher','student','TransKDBase','TransKD_GL','TransKD_EA']
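For reference, mIoU is computed per class as IoU_c = TP_c / (TP_c + FP_c + FN_c) and averaged over the classes present. A self-contained sketch (the function name and signature are illustrative, not the repo's actual eval code):

```python
import numpy as np

def miou_from_predictions(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU and mean IoU from flat integer label arrays.

    Classes absent from both prediction and ground truth are
    excluded from the mean; `ignore_index` pixels are skipped.
    """
    mask = gt != ignore_index
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(gt * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes,
                                                         num_classes)
    tp = np.diag(cm)
    denom = cm.sum(0) + cm.sum(1) - tp  # TP + FP + FN per class
    valid = denom > 0
    iou = np.zeros(num_classes)
    iou[valid] = tp[valid] / denom[valid]
    return iou, iou[valid].mean()

# Tiny 2-class example: one pixel of class 0 is mispredicted as class 1.
pred = np.array([0, 1, 1, 1])
gt = np.array([0, 0, 1, 1])
iou, miou = miou_from_predictions(pred, gt, num_classes=2)
```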
Visualization: download the trained weights (Google Drive) to the folder \outputs, then run:
cd eval
CUDA_VISIBLE_DEVICES=0 python eval_cityscapes_color.py --distillation-type TransKDBase
CUDA_VISIBLE_DEVICES=0 python eval_ACDC_color.py --distillation-type TransKDBase
# distillation-type can be choices=['teacher','student','TransKDBase','TransKD_GL','TransKD_EA']
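The color scripts render predictions as RGB images. A minimal sketch of the trainId-to-color mapping, using the standard 19-class Cityscapes palette (the repo's scripts may implement this differently):

```python
import numpy as np

# Standard Cityscapes 19-class palette (RGB), indexed by trainId:
# road, sidewalk, building, wall, fence, pole, traffic light,
# traffic sign, vegetation, terrain, sky, person, rider, car,
# truck, bus, train, motorcycle, bicycle.
CITYSCAPES_PALETTE = np.array([
    [128, 64, 128], [244, 35, 232], [70, 70, 70], [102, 102, 156],
    [190, 153, 153], [153, 153, 153], [250, 170, 30], [220, 220, 0],
    [107, 142, 35], [152, 251, 152], [70, 130, 180], [220, 20, 60],
    [255, 0, 0], [0, 0, 142], [0, 0, 70], [0, 60, 100],
    [0, 80, 100], [0, 0, 230], [119, 11, 32]], dtype=np.uint8)

def colorize(label_map, palette=CITYSCAPES_PALETTE, ignore_index=255):
    """Map an (H, W) array of train IDs to an (H, W, 3) RGB image.
    Pixels equal to `ignore_index` are rendered black."""
    h, w = label_map.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    valid = label_map != ignore_index
    out[valid] = palette[label_map[valid]]
    return out
```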
Our framework is built upon Knowledge Review and ERFNet. Thanks for their excellent work!
If you find this repo useful, please consider citing the following paper [PDF]:
@article{liu2022transkd,
title={TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation},
author={Liu, Ruiping and Yang, Kailun and Roitberg, Alina and Zhang, Jiaming and Peng, Kunyu and Liu, Huayao and Stiefelhagen, Rainer},
journal={arXiv preprint arXiv:2202.13393},
year={2022}
}