Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

The official implementation of Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

TL;DR Arch-Net is a family of neural networks made up of simple and efficient operators. When a Arch-Net is produced, less common network constructs, like Layer Normalization and Embedding Layers, are eliminated in a progressive manner through label-free Blockwise Model Distillation, while performing sub-eight bit quantization at the same time to maximize performance. For the classification task, only 30k unlabeled images randomly sampled from ImageNet dataset is needed.

Main Results

ImageNet Classification

Model	Bit Width	Top1	Top5
Arch-Net_Resnet18	32w32a	69.76	89.08
Arch-Net_Resnet18	2w4a	68.77	88.66
Arch-Net_Resnet34	32w32a	73.30	91.42
Arch-Net_Resnet34	2w4a	72.40	91.01
Arch-Net_Resnet50	32w32a	76.13	92.86
Arch-Net_Resnet50	2w4a	74.56	92.39
Arch-Net_MobilenetV1	32w32a	68.79	88.68
Arch-Net_MobilenetV1	2w4a	67.29	88.07
Arch-Net_MobilenetV2	32w32a	71.88	90.29
Arch-Net_MobilenetV2	2w4a	69.09	89.13

Multi30k Machine Translation

Model	translation direction	Bit Width	BLEU
Transformer	English to Gemany	32w32a	32.44
Transformer	English to Gemany	2w4a	33.75
Transformer	English to Gemany	4w4a	34.35
Transformer	English to Gemany	8w8a	36.44
Transformer	Gemany to English	32w32a	30.32
Transformer	Gemany to English	2w4a	32.50
Transformer	Gemany to English	4w4a	34.34
Transformer	Gemany to English	8w8a	34.05

Dependencies

python == 3.6

refer to requirements.txt for more details

Data Preparation

Download ImageNet and multi30k data(google drive or BaiduYun, code: 8brd) and put them in ./arch-net/data/ as follow:

./data/
├── imagenet
│   ├── train
│   ├── val
├── multi30k

Download teacher models at google drive or BaiduYun(code: 57ew) and put them in ./arch-net/models/teacher/pretrained_models/

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

train and evaluate

cd ./train_imagenet

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn

evaluate if you already have the trained models

python3 -m torch.distributed.launch --nproc_per_node=8 train_archnet_resnet18.py  -j 8 --weight-bit 2 --feature-bit 4 --lr 0.001 --num_gpus 8 --sync-bn --evaluate

Machine Translation

train a arch-net_transformer of 2w4a

cd ./train_transformer

python3 train_archnet_transformer.py --translate_direction en2de --teacher_model_path ../models/teacher/pretrained_models/transformer_en_de.chkpt --data_pkl ../data/multi30k/m30k_ende_shr.pkl --batch_size 48 --final_epochs 50 --weight_bit 2 --feature_bit 4 --lr 1e-3 --weight_decay 1e-6 --label_smoothing

for arch-net_transformer of 8w8a, use the lr of 1e-3 and the weight decay of 1e-4

evaluate

cd ./evaluate

python3 translate.py --data_pkl ./data/multi30k/m30k_ende_shr.pkl --model path_to_the_outptu_directory/model_max_acc.chkpt

to get the BLEU of the evaluated results, go to this website, and then upload 'predictions.txt' in the output directory and the 'gt_en.txt' or 'gt_de.txt' in ./arch-net/data_gt/multi30k/

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{xu2021archnet,
      title={Arch-Net: Model Distillation for Architecture Agnostic Model Deployment}, 
      author={Weixin Xu and Zipeng Feng and Shuangkang Fang and Song Yuan and Yi Yang and Shuchang Zhou},
      year={2021},
      eprint={2111.01135},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgements

attention-is-all-you-need-pytorch

LSQuantization

pytorch-mobilenet-v1

Contact

If you have any questions, feel free to open an issue or contact us at xuweixin02@megvii.com.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

Main Results

Dependencies

Data Preparation

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

Machine Translation

Citation

Acknowledgements

Contact

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data_gt/multi30k		data_gt/multi30k
datasets		datasets
evaluate		evaluate
models		models
operators		operators
train_imagenet		train_imagenet
train_transformer		train_transformer
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

megvii-research/Arch-Net

Folders and files

Latest commit

History

Repository files navigation

Arch-Net: Model Distillation for Architecture Agnostic Model Deployment

Introduction

Main Results

Dependencies

Data Preparation

Get Started

ImageNet Classification (take archnet_resnet18 as an example)

Machine Translation

Citation

Acknowledgements

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages