
# Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

This is a PyTorch implementation of our ACM MM 2022 paper. We present a new gating unit, PoSGU, which replaces the FC layer in the SGU of gMLP with relative positional encoding methods (specifically, LRPE and GQPE), and use it as the key building block of a new vision MLP architecture referred to as PosMLP. We hope this work will inspire further theoretical study of positional encoding in vision MLPs and lead to applications as mature as those in vision Transformers.

Our code is based on pytorch-image-models, attention-cnn, Swin-Transformer, and Vision-Permutator.
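To make the description above concrete, the snippet below is a minimal, illustrative sketch of an SGU-style gating unit whose token-mixing matrix is gathered from a 1-D learnable relative positional encoding table (in the spirit of LRPE) rather than parameterized as a dense FC layer. The class and variable names are placeholders of our own and need not match the implementation in this repository.

```python
# Illustrative sketch only: an SGU-style gating unit with an LRPE-style
# token-mixing matrix. Names are placeholders, not the repository's API.
import torch
import torch.nn as nn


class RelPosGatingUnit(nn.Module):
    def __init__(self, dim, window_size):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        n = window_size
        # One learnable scalar per 1-D relative offset in [-(n-1), n-1].
        # The n x n token-mixing matrix is gathered from this table, so the
        # parameter count grows as O(n) instead of O(n^2) for a dense FC.
        self.rel_bias = nn.Parameter(torch.zeros(2 * n - 1))
        coords = torch.arange(n)
        rel_index = coords[None, :] - coords[:, None] + (n - 1)  # (n, n)
        self.register_buffer("rel_index", rel_index)

    def forward(self, x):
        # x: (batch, n_tokens, dim) with n_tokens == window_size
        u, v = x.chunk(2, dim=-1)            # channel split, as in gMLP's SGU
        v = self.norm(v)
        mix = self.rel_bias[self.rel_index]  # (n, n) token-mixing weights
        v = torch.einsum("mn,bnc->bmc", mix, v)
        return u * v                         # gating


if __name__ == "__main__":
    unit = RelPosGatingUnit(dim=64, window_size=14)
    out = unit(torch.randn(2, 14, 64))
    print(out.shape)  # torch.Size([2, 14, 32])
```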

## Comparison with Recent MLP-like Models

| Model | Parameters | Image resolution | Top-1 Acc. | Download |
| :--- | :--- | :--- | :--- | :--- |
| gMLP-S | 20M | 224 | 79.6% | |
| Hire-MLP-S | 33M | 224 | 81.8% | |
| ViP-Small/7 | 25M | 224 | 81.5% | |
| PosMLP-T | 21M | 224 | 82.1% | Baidu Netdisk / Google Drive |
| S2-MLP-deep | 51M | 224 | 80.7% | |
| Mixer-B/16 | 59M | 224 | 78.5% | |
| ViP-Medium/7 | 55M | 224 | 82.7% | |
| AS-MLP-S | 50M | 224 | 83.1% | |
| PosMLP-S | 37M | 224 | 83.0% | released soon |
| gMLP-B | 73M | 224 | 81.6% | |
| ResMLP-B24 | 116M | 224 | 81.0% | |
| ViP-Large/7 | 88M | 224 | 83.2% | |
| Hire-MLP-L | 96M | 224 | 83.4% | |
| PosMLP-B | 82M | 224 | 83.6% | |

The experiments are conducted on 8 RTX 3090 GPUs.

## Requirements

- torch>=1.4.0
- torchvision>=0.5.0
- pyyaml
- timm==0.4.5
- apex, if you use 'apex amp'

Data preparation: ImageNet with the following folder structure; you can extract ImageNet with this script. Please update the data folder path in the config files.

```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
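As a quick sanity check that your extracted data matches this layout, the sketch below loads the validation split with `torchvision.datasets.ImageFolder`; the data root path is a placeholder to be replaced with your own.

```python
# Sanity-check the ImageNet folder layout with torchvision's ImageFolder.
# "/path/to/imagenet" is a placeholder path.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=transform)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, num_workers=8)
print(len(val_set), "validation images,", len(val_set.classes), "classes")
```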

## Training

Command line for training PosMLP-T on 4 GPUs (RTX 3090):

```bash
bash scripts/distributed_train.sh
```

## Validation

Please download the checkpoint from the table above, specify the data and model paths in the script, and test with the following command:

```bash
CUDA_VISIBLE_DEVICES=0 bash scripts/test.sh
```
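If you prefer to evaluate outside the provided script, the sketch below shows the general pattern with timm and torchvision. It assumes the repository registers its models with timm (as pytorch-image-models-based repos typically do); `posmlp_t14_224`, `checkpoint.pth`, and the data path are placeholders to be replaced with the actual names and paths.

```python
# Generic top-1 evaluation sketch; model name, checkpoint path, and data path
# are placeholders, not identifiers confirmed by this repository.
import torch
import timm
from torchvision import datasets, transforms

# from models import *  # hypothetical: import the repo's model definitions
#                        # so that timm can resolve the model name below

device = "cuda" if torch.cuda.is_available() else "cpu"
model = timm.create_model("posmlp_t14_224", num_classes=1000)
state = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(state.get("state_dict", state))
model.eval().to(device)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("/path/to/imagenet/val", transform=transform),
    batch_size=128, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(1).cpu() == targets).sum().item()
        total += targets.size(0)
print(f"top-1 accuracy: {correct / total:.4f}")
```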

## License

This repository is released under the MIT License as found in the LICENSE file.