This is a PyTorch implementation of our ACM MM 2022 paper. We present a new gating unit, PoSGU, which replaces the FC layer in the SGU of gMLP with relative positional encoding methods (specifically, LRPE and GQPE), and use it as the key building block of a new vision MLP architecture, referred to as PosMLP. We also hope this work will inspire further theoretical study of positional encoding in vision MLPs, so that it can reach the same maturity of application it enjoys in vision Transformers.
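To make the idea concrete, below is a minimal PyTorch sketch, not the exact PoSGU from the paper: it keeps gMLP's channel split and multiplicative gating, but parameterizes the token-mixing matrix with a learnable 1-D relative-position bias (an LRPE-style weight sharing) instead of an N×N FC layer. The class name `PoSGULite` and its arguments are hypothetical; see the paper and code for the actual LRPE/GQPE formulations.

```python
import torch
import torch.nn as nn

class PoSGULite(nn.Module):
    """Hypothetical sketch of a positional spatial gating unit.

    The N x N token-mixing FC of gMLP's SGU is replaced by a learnable
    relative-position bias, shrinking parameters from O(N^2) to O(N).
    """
    def __init__(self, dim, seq_len):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        # One learnable weight per relative offset, 2N - 1 in total.
        self.rel_bias = nn.Parameter(torch.zeros(2 * seq_len - 1))
        idx = torch.arange(seq_len)
        # rel_index[i, j] = (i - j) shifted into [0, 2 * seq_len - 2]
        self.register_buffer("rel_index", idx[:, None] - idx[None, :] + seq_len - 1)

    def forward(self, x):            # x: (B, N, C)
        u, v = x.chunk(2, dim=-1)    # channel split, as in gMLP's SGU
        v = self.norm(v)
        w = self.rel_bias[self.rel_index]      # (N, N) positional mixing matrix
        v = torch.einsum("mn,bnc->bmc", w, v)  # mix tokens by relative position
        return u * v                 # gating: (B, N, C/2)
```

For a 7×7 window flattened to 49 tokens, `PoSGULite(64, 49)` maps a `(B, 49, 64)` input to a `(B, 49, 32)` output with only 97 mixing parameters instead of 49² = 2401.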
Our code is based on pytorch-image-models, attention-cnn, swin-transformer, and vision-Permutator.
Model | Parameters | Image resolution | Top-1 Acc. | Download |
---|---|---|---|---|
gMLP-S | 20M | 224 | 79.6% | |
Hire-MLP-S | 33M | 224 | 81.8% | |
ViP-Small/7 | 25M | 224 | 81.5% | |
PosMLP-T | 21M | 224 | 82.1% | Baidu Netdisk / Google Drive |
S2-MLP-deep | 51M | 224 | 80.7% | |
Mixer-B/16 | 59M | 224 | 78.5% | |
ViP-Medium/7 | 55M | 224 | 82.7% | |
AS-MLP-S | 50M | 224 | 83.1% | |
PosMLP-S | 37M | 224 | 83.0% | released soon |
gMLP-B | 73M | 224 | 81.6% | |
ResMLP-B24 | 116M | 224 | 81.0% | |
ViP-Large/7 | 88M | 224 | 83.2% | |
Hire-MLP-L | 96M | 224 | 83.4% | |
PosMLP-B | 82M | 224 | 83.6% | |
The experiments are conducted on 8 RTX 3090 GPUs. The following dependencies are required:
torch>=1.4.0
torchvision>=0.5.0
pyyaml
timm==0.4.5
apex (only if you use 'apex amp')
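Assuming a standard pip setup, the Python dependencies above can be installed as follows (apex must be built from source separately, following NVIDIA's instructions):

```bash
pip install "torch>=1.4.0" "torchvision>=0.5.0" pyyaml timm==0.4.5
```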
Data preparation: ImageNet with the following folder structure; you can extract ImageNet with this script. Please update the data folder path in the config files.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
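This is the standard torchvision ImageFolder layout. As a quick sanity check of your extraction (the root path `imagenet/` is just an example):

```python
from torchvision import datasets

train_set = datasets.ImageFolder("imagenet/train")
val_set = datasets.ImageFolder("imagenet/val")
# ImageNet-1k should report 1000 classes, ~1.28M train and 50,000 val images
print(len(train_set.classes), len(train_set), len(val_set))
```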
Command line for training PosMLP-T on 4 GPUs (RTX 3090):
bash scripts/distributed_train.sh
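If the script follows the pytorch-image-models convention of `distributed_train.sh <num_gpus> <data_dir> <extra train.py args>`, an invocation might look like the following (the model name `posmlp_t` and the hyperparameters are placeholders; check the script and config files for the actual values):

```bash
bash scripts/distributed_train.sh 4 /path/to/imagenet \
    --model posmlp_t -b 128 --epochs 300
```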
Please download a checkpoint from the table above, specify the data and model paths in the script, and run the test with:
CUDA_VISIBLE_DEVICES=0 bash scripts/test.sh
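If `scripts/test.sh` wraps pytorch-image-models' `validate.py`, the equivalent direct call would look roughly like this (the paths and model name are placeholders):

```bash
CUDA_VISIBLE_DEVICES=0 python validate.py /path/to/imagenet \
    --model posmlp_t --checkpoint /path/to/posmlp_t.pth
```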
This repository is released under the MIT License as found in the LICENSE file.