PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" ([arXiv:2105.01601](https://arxiv.org/abs/2105.01601)), with support for loading the official ImageNet pre-trained parameters.
```python
import numpy as np
import torch

from mlp_mixer import MlpMixer

pretrain_model = './pretrain_models/imagenet21k_Mixer-B_16.npz'

model = MlpMixer(num_classes=10,
                 num_blocks=12,
                 patch_size=16,
                 hidden_dim=768,
                 tokens_mlp_dim=384,
                 channels_mlp_dim=3072,
                 image_size=224
                 )

# Load the official ImageNet pre-trained parameters:
model.load_from(np.load(pretrain_model))
print('Finished loading the pre-trained model!')

num_param = sum(p.numel() for p in model.parameters()) / 1e6
print('Total params.: %f M' % num_param)

img = torch.randn(1, 3, 224, 224)  # dummy input batch (N, C, H, W)
pred = model(img)
```
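To run inference on a real image rather than the dummy tensor above, the input has to be resized to `image_size` and normalized. Below is a minimal sketch with `torchvision`; the mean/std of 0.5 (i.e., scaling to [-1, 1]) is an assumption based on the convention commonly used for the official ViT/Mixer checkpoints, and `example.jpg` is a placeholder:

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Assumed preprocessing: resize to the model's image_size, then scale to [-1, 1].
transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

img = transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)  # (1, 3, 224, 224)
with torch.no_grad():
    pred = model(img)
print(pred.argmax(dim=1))  # predicted class index
```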
Download the official pre-trained models from https://console.cloud.google.com/storage/mixer_models/.
Hyper-parameter settings for better fine-tuning:
```python
# `param_list` is the set of parameters to update, e.g. model.parameters().
optim = torch.optim.SGD(param_list,
                        lr=5e-4,
                        weight_decay=1e-7,
                        momentum=0.9,
                        nesterov=True
                        )

# Cosine learning-rate decay over `n_iters_all` total iterations, with optional warmup.
lr_schdlr = WarmupCosineLrScheduler(optim,
                                    n_iters_all,
                                    warmup_iter=0
                                    )
```
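`WarmupCosineLrScheduler` is not a built-in `torch.optim` scheduler; it comes from this repo's training code. As a minimal sketch of what it might look like (linear warmup for `warmup_iter` steps, then cosine decay to zero over the remaining iterations; the exact schedule is an assumption), one can build it on `torch.optim.lr_scheduler.LambdaLR`:

```python
import math

import torch


def WarmupCosineLrScheduler(optim, n_iters_all, warmup_iter=0):
    """Sketch: linear warmup, then cosine decay of the LR to zero."""
    def lr_lambda(step):
        if step < warmup_iter:
            return step / max(1, warmup_iter)  # linear warmup from 0 to base LR
        progress = (step - warmup_iter) / max(1, n_iters_all - warmup_iter)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    return torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda)
```

With a per-iteration schedule like this, `lr_schdlr.step()` is called once per training iteration, after `optim.step()`.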
Fine-tuning MLP-Mixer from the pre-trained model yields remarkable improvements (e.g., +10% accuracy on a small dataset).
Note that we can also change the `patch_size` (e.g., `patch_size=8`) for inputs with different resolutions, but a smaller `patch_size` does not always bring performance improvements; see the sketch below.
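For context: the token-mixing MLPs see one token per non-overlapping patch, so the sequence length is `(image_size // patch_size) ** 2`. A smaller `patch_size` therefore means more tokens (and larger token-mixing weight matrices, which is also why pre-trained token-mixing weights tied to the 16-pixel patch grid may not load directly at other patch sizes). A quick check:

```python
# Number of tokens fed to the token-mixing MLPs for a 224x224 input.
for patch_size in (32, 16, 8):
    num_tokens = (224 // patch_size) ** 2
    print(f'patch_size={patch_size:2d} -> {num_tokens} tokens')
# patch_size=32 ->  49 tokens
# patch_size=16 -> 196 tokens
# patch_size= 8 -> 784 tokens
```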
```bibtex
@misc{tolstikhin2021mlpmixer,
    title={MLP-Mixer: An all-MLP Architecture for Vision},
    author={Ilya Tolstikhin and Neil Houlsby and Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Thomas Unterthiner and Jessica Yung and Daniel Keysers and Jakob Uszkoreit and Mario Lucic and Alexey Dosovitskiy},
    year={2021},
    eprint={2105.01601},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
- The implementation is based on the original paper and the official TensorFlow repo: https://github.com/google-research/vision_transformer.
- It also refers to the PyTorch re-implementation: https://github.com/d-li14/mlp-mixer.pytorch.