PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision.
My implementation of the MLP-Mixer network, used here for image classification on CIFAR-10. Note that no hyperparameter tuning or extensive image augmentation was performed, so there is still plenty of room for improvement.
The whole code can be found in the Jupyter notebook.
Convolutional Neural Networks (CNNs) and transformers are the two mainstream model architectures currently dominating computer vision and natural language processing. The authors of the paper, however, empirically show that neither convolution nor self-attention is necessary; in fact, multi-layer perceptrons (MLPs) can also serve as a strong baseline. The authors present MLP-Mixer, an all-MLP model architecture that contains two types of layers: token-mixing layers and channel-mixing layers. The token-mixing MLPs "mix" information across spatial locations (per-location), while the channel-mixing MLPs "mix" information across feature channels (per-feature). MLP-Mixer performs comparably to other state-of-the-art models, such as ViT or EfficientNet.
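To make the two layer types concrete, here is a minimal PyTorch sketch of a single Mixer block, following the structure described in the paper (the class and argument names `MlpBlock`, `MixerBlock`, `tokens_mlp_dim`, `channels_mlp_dim` are my own; the full implementation lives in the notebook):

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP with GELU, used inside both mixing sublayers."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One Mixer layer: token mixing across patches, then channel mixing per patch."""
    def __init__(self, num_patches, hidden_dim, tokens_mlp_dim, channels_mlp_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.token_mix = MlpBlock(num_patches, tokens_mlp_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.channel_mix = MlpBlock(hidden_dim, channels_mlp_dim)

    def forward(self, x):
        # x: (batch, num_patches, hidden_dim)
        # Token mixing: transpose so the MLP acts across the patch dimension.
        y = self.norm1(x).transpose(1, 2)            # (batch, hidden_dim, num_patches)
        x = x + self.token_mix(y).transpose(1, 2)    # residual connection
        # Channel mixing: the MLP acts across the feature dimension.
        x = x + self.channel_mix(self.norm2(x))
        return x

# Example: 64 patches (e.g. 32x32 CIFAR-10 image with 4x4 patches), 128 channels.
block = MixerBlock(num_patches=64, hidden_dim=128, tokens_mlp_dim=256, channels_mlp_dim=512)
out = block(torch.randn(8, 64, 128))  # shape preserved: (8, 64, 128)
```

The full model simply stacks several such blocks on top of a patch-embedding layer and finishes with global average pooling and a linear classifier head.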