This repo is a PyTorch implementation of applying MogaNet to unsupervised video prediction with SimVP on Moving MNIST. The code is based on SimVPv2 (or its latest version, OpenSTL). It is worth noting that the Translator module in SimVP can be replaced by any MetaFormer block, which makes it possible to benchmark the video prediction performance of MetaFormers. For more details, see Efficient Multi-order Gated Aggregation Network (ICLR 2024).
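To make the swappable-Translator idea concrete, here is a minimal PyTorch sketch (hypothetical names, not this repo's actual API): a generic MetaFormer block whose token mixer can be exchanged to benchmark a different architecture as the Translator.

```python
import torch
import torch.nn as nn

class MetaFormerBlock(nn.Module):
    """Generic MetaFormer block: token mixer + channel MLP, both residual."""
    def __init__(self, dim, token_mixer):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        self.mixer = token_mixer          # e.g. a Moga / gSTA / ConvNeXt mixer
        self.norm2 = nn.BatchNorm2d(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, 4 * dim, 1), nn.GELU(), nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

def make_translator(dim, depth, mixer_factory):
    # Swapping `mixer_factory` benchmarks a different MetaFormer as Translator.
    return nn.Sequential(*[MetaFormerBlock(dim, mixer_factory(dim))
                           for _ in range(depth)])

# A depthwise convolution as a simple stand-in token mixer; the input is the
# latent feature map produced by SimVP's encoder (exact reshaping differs
# across SimVP versions).
translator = make_translator(
    64, depth=8,
    mixer_factory=lambda d: nn.Conv2d(d, d, 7, padding=3, groups=d))
x = torch.randn(2, 64, 16, 16)
print(translator(x).shape)  # torch.Size([2, 64, 16, 16])
```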
Install SimVPv2 with pip as follows. It can also be installed with `environment.yml`:

```bash
python setup.py develop
```
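For the conda route, something like the following should work (the environment name below is an assumption; check the `name:` field inside `environment.yml`):

```bash
conda env create -f environment.yml
conda activate simvp   # environment name is an assumption
python setup.py develop
```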
Prepare the Moving MNIST dataset with the provided script according to the guidelines.
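For reference, the widely used Moving MNIST test split (`mnist_test_seq.npy` from the University of Toronto release) can be inspected with a few lines of NumPy; the file path below is an assumption, and SimVPv2's own dataloader handles this for you:

```python
import numpy as np

# mnist_test_seq.npy stores sequences as (T=20, N=10000, H=64, W=64)
data = np.load('data/moving_mnist/mnist_test_seq.npy')   # path: assumption
data = data.transpose(1, 0, 2, 3)[:, :, None] / 255.0    # (N, T, 1, 64, 64) in [0, 1]
inputs, targets = data[:, :10], data[:, 10:]             # predict 10 frames from 10
print(inputs.shape, targets.shape)  # (10000, 10, 1, 64, 64) twice
```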
Notes: All models are trained for 200 and 2000 epochs with the Adam optimizer and the OneCycle learning rate scheduler. The trained models can also be downloaded from Baidu Cloud (z8mf) at `MogaNet/MMNIST_VP`. The Params (M) and FLOPs (G) are measured by `non_dist_train.py` by setting `--fps`. Please refer to video_benchmarks for the full results.
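If you want to sanity-check the Params column against your own build of a model, a plain PyTorch count suffices (a generic sketch, not the repo's measuring code):

```python
import torch.nn as nn

def count_params(model: nn.Module) -> float:
    """Trainable parameter count in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# usage with any nn.Module:
# print(f"{count_params(model):.1f}M")
```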
| Architecture | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epochs | 58.0M | 19.4G | 209 | 32.15 | 89.05 | 0.9268 | 21.84 | model \| log |
| gSTA (SimVPv2) | 200 epochs | 46.8M | 16.5G | 282 | 26.69 | 77.19 | 0.9402 | 22.78 | model \| log |
| ViT | 200 epochs | 46.1M | 16.9G | 290 | 35.15 | 95.87 | 0.9139 | 21.67 | model \| log |
| Swin Transformer | 200 epochs | 46.1M | 16.4G | 294 | 29.70 | 84.05 | 0.9331 | 22.22 | model \| log |
| Uniformer | 200 epochs | 44.8M | 16.5G | 296 | 30.38 | 85.87 | 0.9308 | 22.13 | model \| log |
| MLP-Mixer | 200 epochs | 38.2M | 14.7G | 334 | 29.52 | 83.36 | 0.9338 | 22.22 | model \| log |
| ConvMixer | 200 epochs | 3.9M | 5.5G | 658 | 32.09 | 88.93 | 0.9259 | 21.93 | model \| log |
| Poolformer | 200 epochs | 37.1M | 14.1G | 341 | 31.79 | 88.48 | 0.9271 | 22.03 | model \| log |
| ConvNeXt | 200 epochs | 37.3M | 14.1G | 344 | 26.94 | 77.23 | 0.9397 | 22.74 | model \| log |
| VAN | 200 epochs | 44.5M | 16.0G | 288 | 26.10 | 76.11 | 0.9417 | 22.89 | model \| log |
| HorNet | 200 epochs | 45.7M | 16.3G | 287 | 29.64 | 83.26 | 0.9331 | 22.26 | model \| log |
| MogaNet | 200 epochs | 46.8M | 16.5G | 255 | 25.57 | 75.19 | 0.9429 | 22.99 | model \| log |
| IncepU (SimVPv1) | 2000 epochs | 58.0M | 19.4G | 209 | 21.15 | 64.15 | 0.9536 | 23.99 | model \| log |
| gSTA (SimVPv2) | 2000 epochs | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.9675 | 25.97 | model \| log |
| ViT | 2000 epochs | 46.1M | 16.9G | 290 | 19.74 | 61.65 | 0.9539 | 24.59 | model \| log |
| Swin Transformer | 2000 epochs | 46.1M | 16.4G | 294 | 19.11 | 59.84 | 0.9584 | 24.53 | model \| log |
| Uniformer | 2000 epochs | 44.8M | 16.5G | 296 | 18.01 | 57.52 | 0.9609 | 24.92 | model \| log |
| MLP-Mixer | 2000 epochs | 38.2M | 14.7G | 334 | 18.85 | 59.86 | 0.9589 | 24.58 | model \| log |
| ConvMixer | 2000 epochs | 3.9M | 5.5G | 658 | 22.30 | 67.37 | 0.9507 | 23.73 | model \| log |
| Poolformer | 2000 epochs | 37.1M | 14.1G | 341 | 20.96 | 64.31 | 0.9539 | 24.15 | model \| log |
| ConvNeXt | 2000 epochs | 37.3M | 14.1G | 344 | 17.58 | 55.76 | 0.9617 | 25.06 | model \| log |
| VAN | 2000 epochs | 44.5M | 16.0G | 288 | 16.21 | 53.57 | 0.9646 | 25.49 | model \| log |
| HorNet | 2000 epochs | 45.7M | 16.3G | 287 | 17.40 | 55.70 | 0.9624 | 25.14 | model \| log |
| MogaNet | 2000 epochs | 46.8M | 16.5G | 255 | 15.67 | 51.84 | 0.9661 | 25.70 | model \| log |
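For context on how the MSE/MAE/PSNR columns are typically computed on Moving MNIST, here is a hedged sketch; the exact conventions (e.g. per-frame sums versus per-pixel means) are defined by the benchmark code, not here:

```python
import numpy as np

def mse_mae_psnr(pred: np.ndarray, true: np.ndarray):
    """pred/true: (N, T, C, H, W) arrays scaled to [0, 1]."""
    diff = pred - true
    n_frames = diff.shape[0] * diff.shape[1]
    mse = (diff ** 2).sum() / n_frames          # sum over pixels, averaged per frame
    mae = np.abs(diff).sum() / n_frames
    psnr = 10 * np.log10(1.0 / (diff ** 2).mean())  # per-pixel MSE, peak value 1.0
    return mse, mae, psnr

# SSIM is usually computed per frame, e.g. with
# skimage.metrics.structural_similarity, then averaged.
```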
- Video Benchmarks and visualizations.
- Weather Benchmarks and visualizations.
- Traffic Benchmarks and visualizations.
We train the model on a single GPU by default (a batch size of 16 for SimVP). Start training with the bash script as follows:

```bash
python tools/non_dist_train.py -d mmnist -m SimVP --model_type moga -c configs/mmnist/simvp/SimVP_MogaNet.py --lr 1e-3 --ex_name mmnist_simvp_moga
```
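The config file is where the Translator choice and the training schedule are wired together. The following is only a schematic of what such a config typically contains; field names and values are illustrative assumptions, so consult the actual `configs/mmnist/simvp/SimVP_MogaNet.py` for the real ones:

```python
# schematic config -- field names and values are illustrative assumptions
method = 'SimVP'
model_type = 'moga'   # selects the MogaNet block as the Translator
# model
hid_S = 64            # spatial hidden channels (encoder/decoder)
hid_T = 512           # temporal hidden channels (Translator)
N_S = 4               # encoder/decoder depth
N_T = 8               # Translator depth
# training
lr = 1e-3
batch_size = 16
sched = 'onecycle'
```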
We test the trained model on a single GPU with the bash script as follows:

```bash
python tools/non_dist_test.py -d mmnist -m SimVP --model_type moga -c configs/mmnist/simvp/SimVP_MogaNet.py --ex_name /path/to/exp_name
```
If you find this repository helpful, please consider citing:
```bibtex
@inproceedings{iclr2024MogaNet,
  title={Efficient Multi-order Gated Aggregation Network},
  author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
  booktitle={International Conference on Learning Representations},
  year={2024}
}
```
Our video prediction implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful works.