Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain

Introduction

This project is based on Vim (paper, code), and we thank its authors for their excellent work. To reproduce our results, replace main.py and models_mamba.py in the Vim codebase with the versions provided here. Of the two, main.py has no substantive changes: only the position-embedding code in the original file has been removed to fit our model.
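The file replacement can be scripted. The following is a minimal, hypothetical helper rather than part of either repository: the directory names VIM_F_DIR and VIM_DIR, the assumption that the Vim checkout keeps main.py and models_mamba.py in a `vim/` sub-folder, the function name install_vim_f, and the `.orig` backup convention are all illustrative assumptions. Adjust the paths to your own clones.

```python
import shutil
from pathlib import Path

# Hypothetical locations (assumptions): point these at your own clones.
VIM_F_DIR = Path("Vim-F")   # checkout of this project
VIM_DIR = Path("Vim/vim")   # Vim folder assumed to hold main.py and models_mamba.py

# The two files this project asks you to replace.
FILES = ("main.py", "models_mamba.py")


def install_vim_f(src_dir: Path, dst_dir: Path, backup: bool = True) -> list[Path]:
    """Copy the Vim-F versions of FILES over the originals in dst_dir.

    When backup is True, each existing original is first saved next to it
    as <name>.orig (a hypothetical convention, not required by Vim or Vim-F).
    Returns the destination paths that were written.
    """
    written = []
    for name in FILES:
        src = src_dir / name
        dst = dst_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"missing Vim-F file: {src}")
        if backup and dst.is_file():
            # Keep the untouched Vim file so the change can be reverted.
            shutil.copy2(dst, dst.with_name(dst.name + ".orig"))
        shutil.copy2(src, dst)
        written.append(dst)
    return written
```

Once both repositories are cloned, calling `install_vim_f(VIM_F_DIR, VIM_DIR)` overwrites the two files and leaves `.orig` copies behind, so the original Vim behaviour can be restored by copying them back.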

ImageNet classification

Pre-training

V1

Model	Dataset	Resolution	Top-1 (%)	Ckpt/Logs
Vim-Ti-F(H)	ImageNet-1K	224×224	76.0	ckpt/log
Vim-S-F(H)	ImageNet-1K	224×224	80.5	ckpt/log

V2

Model	Dataset	Resolution	Top-1 (%)	Ckpt/Logs
Vim-Ti-F	ImageNet-1K	224×224	76.7	ckpt/log
Vim-S-F	ImageNet-1K	224×224	80.9	retraining
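The Top-1 column follows the usual ImageNet convention: the percentage of validation images whose highest-scoring predicted class equals the ground-truth label. The pure-Python sketch below only illustrates that definition. It is not the evaluation code used to produce the numbers above, and the function name top1_accuracy and the toy logits are hypothetical.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent.

    A sample is correct when the index of its largest score equals its label.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the highest-scoring class (argmax).
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


# Toy batch: 4 samples over 3 classes (made-up numbers, not Vim-F outputs).
toy_logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.3, 0.2, 0.9], [1.0, 3.0, 0.5]]
toy_labels = [0, 1, 2, 0]
print(top1_accuracy(toy_logits, toy_labels))  # 75.0
```

Three of the four toy predictions match their labels, giving 75.0; the table values are the same quantity measured over the ImageNet-1K validation set.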
