Adaptive Frequency Filters

Adaptive Frequency Filters As Efficient Global Token Mixers (ICCV 2023)

This repo is the official implementation of "Adaptive Frequency Filters As Efficient Global Token Mixers" by Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, and Baining Guo.

AFFNet is a lightweight neural network designed for efficient deployment on mobile devices. It achieves a better accuracy-efficiency trade-off than other lightweight network designs across a wide range of visual tasks, including visual recognition and dense prediction. AFFNet, AFFNet-T, and AFFNet-ET achieve 79.8%, 77.0%, and 73.0% top-1 accuracy, respectively, on the ImageNet-1K dataset.
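The title describes frequency filters used as global token mixers. The NumPy sketch below illustrates only the underlying identity (the convolution theorem): multiplying the 2-D spectrum of a feature map elementwise by a filter is equivalent to a circular convolution whose kernel is as large as the feature map, so every output position mixes information from every input position. This is not the repo's AFF module. In AFFNet the filters are adaptive (input-dependent and learned), whereas here the filter is simply the FFT of a fixed kernel; the function names are hypothetical.

```python
import numpy as np


def frequency_filter_mix(x, filt):
    """Mix tokens globally by filtering in the frequency domain.

    x:    real feature map of shape (C, H, W)
    filt: complex filter of shape (C, H, W // 2 + 1), matching rfft2 output
    """
    spec = np.fft.rfft2(x, axes=(-2, -1))      # spatial -> frequency domain
    mixed = spec * filt                        # elementwise filtering per channel
    # frequency -> spatial domain, restoring the original (H, W) size
    return np.fft.irfft2(mixed, s=x.shape[-2:], axes=(-2, -1))


def circular_conv2d(x, k):
    """Reference: per-channel circular convolution with a full-size (H, W) kernel."""
    out = np.zeros_like(x)
    H, W = x.shape[-2:]
    for u in range(H):
        for v in range(W):
            # np.roll gives x[c, (i - u) % H, (j - v) % W] at position (i, j)
            out += k[:, u, v, None, None] * np.roll(x, shift=(u, v), axis=(1, 2))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 4, 5))
    k = rng.standard_normal((2, 4, 5))
    same = np.allclose(frequency_filter_mix(x, np.fft.rfft2(k)), circular_conv2d(x, k))
    print(same)
```

The direct global convolution costs O((HW)^2) per channel, while the FFT route costs O(HW log HW), which is why frequency-domain filtering is an attractive way to get a global receptive field cheaply.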


Models/logs/configs {#models-head}

ImageNet-1K

| Name | Input size | Top-1 acc. (%) | #Params | FLOPs | Download |
|---|---|---|---|---|---|
| AFFNet-ET | 256 $\times$ 256 | 73.0 | 1.4M | 0.4G | model/log/config |
| AFFNet-T | 256 $\times$ 256 | 77.0 | 2.6M | 0.8G | model/log/config |
| AFFNet | 256 $\times$ 256 | 79.8 | 5.5M | 1.5G | model/log/config |
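Top-1 accuracy (acc@1) is the percentage of validation images whose highest-scoring class matches the ground-truth label. A minimal NumPy sketch of the metric follows; it is illustrative only, and the repo's evaluation code may differ in details such as how results are gathered across GPUs.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Percentage of samples whose argmax class equals the label.

    logits: (N, num_classes) scores; labels: (N,) integer class indices
    """
    preds = np.argmax(logits, axis=1)
    return 100.0 * np.mean(preds == labels)


if __name__ == "__main__":
    logits = np.array([[2.0, 1.0, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.0, 2.0, 0.5],
                       [0.9, 0.1, 0.0]])
    labels = np.array([0, 2, 0, 0])
    print(top1_accuracy(logits, labels))  # 75.0 (third sample is misclassified)
```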

ADE20K

| Name | Input size | mIoU (%) | #Params | Download |
|---|---|---|---|---|
| AFFNet-ET + DeepLab | 256 $\times$ 256 | 33.0 | 2.2M | model/log/config |
| AFFNet-T + DeepLab | 256 $\times$ 256 | 36.9 | 3.5M | model/log/config |
| AFFNet + DeepLab | 256 $\times$ 256 | 38.4 | 6.9M | model/log/config |

VOC

| Name | Input size | mIoU (%) | #Params | Download |
|---|---|---|---|---|
| AFFNet-ET + DeepLab | 256 $\times$ 256 | 76.1 | 2.2M | model/log/config |
| AFFNet-T + DeepLab | 256 $\times$ 256 | 77.8 | 3.5M | model/log/config |
| AFFNet + DeepLab | 256 $\times$ 256 | 80.5 | 6.9M | model/log/config |
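The segmentation tables report mean intersection-over-union (mIoU): for each class, IoU = TP / (TP + FP + FN), and mIoU is the average over classes. The NumPy sketch below computes it from a confusion matrix. It is an illustration, not the repo's metric code, and its conventions are assumptions: label 255 is ignored (the usual Pascal VOC boundary label), classes absent from both prediction and ground truth are skipped, and benchmark numbers are normally computed from a confusion matrix accumulated over the whole validation set rather than per image.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    mask = target != ignore_index                 # drop ignored pixels
    t = target[mask].astype(np.int64)
    p = pred[mask].astype(np.int64)
    counts = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """mIoU in percent from an accumulated confusion matrix."""
    inter = np.diag(conf)                         # true positives per class
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter  # TP + FP + FN
    valid = union > 0                             # skip classes never seen
    return 100.0 * np.mean(inter[valid] / union[valid])


if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 255]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    conf = confusion_matrix(pred, target, num_classes=3)
    print(round(mean_iou(conf), 2))  # 72.22, i.e. mean of IoUs 1/2, 2/3, 1
```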

Install

  1. Clone the repository:

```bash
git clone https://github.com/microsoft/TokenMixers.git
cd TokenMixers/AFFNet/
```

  2. Prepare the base environment. We use Ubuntu 20, Python 3.8, and CUDA 11.5; training and evaluation are run on 8 A100 GPUs.

  3. Install the required packages:

```bash
conda create -fyn AFFNet python=3.8
conda activate AFFNet
python -m pip install wandb ptflops einops
python -m pip install -r requirements.txt
python -m pip install psutil torchstat tqdm
python -m pip install --upgrade fvcore
python -m pip install complexPyTorch
```
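After installation, a quick sanity check can confirm that the explicitly installed packages are importable and that the interpreter matches the Python 3.8 setup. This is a minimal sketch; `check_env` is a hypothetical helper, it does not cover the contents of requirements.txt (not listed in this README), and it does not check CUDA or GPU availability.

```python
import importlib.util
import sys

# Import names of the packages installed explicitly above
# (complexPyTorch keeps its mixed-case module name).
REQUIRED = ("wandb", "ptflops", "einops", "psutil", "torchstat",
            "tqdm", "fvcore", "complexPyTorch")


def check_env(modules=REQUIRED, expected_python=(3, 8)):
    """Report whether the Python version matches and which modules are missing."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    python_ok = tuple(sys.version_info[:2]) == tuple(expected_python)
    return {"python_ok": python_ok, "missing": missing}


if __name__ == "__main__":
    report = check_env()
    print("Python version OK:", report["python_ok"])
    print("Missing modules:", report["missing"] or "none")
```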

Data preparation

Download the standard ImageNet-1K dataset from http://image-net.org, the ADE20K dataset from https://groups.csail.mit.edu/vision/datasets/ADE20K/, and the Pascal VOC dataset from http://host.robots.ox.ac.uk/pascal/VOC/, then organize them as follows:

```text
Dataset_Root
├── ImageNet
│   ├── train
│   │   ├── n01440764
│   │   │   ├── n01440764_10026.JPEG
│   │   │   ├── n01440764_10027.JPEG
│   │   │   ├── ...
│   │   ├── ...
│   ├── val
│   │   ├── n02093754
│   │   │   ├── ILSVRC2012_val_00000832.JPEG
│   │   │   ├── ILSVRC2012_val_00003267.JPEG
│   │   │   ├── ...
│   │   ├── ...
├── ADEChallengeData2016
│   ├── annotations
│   ├── images
│   ├── objectinfo150.txt
│   ├── sceneCategories.txt
├── VOCdevkit
    ├── rec_data
    ├── VOC2007
    ├── VOC2012
```

Training

Run the following command to train a model on a single node with 8 A100 GPUs:

```bash
python main_train.py --log-wandb --common.config-file <config_path> --common.results-loc <save_path>
```

Replace <config_path> with the path to the config file (the configs are linked in the Models/logs/configs section) and <save_path> with the directory where the model and log files should be saved.
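If you prefer to launch runs from Python, for example to train several configs in sequence, the training command can be assembled as an argument list. `build_train_cmd` and `launch` are hypothetical helpers, not part of the repo; they use only the flags shown in the training command, and we assume (from its name and the wandb install step) that `--log-wandb` merely toggles Weights & Biases logging.

```python
import subprocess
import sys


def build_train_cmd(config_path, save_path, log_wandb=True):
    """Assemble the main_train.py command as an argv list."""
    cmd = [sys.executable, "main_train.py"]
    if log_wandb:
        cmd.append("--log-wandb")  # assumed to enable W&B logging only
    cmd += ["--common.config-file", str(config_path),
            "--common.results-loc", str(save_path)]
    return cmd


def launch(cmd, cwd="."):
    """Run the command from the AFFNet directory; raises if it fails."""
    subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    # Placeholders as in the README; replace them before calling launch().
    cmd = build_train_cmd("<config_path>", "<save_path>")
    print(" ".join(cmd))
```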

Evaluation

Run the following command to evaluate a model on a single node with 8 A100 GPUs:

```bash
python main_eval.py --common.config-file <config_path>  --common.results-loc <save_path> --model.classification.pretrained <model_path>
```

Replace <config_path> with the path to the config file, <model_path> with the path to the pretrained model (both are linked in the Models/logs/configs section), and <save_path> with the directory where the output and log files should be saved.

Citing

If you find this code and work useful, please consider citing the following paper and starring this repo. Thank you very much!

```bibtex
@inproceedings{huang2023adaptive,
  title={Adaptive Frequency Filters As Efficient Global Token Mixers},
  author={Huang, Zhipeng and Zhang, Zhizheng and Lan, Cuiling and Zha, Zheng-Jun and Lu, Yan and Guo, Baining},
  booktitle={ICCV},
  year={2023}
}
```

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.