This is the official PyTorch/PyTorch Lightning implementation of the paper:
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
We propose a simple yet fast and effective partial convolution (PConv), as well as a latency-efficient family of architectures called FasterNet.
Create a new conda virtual environment:
conda create -n fasternet python=3.9.12 -y
conda activate fasternet
Clone this repo and install required packages:
git clone https://github.com/JierunChen/FasterNet
cd FasterNet/
pip install -r requirements.txt
Download the ImageNet-1K classification dataset and structure the data as follows:
/path/to/imagenet-1k/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
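Before launching a run, it can help to confirm that the data directory matches the layout above. The following is a minimal sketch and not part of the repository: the function name `check_imagenet_layout` and its return format are assumptions made for illustration. It only checks that `train/` and `val/` exist and that each class folder holds at least one file.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return a list of problems found under root; an empty list means the layout looks right."""
    problems = []
    root = Path(root)
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split}/")
            continue
        # Every subfolder of a split is treated as one class.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        if not class_dirs:
            problems.append(f"no class folders in {split}/")
        for class_dir in class_dirs:
            if not any(p.is_file() for p in class_dir.iterdir()):
                problems.append(f"no images in {split}/{class_dir.name}/")
    return problems
```

Calling `check_imagenet_layout("/path/to/imagenet-1k")` returns an empty list when both splits are in place.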
name | resolution | top-1 acc (%) | #params | FLOPs | model |
---|---|---|---|---|---|
FasterNet-T0 | 224x224 | 71.9 | 3.9M | 0.34G | model |
FasterNet-T1 | 224x224 | 76.2 | 7.6M | 0.85G | model |
FasterNet-T2 | 224x224 | 78.9 | 15.0M | 1.90G | model |
FasterNet-S | 224x224 | 81.3 | 31.1M | 4.55G | model |
FasterNet-M | 224x224 | 83.0 | 53.5M | 8.72G | model |
FasterNet-L | 224x224 | 83.5 | 93.4M | 15.49G | model |
We give an example evaluation command for an ImageNet-1K pre-trained FasterNet-T0 on a single GPU:
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 125
- For evaluating other model variants, change -c and --checkpoint_path accordingly. You can get the pre-trained models from the tables above.
- For multi-GPU evaluation, change -g to a larger number or a list, e.g., 8 or 0,1,2,3,4,5,6,7. Note that the batch size for evaluation should be changed accordingly, e.g., change -e from 125 to 1000.
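The batch-size adjustment for multi-GPU evaluation can be sketched as follows. This is an illustration only and not code from the repository: the helpers `count_gpus` and `scaled_eval_batch_size` are hypothetical, and the sketch assumes that -e is the total batch size across all GPUs, which is how the 125 to 1000 example for 8 GPUs reads.

```python
def count_gpus(gpu_arg):
    """Count GPUs from a -g style value: a number ("8") or an ID list ("0,1,2,3")."""
    parts = [p.strip() for p in str(gpu_arg).split(",") if p.strip()]
    if len(parts) == 1:
        return int(parts[0])  # assumption: a single value is a GPU count
    return len(parts)         # assumption: several values are GPU IDs


def scaled_eval_batch_size(per_gpu_batch, gpu_arg):
    """Scale a single-GPU batch size linearly with the number of GPUs."""
    return per_gpu_batch * count_gpus(gpu_arg)


print(scaled_eval_batch_size(125, "0,1,2,3,4,5,6,7"))
```

With 125 samples per GPU, both `"8"` and `"0,1,2,3,4,5,6,7"` give a total of 1000, matching the example in the note.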
To measure the latency on CPU/ARM and throughput on GPU (if any), run
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 32 --measure_latency --fuse_conv_bn
-e controls the input batch size on GPU, while the input batch size is fixed internally to 1 on CPU/ARM.
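For intuition about the two quantities, the sketch below times a generic callable. It is not the repository's measurement code: `measure`, its parameters, and the stand-in workload are assumptions made for illustration. Latency is reported per call, as with batch size 1 on CPU/ARM, while throughput counts items per second for a given batch size, as with -e on GPU.

```python
import time


def measure(fn, batch_size=1, warmup=5, runs=20):
    """Return (latency in ms per call, throughput in items per second) for fn()."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / runs * 1000.0
    throughput = batch_size * runs / elapsed
    return latency_ms, throughput


# Stand-in workload; a real measurement would call the model's forward pass here.
latency_ms, throughput = measure(lambda: sum(i * i for i in range(10_000)), batch_size=32)
print(f"{latency_ms:.3f} ms per call, {throughput:.1f} items/s")
```

On a GPU, kernels run asynchronously, so a real measurement would also need to synchronize the device before reading the clock.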
FasterNet-T0 training on ImageNet-1K with an 8-GPU node:
python train_test.py -g 0,1,2,3,4,5,6,7 --num_nodes 1 -n 4 -b 4096 -e 2000 \
--data_dir ../../data/imagenet --pin_memory --wandb_project_name fasternet \
--model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_t0.yaml
To train other FasterNet variants, change --cfg accordingly. You may also want to change the training batch size -b.
This repository is built using the timm, poolformer, ConvNeXt, and mmdetection repositories.
If you find this repository helpful, please consider citing:
@article{chen2023run,
title={Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks},
author={Chen, Jierun and Kao, Shiu-hong and He, Hao and Zhuo, Weipeng and Wen, Song and Lee, Chul-Ho and Chan, S-H Gary},
journal={arXiv preprint arXiv:2303.03667},
year={2023}
}