Official PyTorch implementation for
A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li,Trevor Darrell, Zhuang Liu
UC Berkeley, University of Pennsylvania, Toyota Technological Institute at Chicago, and Meta AI Research
We introduce
Train Loss
CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT | |
---|---|---|---|---|---|---|---|---|
baseline | 2.66 | 2.20 | 2.40 | 1.64 | 2.45 | 1.98 | 1.59 | 1.25 |
SVRG | 2.94 | 3.42 | 2.26 | 1.90 | 3.03 | 2.01 | 1.64 | 1.25 |
|
2.62 | 1.96 | 2.16 | 1.57 | 2.42 | 1.83 | 1.57 | 1.23 |
Validation Accuracy
CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT | |
---|---|---|---|---|---|---|---|---|
baseline | 81.0 | 72.8 | 80.8 | 82.3 | 85.9 | 57.9 | 94.9 | 98.1 |
SVRG | 78.2 | 17.6 | 82.6 | 65.1 | 79.6 | 57.8 | 95.7 | 97.9 |
|
81.4 | 77.8 | 83.3 | 84.0 | 85.9 | 61.8 | 95.8 | 98.2 |
Train Loss
ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B | |
---|---|---|---|---|---|---|
baseline | 3.487 | 3.443 | 3.427 | 3.635 | 2.817 | 2.644 |
SVRG | 3.505 | 3.431 | 3.389 | 3.776 | 2.309 | 3.113 |
|
3.467 | 3.415 | 3.392 | 3.609 | 2.806 | 2.642 |
Validation Accuracy
ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B | |
---|---|---|---|---|---|---|
baseline | 76.0 | 73.9 | 74.3 | 71.0 | 81.6 | 83.7 |
SVRG | 75.7 | 74.3 | 74.3 | 68.8 | 78.0 | 80.8 |
|
76.3 | 74.2 | 74.8 | 70.5 | 81.6 | 83.1 |
Please check INSTALL.md for installation instructions.
We list commands for convnext_femto
and vit_base
with coefficient 0.75
.
- For training other models, change
--model
accordingly, e.g., tovit_tiny
,convnext_base
,vit_base
. - For using different coefficients, change
--coefficient
accordingly, e.g., to1
,0.5
. --use_cache_svrg
can be enabled on smaller models provided with sufficient memory and disabled on larger models.- Our results of smaller models on ImageNet-1K were produced with 4 nodes, each with 8 gpus. Our results of smaller models on ImageNet-1K were produced with 8 nodes, each with 8 gpus. Our results of ConvNeXt-Femto on small datasets were produced with 8 gpus.
Below we give example commands for both smaller models and larger models on ImageNet-1K and ConvNeXt-Femto on small datasets.
Smaller models
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model convnext_femto --epochs 300 \
--batch_size 128 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
Larger models
python run_with_submitit.py --nodes 8 --ngpus 8 \
--model vit_base --epochs 300 \
--batch_size 64 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
ConvNeXt-Femto on small datasets
- Fill in
epochs
,warmup_epochs
, andbatch_size
based ondata_set
. - Note that
batch_size
is the batch size for each gpu.
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --epochs $epochs --warmup_epochs $warmup_epochs \
--batch_size $batch_size --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set $data_set \
--output_dir /path/to/results/
single-GPU
python main.py --model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
multi-GPU
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
This repository is built using the timm library and ConvNeXt codebase.
If you find this repository helpful, please consider citing:
@article{yin2023coefficient,
title={A Coefficient Makes SVRG Effective},
author={Yida Yin and Zhiqiu Xu and Zhiyuan Li and Trevor Darrell and Zhuang Liu},
year={2023},
journal={arXiv preprint arXiv:2311.05589},
}