paper: https://arxiv.org/abs/2109.01903
Robust fine-tuning of the CLIP ViT-B/16 model on ImageNet with a single 8-GPU machine:
```shell
python -m torch.distributed.launch --nproc_per_node=8 examples/imageclassification/imagenet/wiseft/main.py \
  --data_dir=$ImageNetDataDir \
  --model=clip_vit_base_patch16_224 \
  --epochs=10 \
  --workers=8 \
  --batch-size=64 \
  --lr=0.00003 \
  --weight-decay=0.1 \
  --opt=adamw \
  --opt-eps=1e-8 \
  --sched=cosine \
  --clip-grad=1.0 \
  --pin-mem \
  --output=output/wiseft \
  --experiment=tmp
```
Model 1 (zero-shot) | Model 2 (fine-tuned) |
---|---|
zeroshot.pt | 9.pt |
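The two checkpoints in the table are combined by WiSE-FT's weight-space ensembling: a linear interpolation between the zero-shot and fine-tuned parameters. A minimal sketch, assuming both checkpoints load as plain `state_dict`s with matching keys (the function name `wise_ft` and the mixing coefficient `alpha` are illustrative; only the file names come from the table above):

```python
def wise_ft(zeroshot_sd, finetuned_sd, alpha=0.5):
    """Weight-space ensemble: (1 - alpha) * zero-shot + alpha * fine-tuned.

    Works element-wise on any mapping of parameter name -> tensor (or scalar).
    alpha=0.0 recovers the zero-shot model, alpha=1.0 the fine-tuned one.
    """
    assert zeroshot_sd.keys() == finetuned_sd.keys(), "checkpoints must share keys"
    return {k: (1 - alpha) * zeroshot_sd[k] + alpha * finetuned_sd[k]
            for k in zeroshot_sd}

# Illustrative usage with the checkpoints from the table:
#   zs = torch.load("zeroshot.pt", map_location="cpu")
#   ft = torch.load("9.pt", map_location="cpu")
#   model.load_state_dict(wise_ft(zs, ft, alpha=0.5))
```

Sweeping `alpha` trades in-distribution accuracy against robustness to distribution shift; the paper reports that intermediate values often improve both over the fine-tuned model alone.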