Reproducible scaling laws for contrastive language-image learning [arXiv]
by Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev [arXiv:2212.07143] (Accepted at CVPR 2023)
Work still in progress. In this repository, we will provide the code for reproducing the experiments on large-scale CLIP pre-training and transfer to various downstream tasks for the paper "Reproducible scaling laws for contrastive language-image learning".
Stay tuned.
Until the repository is finalized, you may check:
- the OpenCLIP repository that points to the pre-trained models used in this study
- the LAION-400M and LAION-5B composition instructions, the datasets used for OpenCLIP pre-training in this study
- CLIP Benchmarking, transfer evaluation used in this study
To reproduce scaling plots from the paper, see the figures notebook.
First, you need to clone the repo and install the requirements.
git clone https://github.com/LAION-AI/scaling-laws-openclip
cd scaling-laws-openclip
pip install -r requirements.txt
We provide a script, download_models.py, to download all pre-trained models used in the paper.
To download all 29 models used in the paper, run:
python download_models.py
You can also download a subset of the models. For instance:
python download_models.py --samples_seen 3B 13B --model ViT-B-32 --data 80M 400M 2B
This command downloads only the ViT-B/32 models that have seen 3B or 13B samples and were trained on any of the 80M, 400M, or 2B LAION datasets.
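The selection described above combines the filters: a model is kept when it matches one of the values given for each flag. The sketch below illustrates that logic only. It is not the code of download_models.py, and the catalog in it is a made-up grid for illustration, not the actual list of 29 models.

```python
from itertools import product

# Hypothetical catalog, for illustration only (NOT the real list of models).
CATALOG = [
    {"model": m, "samples_seen": s, "data": d}
    for m, s, d in product(
        ["ViT-B-32", "ViT-L-14"],   # architectures
        ["3B", "13B", "34B"],       # samples seen
        ["80M", "400M", "2B"],      # dataset sizes
    )
]


def select(catalog, samples_seen=None, model=None, data=None):
    """Keep entries that match every given filter.

    A filter left as None accepts any value; within one filter,
    any of the listed values is accepted.
    """
    def accepted(value, allowed):
        return allowed is None or value in allowed

    return [
        entry for entry in catalog
        if accepted(entry["samples_seen"], samples_seen)
        and accepted(entry["model"], model)
        and accepted(entry["data"], data)
    ]


# Mirrors: --samples_seen 3B 13B --model ViT-B-32 --data 80M 400M 2B
selected = select(
    CATALOG,
    samples_seen=["3B", "13B"],
    model=["ViT-B-32"],
    data=["80M", "400M", "2B"],
)
for entry in selected:
    print(entry["model"], entry["samples_seen"], entry["data"])
```

Omitting a filter in this sketch leaves that dimension unrestricted, so calling select(CATALOG) returns the whole catalog, analogous to running the script without arguments.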
Once you have downloaded the pre-trained models, you can also use them in OpenCLIP. The following example uses ViT-H/14.
First, you need to download the model:
> python download_models.py --samples_seen 34B --model ViT-H-14 --data 2B
'Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt' downloaded.
Once the model is downloaded, you can use it directly in OpenCLIP:
import torch
import open_clip
model, _, preprocess = open_clip.create_model_and_transforms('ViT-H-14', pretrained='Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt')
For a complete example, see the inference notebook.
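The checkpoint filename in the example appears to encode the model, dataset size, samples seen, learning rate, and batch size. When handling several downloaded files, it can help to recover these fields programmatically. The following is a minimal sketch that assumes every checkpoint follows the same pattern as the single filename shown above; the function name parse_checkpoint_name and the "ViT-" prefix rule are assumptions for illustration, not part of this repository.

```python
import re

# Pattern inferred from one example filename; other checkpoints may differ.
CHECKPOINT_PATTERN = re.compile(
    r"^Model-(?P<model>.+?)"
    r"_Data-(?P<data>[^_]+)"
    r"_Samples-(?P<samples_seen>[^_]+)"
    r"_lr-(?P<lr>[^_]+)"
    r"_bs-(?P<batch_size>[^_]+)\.pt$"
)


def parse_checkpoint_name(filename):
    """Split a checkpoint filename into its named fields (hypothetical helper)."""
    match = CHECKPOINT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {filename!r}")
    return match.groupdict()


info = parse_checkpoint_name("Model-H-14_Data-2B_Samples-34B_lr-5e-4_bs-79k.pt")

# Assumption: the OpenCLIP architecture name is "ViT-" plus the model field,
# as in the ViT-H-14 example above.
arch = "ViT-" + info["model"]

print(arch, info)
```

The resulting arch string and the original filename could then be passed to open_clip.create_model_and_transforms in the same way as in the example above.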
If you find this work helpful, please cite our paper:
@article{cherti2022reproducible,
title={Reproducible scaling laws for contrastive language-image learning},
author={Cherti, Mehdi and Beaumont, Romain and Wightman, Ross and Wortsman, Mitchell and Ilharco, Gabriel and Gordon, Cade and Schuhmann, Christoph and Schmidt, Ludwig and Jitsev, Jenia},
journal={arXiv preprint arXiv:2212.07143},
year={2022}
}