DataComp-1B

DataComp-1B is a dataset of 1.4 billion image-text pairs collected from Common Crawl and then filtered. It is derived from CommonPool, which is part of DataComp, a benchmark for designing multimodal datasets. DataComp-1B is the best-performing subset of the xlarge version of CommonPool found by Gadre et al. (2023). See http://datacomp.ai/ and https://arxiv.org/abs/2304.14108 for details.

Downloading DataComp-1B

DataComp-1B can be downloaded using img2dataset by following the instructions at https://github.com/mlfoundations/datacomp/tree/main#downloading-datacomp-1b
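
For orientation, here is a minimal sketch of what an img2dataset download of image-text metadata can look like in Python. It is not the official DataComp download procedure; the linked instructions remain the authoritative route. The helper names `build_download_args` and `run_download`, the directory names, the column names `url` and `text`, and the resize settings are all assumptions made for illustration. Check them against the metadata you actually download.

```python
from pathlib import Path


def build_download_args(metadata_dir, output_dir, processes=4, threads=32):
    """Assemble keyword arguments for img2dataset.download.

    Hypothetical helper: the column names and resize settings below are
    assumptions, not values confirmed by this document.
    """
    return {
        "url_list": str(Path(metadata_dir)),      # folder of parquet metadata files
        "input_format": "parquet",
        "url_col": "url",                         # assumed name of the image URL column
        "caption_col": "text",                    # assumed name of the caption column
        "output_folder": str(Path(output_dir)),
        "output_format": "webdataset",            # write tar shards
        "image_size": 512,                        # assumed target size
        "resize_mode": "keep_ratio_largest",
        "resize_only_if_bigger": True,
        "processes_count": processes,
        "thread_count": threads,
    }


def run_download(metadata_dir, output_dir):
    """Start the download; requires `pip install img2dataset`."""
    # Imported lazily so the argument helper works without img2dataset installed.
    from img2dataset import download

    download(**build_download_args(metadata_dir, output_dir))
```

Calling `run_download("metadata", "shards")` would start fetching images for every row in the parquet files under `metadata`. At the scale of DataComp-1B this needs substantial storage and bandwidth, so a trial run on a single metadata file is a sensible first step.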
