# Data

| Data file name | Size |
| --- | --- |
| sharegpt4v_instruct_gpt4-vision_cap100k.json | 134 MB |
| share-captioner_coco_lcs_sam_1246k_1107.json | 1.5 GB |
| sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json | 1.2 GB |
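All three files are plain JSON lists. As a quick sanity check after downloading, you can load one and inspect a record. The sketch below is only illustrative: it assumes the LLaVA-style record schema (`id`, `image`, `conversations`), and the local path is an example you should adjust.

```python
import json

# Example path; adjust to wherever you placed the downloaded file.
path = "data/sharegpt4v/sharegpt4v_instruct_gpt4-vision_cap100k.json"

with open(path, "r") as f:
    records = json.load(f)  # the file is a single JSON list

print(f"{len(records)} records")

# Inspect the first record (LLaVA-style schema assumed:
# each record has "id", "image", and a "conversations" list).
sample = records[0]
print("keys:", sorted(sample.keys()))
print("image:", sample.get("image"))
for turn in sample.get("conversations", [])[:2]:
    print(turn["from"], ":", turn["value"][:120])
```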

## ShareGPT4V Dataset

This dataset is curated from LAION, CC, SBU, SAM, COCO, web-landmark, web-celebrity, wikiart, etc., resulting in a total of 102K high-quality image-text pairs generated with the help of the powerful GPT4-Vision.

## ShareGPT4V-PT Dataset

The pretraining dataset used in this release is a mixture of the LAION, CC, SBU, SAM, and COCO datasets, resulting in a total of 1,246K image-text pairs captioned with the help of our general ShareCaptioner.

## SFT Dataset

We replace the 23K image-text pairs related to the image captioning task in LLaVA-mix-665K with an equivalent subset of our collected GPT4V-generated high-quality image-text pairs.
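If you want to check the composition of the SFT mix (the `cap23k` / `coco-ap9k` / `lcs3k` / `sam9k` counts in the filename), a rough way is to tally records by the top-level folder of their `image` path. This is only a sketch; it assumes the LLaVA-style `image` field holds a relative path such as `coco/train2017/...`.

```python
import json
from collections import Counter

# Example path; adjust to your local copy.
path = "data/sharegpt4v/sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json"

with open(path, "r") as f:
    records = json.load(f)

# Tally records by the first component of the "image" path
# (text-only records, if any, are counted under "no-image").
sources = Counter(
    rec.get("image", "no-image").split("/")[0] for rec in records
)
for source, count in sources.most_common():
    print(f"{source:>15}: {count}")
```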

## Prepare Images

First, download all the images we used.

Then, organize the data as follows in `projects/ShareGPT4V/data`:

ShareGPT4V
├── ...
├── data
│   ├── llava
│   │   ├── llava_pretrain
│   │   │   ├── images
│   ├── coco
│   │   ├── train2017
│   ├── sam
│   │   ├── images
│   ├── gqa
│   │   ├── images
│   ├── ocr_vqa
│   │   ├── images
│   ├── textvqa
│   │   ├── train_images
│   ├── vg
│   │   ├── VG_100K
│   │   ├── VG_100K_2
│   ├── sharegpt4v
│   │   ├── share-captioner_coco_lcs_sam_1246k_1107.json
│   │   ├── sharegpt4v_instruct_gpt4-vision_cap100k.json
│   │   ├── sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json
│   ├── share_textvqa
│   │   ├── images
│   ├── web-celebrity
│   │   ├── images
│   ├── web-landmark
│   │   ├── images
│   ├── wikiart
│   │   ├── images
├── ...
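After unpacking everything, a quick check that the expected folders and annotation files exist can save a failed run later. The list of paths in this sketch simply mirrors the tree above; adjust `DATA_ROOT` if you keep the data elsewhere.

```python
from pathlib import Path

# Root of the data directory described above.
DATA_ROOT = Path("projects/ShareGPT4V/data")

# Sub-directories and annotation files mirroring the tree above.
EXPECTED = [
    "llava/llava_pretrain/images",
    "coco/train2017",
    "sam/images",
    "gqa/images",
    "ocr_vqa/images",
    "textvqa/train_images",
    "vg/VG_100K",
    "vg/VG_100K_2",
    "sharegpt4v/share-captioner_coco_lcs_sam_1246k_1107.json",
    "sharegpt4v/sharegpt4v_instruct_gpt4-vision_cap100k.json",
    "sharegpt4v/sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json",
    "share_textvqa/images",
    "web-celebrity/images",
    "web-landmark/images",
    "wikiart/images",
]

missing = [p for p in EXPECTED if not (DATA_ROOT / p).exists()]
if missing:
    print("Missing paths:")
    for p in missing:
        print("  -", DATA_ROOT / p)
else:
    print("All expected paths are present.")
```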

Important notice: For convenience, we provide a zip file for the web data. These images may only be used for academic purposes.