(CAP2QA) Visually Dehallucinative Instruction Generation [paper]
Sungguk Cha, Jusung Lee, Younghyun Lee and Cheoljong Yang
See also:
(IDK) Visually Dehallucinative Instruction Generation: Know What You Don't Know [paper] [github]
| Dataset | Avg. #words (Question/Answer) | #Images | #Questions | Scalable | Image-Aligned | Recognition | Description | Reasoning |
|---|---|---|---|---|---|---|---|---|
| DAQUAR | 11.5 / 1.1 (word) | 1,449 | 12,468 | | | | | |
| VQAv2 | 6.1 / 1.2 (word) | 200k | 1.1M | | | | | |
| OKVQA | 8.1 / 1.3 (word) | 14,031 | 14,055 | | | | | |
| LLaVA | 10.7 / 60.7 (sentence) | 80,000 | 221,333 | | | | | |
| CAP2QA (Ours) | 7.2 / 5.4 (sentence) | 122,906 | 873,631 | | | | | |
Prepare the MSCOCO 2017 images. The original COCO train/val splits are preserved in CAP2QA.
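Since the annotations reference COCO images rather than shipping them, one way to pair them is to resolve each record's image id to its COCO 2017 file name. Below is a minimal loading sketch; the annotation file name (`cap2qa_train.json`) and its field names (`image_id`, `question`, `answer`) are assumptions, not the released schema, so adjust them to the actual files. The `train2017/` / `val2017/` layout and the zero-padded 12-digit file names are the standard COCO 2017 convention.

```python
# Minimal sketch: pair CAP2QA QA records with MSCOCO 2017 images.
# Assumptions: annotation file name and field names (image_id, question, answer)
# are hypothetical; the COCO 2017 directory layout (train2017/, val2017/) is standard.
import json
from pathlib import Path

COCO_ROOT = Path("coco2017")          # contains train2017/ and val2017/
ANN_FILE = "cap2qa_train.json"        # hypothetical annotation file name
SPLIT_DIR = COCO_ROOT / "train2017"   # CAP2QA preserves the COCO train/val split

with open(ANN_FILE) as f:
    annotations = json.load(f)        # assumed: a list of QA records

for record in annotations[:3]:
    # COCO 2017 images are named by zero-padded 12-digit image id, e.g. 000000000139.jpg
    image_path = SPLIT_DIR / f"{int(record['image_id']):012d}.jpg"
    print(image_path, record["question"], record["answer"])
```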
If you find CAP2QA useful for your research and applications, please cite using this BibTeX:
@inproceedings{cha2024visually,
title={Visually Dehallucinative Instruction Generation},
author={Cha, Sungguk and Lee, Jusung and Lee, Younghyun and Yang, Cheoljong},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2024},
}
The instructions in this work were generated using the COCO-Caption dataset (CC BY-NC-ND license) as the caption source and ChatGPT (see OpenAI policies: https://openai.com/policies).