Skip to content

Latest commit

 

History

History
53 lines (39 loc) · 1.99 KB

README.md

File metadata and controls

53 lines (39 loc) · 1.99 KB

TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding

⭐️ To support the hard work, consider leaving a star !


Official Release TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding. Add a little bit of body text

Highlights

  • 30K GPT4-Vision-generated captions.
  • A superior vision language model specilizing in visual text reasoning, TextGPT4V-7B
  • An Image Caption Pipeline -- AWS Based, approaching GPT4-Vision's caption capability.

Release

[2023/11/25] TextGPT4V dataset: paper and project page are released!

Todo-List

  • Release TextGPT4v dataset
  • Release TextGPT4v Model finetuned LLaVa
  • Checkpoints of TextGPT4v-7B
  • GPT4V Prompting AWS Infrastructure

Model Zoo

To be released

Usage

To be released

Data Preparation

Our captions data are available at TextGPT4v in the JSON format.

Acknowledgments

  • LLaVA: the dataset is constructed in relation to LLaVa, and intentended to be used on this model, it could ofcourse be used on any other VLM.

Citation

If you find our work useful for your research or applications, please cite using this BibTeX:

@misc{chen2023sharegpt4v,
      title={TextGPT4V: Enhancing Visual Instruction Tuning with GPT-4V: Advancements in Text-Rich Image Understanding}, 
      author={Itay Etelis and David Sarne and Avi Rosenfels},
      year={2023},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}