ggg0919/cantor

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Project Page | Paper

We propose Cantor, a multimodal chain-of-thought (CoT) framework built on a perceptual decision architecture that integrates visual context with logical reasoning to solve visual reasoning tasks.

[Figure: Cantor overview]

Getting Started

1. Installation

Clone the repository and install the Gemini SDK:

git clone https://github.com/ggg0919/cantor
cd cantor
pip install -q -U google-generativeai
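
To confirm that the install step succeeded before running the demo, a small check such as the following may help. This is a minimal sketch that is not part of the repository; the helper name `sdk_version` is hypothetical, and it relies only on the Python standard library.

```python
from importlib import metadata


def sdk_version(package: str = "google-generativeai"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether the Gemini SDK from the install step is available.
version = sdk_version()
print(f"google-generativeai: {version or 'not installed'}")
```

If this reports "not installed", rerun the `pip install` command in the same Python environment that will run `demo.py`.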

2. Run Cantor Demo

python3 demo.py --query "Which month is the hottest on average in Detroit?" --image_path ./images/image.png --api_key "YOUR_GEMINI_API_KEY"

--query: Question to ask about the image
--image_path: Path to the input image
--api_key: Your Gemini API key
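
The three flags above could be wired to a Gemini call roughly as follows. This is an illustrative sketch, not the contents of `demo.py`: the function names `build_parser` and `ask_gemini`, the default model name, and the use of Pillow to load the image are all assumptions. It also sends a single plain question-plus-image request, so it does not reproduce Cantor's multi-stage reasoning pipeline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three documented demo flags."""
    parser = argparse.ArgumentParser(description="Cantor demo (illustrative sketch)")
    parser.add_argument("--query", required=True, help="Question to ask about the image")
    parser.add_argument("--image_path", required=True, help="Path to the input image")
    parser.add_argument("--api_key", required=True, help="Your Gemini API key")
    return parser


def ask_gemini(query: str, image_path: str, api_key: str,
               model_name: str = "gemini-1.5-flash") -> str:
    """Send one question and one image to Gemini and return the text reply.

    The model name is an assumption; substitute whichever vision-capable
    Gemini model your key has access to.
    """
    # Imported lazily so the parser can be used without the SDK installed.
    import google.generativeai as genai
    from PIL import Image  # assumes Pillow is installed

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content([query, Image.open(image_path)])
    return response.text
```

A caller would parse the command line with `args = build_parser().parse_args()` and then print `ask_gemini(args.query, args.image_path, args.api_key)`.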

ToDo

  • Release the data and evaluation code on ScienceQA.
  • Release the data and evaluation code on MathVista.

Cases

[Figure: Cantor example cases]

Citation

@article{gao2024cantor,
  title={Cantor: Inspiring Multimodal Chain-of-Thought of MLLM},
  author={Gao, Timin and Chen, Peixian and Zhang, Mengdan and Fu, Chaoyou and Shen, Yunhang and Zhang, Yan and Zhang, Shengchuan and Zheng, Xiawu and Sun, Xing and Cao, Liujuan and Ji, Rongrong},
  journal={arXiv preprint arXiv:2404.16033},
  year={2024}
}
