
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

We propose an inspiring multimodal CoT framework named Cantor, which features a perceptual decision architecture that effectively integrates visual context and logical reasoning to solve visual reasoning tasks.

Getting Started

1. Installation

Clone the repository and install the Gemini Python SDK:

git clone https://github.com/ggg0919/cantor
cd cantor
pip install -q -U google-generativeai
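
To confirm that the install step succeeded before running the demo, a quick check like the one below can help. This is a minimal sketch: the helper name `sdk_installed` is hypothetical and not part of this repository, and it assumes the SDK is importable as `google.generativeai`, the module name used by the `google-generativeai` package.

```python
import importlib.util


def sdk_installed(module_name: str = "google.generativeai") -> bool:
    """Return True if the given module can be located without importing it fully.

    `sdk_installed` is an illustrative helper, not something shipped with Cantor.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `google`) is missing
        # or when a module has no usable spec.
        return False


if sdk_installed():
    print("google-generativeai is available")
else:
    print("google-generativeai not found; run: pip install -q -U google-generativeai")
```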

2. Run Cantor Demo

python3 demo.py --query "Which month is the hottest on average in Detroit?" --image_path ./images/image.png --api_key "your Gemini's key"

--query: The question to ask about the image
--image_path: Path to the input image
--api_key: Your Gemini API key
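
The three flags above could be parsed with a standard `argparse` setup. The sketch below is an assumption about how such a command-line interface might be wired, not a copy of `demo.py`; the function name `build_parser` and the help strings are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Cantor demo arguments (sketch)")
    parser.add_argument("--query", required=True,
                        help="The question to ask about the image")
    parser.add_argument("--image_path", required=True,
                        help="Path to the input image")
    parser.add_argument("--api_key", required=True,
                        help="Your Gemini API key")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args([
    "--query", "Which month is the hottest on average in Detroit?",
    "--image_path", "./images/image.png",
    "--api_key", "YOUR_GEMINI_API_KEY",  # placeholder, not a real key
])
print(args.query)
print(args.image_path)
```

Marking all three flags as required makes a missing value fail fast with a usage message. Whether `demo.py` treats them as required is not stated here.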

ToDo

Release the data and evaluation code on ScienceQA.
Release the data and evaluation code on MathVista.

Cases

Citation

@article{gao2024cantor,
  title={Cantor: Inspiring Multimodal Chain-of-Thought of MLLM},
  author={Gao, Timin and Chen, Peixian and Zhang, Mengdan and Fu, Chaoyou and Shen, Yunhang and Zhang, Yan and Zhang, Shengchuan and Zheng, Xiawu and Sun, Xing and Cao, Liujuan and Ji, Rongrong},
  journal={arXiv preprint arXiv:2404.16033},
  year={2024}
}

