R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
2024.10.18
🎉🎉🎉 We open-source the GeoMM dataset.
2024.10.19
🎉🎉🎉 We open-source the model weights for R-CoT-8B, R-CoT-7B, and R-CoT-2B, as well as the evaluation code.
2024.10.21
🎉🎉🎉 We open-source the training code.
2024.10.23
🎉🎉🎉 We release the paper R-CoT.
You can download the training and testing data used by R-CoT from R-CoT_Data.
Examples of GeoMM:
Model Name | Vision Part | Language Model | Transformers (HF) | MathVista(Geo) | GeoQA |
---|---|---|---|---|---|
R-CoT-8B | InternViT‑300M‑448px | internlm2_5‑7b‑chat | 🤗R-CoT-8B | 75.0 | 75.1 |
R-CoT-7B | EVA-CLIP | InternLM-Chat-7B | 🤗R-CoT-7B | 62.5 | 68.2 |
R-CoT-2B | InternViT‑300M‑448px | internlm2-chat-1_8b | 🤗R-CoT-2B | 57.7 | 62.6 |
R-CoT-Qwen | ViT-bigG | Qwen-7B | 🤗R-CoT-Qwen | 50.5 | 57.0 |
conda create -n rcot python=3.9 -y
conda activate rcot
pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation
pip install --upgrade deepspeed
pip install torchvision==0.16.0
pip install torch==2.1.0
pip install transformers==4.32.0
pip install torch_npu==2.1.0
To run on Ascend NPUs, add the following imports to the training script (e.g. finetune.py):
import torch_npu
from torch_npu.contrib import transfer_to_npu
Replace --bf16 with --fp16 in the sh scripts and weight config files.
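The torch_npu imports listed above can be guarded so that the same training script still starts on machines where the package is not installed. The following is a minimal sketch, not part of the R-CoT code; the helper name `enable_npu_if_available` is hypothetical.

```python
def enable_npu_if_available() -> bool:
    """Try the two torch_npu imports; report whether they succeeded.

    Hypothetical helper (not from the R-CoT repository). Returns True when
    both imports work, False when torch_npu is not installed.
    """
    try:
        import torch_npu  # noqa: F401
        from torch_npu.contrib import transfer_to_npu  # noqa: F401
    except ImportError:
        return False
    return True
```

Calling `enable_npu_if_available()` at the top of a training script keeps one code path for both NPU and non-NPU machines; the precision flag in the sh scripts still has to be changed by hand.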
You need to download the test images MathVista_test.zip. Unzip the archive, rename the extracted folder to "images", and place it under MathVista_eval/data.
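The unzip, rename, and place step can also be scripted. The sketch below is an illustration only: the function name `unzip_and_rename` is hypothetical, and it assumes the archive either holds one top-level folder or holds the files directly, since the internal layout of the zip is not documented here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unzip_and_rename(zip_path: str, target_dir: str) -> Path:
    """Extract zip_path so that its contents end up at target_dir.

    Assumption: if the archive holds a single top-level folder, that folder
    is renamed to target_dir; otherwise the extracted files go directly
    into target_dir.
    """
    target = Path(target_dir)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the final move stays on one filesystem.
    with tempfile.TemporaryDirectory(dir=target.parent) as tmp:
        staging = Path(tmp) / "extracted"
        staging.mkdir()
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(staging)
        entries = list(staging.iterdir())
        single_folder = len(entries) == 1 and entries[0].is_dir()
        source = entries[0] if single_folder else staging
        shutil.move(str(source), str(target))
    return target
```

For the MathVista images this would be called as `unzip_and_rename("MathVista_test.zip", "MathVista_eval/data/images")`.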
We provide response generation scripts for the different models; their names start with "generate_response_geo". Here R-CoT-7B is used as an example:
cd MathVista_eval/evaluation
python generate_response_geo_rcot7b.py --output_dir ../results --output_file output_bard.json --checkpoint weight_path
Extract the short answer text for score calculation:
python extract_answer.py --output_dir ../results --output_file output_bard.json
Calculate the final score:
python calculate_score.py --output_dir ../results --output_file output_bard.json --score_file scores.json
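The three MathVista commands above can be chained from one Python driver so that a failure in any step stops the run. This is a sketch under assumptions: the function names `mathvista_commands` and `run_mathvista` are hypothetical, and the script names and flags are copied from the commands above rather than checked against the scripts themselves.

```python
import subprocess
import sys
from typing import List


def mathvista_commands(checkpoint: str,
                       output_dir: str = "../results",
                       output_file: str = "output_bard.json",
                       score_file: str = "scores.json") -> List[List[str]]:
    """Build the generate, extract, and score commands in order."""
    common = ["--output_dir", output_dir, "--output_file", output_file]
    return [
        [sys.executable, "generate_response_geo_rcot7b.py", *common,
         "--checkpoint", checkpoint],
        [sys.executable, "extract_answer.py", *common],
        [sys.executable, "calculate_score.py", *common,
         "--score_file", score_file],
    ]


def run_mathvista(checkpoint: str,
                  cwd: str = "MathVista_eval/evaluation") -> None:
    """Run the three steps inside the evaluation folder, stopping on error."""
    for cmd in mathvista_commands(checkpoint):
        subprocess.run(cmd, cwd=cwd, check=True)
```

`run_mathvista("weight_path")` would then replace the three manual invocations; for another model, swap the first script name for the matching "generate_response_geo" script.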
You need to download the test images GeoQA_test.zip. Unzip the archive, rename the extracted folder to "test", and place it at GeoQA_test/images/test. Then generate responses from the model:
cd GeoQA_test
python model_vqa.py --checkpoint weight_path
Run automatic evaluation to calculate the accuracy:
python geo_acc_calculate.py --predictions_file path-to-output-file
The JSON files used for R-CoT training can be downloaded at Link. Change the image paths in the JSON files to point to your local image directory, and place the images there.
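Changing the image paths by hand is error-prone for large files, so a small script may help. The sketch below assumes the training JSON is a list of records whose image path is stored under an "image" key; that layout is an assumption, not something stated here, so check a record first and adjust `key` if needed. The function names are hypothetical.

```python
import json


def replace_image_prefix(records, old_prefix, new_prefix, key="image"):
    """Return copies of records with old_prefix swapped for new_prefix.

    Assumes each record is a dict and the image path lives under `key`.
    Records without a matching path are copied unchanged.
    """
    updated = []
    for record in records:
        record = dict(record)  # shallow copy, the input stays untouched
        value = record.get(key)
        if isinstance(value, str) and value.startswith(old_prefix):
            record[key] = new_prefix + value[len(old_prefix):]
        updated.append(record)
    return updated


def rewrite_json(src, dst, old_prefix, new_prefix, key="image"):
    """Load src, rewrite the image paths, and save the result to dst."""
    with open(src, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(replace_image_prefix(records, old_prefix, new_prefix, key),
                  f, ensure_ascii=False, indent=2)
```

Writing to a separate `dst` file keeps the downloaded JSON intact in case the prefix was wrong.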
For R-CoT-8B: You need to place the downloaded 'rcot8b_rcot2b_training_json' under the path set in 'shell/data/rcot_finetune.json'
cd R-CoT8B-main
sh shell/R-CoT-8B/rcot8b_finetune_full.sh
For R-CoT-7B: You need to place the downloaded 'GeoMM.json' and 'geo170k.json' under the path set in 'data.txt'
cd R-CoT7B-main
sh finetune.sh
For R-CoT-2B: You need to place the downloaded 'rcot8b_rcot2b_training_json' under the path set in 'shell/data/rcot_finetune.json'
cd R-CoT2B-main
sh shell/R-CoT-2B/rcot2b_finetune_full.sh
If you wish to refer to the baseline results published here, please use the following BibTeX entry:
@article{deng2024r,
title={R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models},
author={Deng, Linger and Liu, Yuliang and Li, Bohan and Luo, Dongliang and Wu, Liang and Zhang, Chengquan and Lyu, Pengyuan and Zhang, Ziyang and Zhang, Gang and Ding, Errui and others},
journal={arXiv preprint arXiv:2410.17885},
year={2024}
}
R-CoT focuses on generating high-quality mathematical reasoning data to improve the reasoning performance of models. R-CoT is built on Qwen-VL, InternVL2, and InternLM-XC2. Thanks to Qwen-VL, InternVL, InternLM-XC2, and LLaVA.
The R-CoT project is intended for non-commercial use only. For commercial inquiries or to explore more advanced versions of the R-CoT series LMMs, please contact us at ylliu@hust.edu.cn.