[Benchmark] Support MATH-Vision #292

scikkk · 2024-07-18T15:48:00Z

Measuring Multimodal Mathematical Reasoning with the MATH-Vision🔥 Dataset

[🌐 Homepage] [🤗 Huggingface Dataset] [📊 Leaderboard] [🔍 Visualization] [📖 ArXiv Paper]

👀 Introduction

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.

📈 Evaluation

# MATH-Vision
torchrun --nproc-per-node=1 run.py --data MATH_V --model your_model --verbose

# MATH-Vision testmini
torchrun --nproc-per-node=1 run.py --data MATH_V_MINI --model your_model --verbose
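
If you want to script both runs instead of typing the commands by hand, the following is a minimal sketch. It only assembles the same two command lines shown above; `build_eval_command` and `DATASETS` are hypothetical names introduced here for illustration and are not part of this PR or of `run.py`. The sketch uses no flags beyond those in the commands above.

```python
import shlex

# Dataset identifiers copied from the commands above.
DATASETS = ("MATH_V", "MATH_V_MINI")


def build_eval_command(dataset, model, nproc=1):
    """Return the torchrun argv list for one dataset/model pair.

    Hypothetical helper: it mirrors the documented command line and
    does not call into the evaluation code itself.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "torchrun", f"--nproc-per-node={nproc}",
        "run.py", "--data", dataset, "--model", model, "--verbose",
    ]


if __name__ == "__main__":
    for name in DATASETS:
        # Print only; nothing is launched here.
        print(shlex.join(build_eval_command(name, "your_model")))
```

To actually launch a run, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming `torchrun` is on the `PATH` and `your_model` is replaced by a real model name.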

* [Benchmark] Support MATH-Vision
* update url
* Fix download_file
* update MATH_V md5
* fix MathVision
* fix lint

---------

Co-authored-by: Ke Wang <wangk.gm@gmail.com>
Co-authored-by: kennymckormick <dhd@pku.edu.cn>

iskewang and others added 6 commits July 18, 2024 23:39

[Benchmark] Support MATH-Vision

bfea5d0

update url

ee6edd5

Fix download_file

0a06aac

update MATH_V md5

1e22228

fix MathVision

3c55e82

fix lint

cdea062

kennymckormick merged commit 24f7def into open-compass:main Jul 19, 2024
1 check passed
