
Conversation

@Xian-Gao

Hello.

This PR adds a new benchmark MME-SCI. MME-SCI is a comprehensive multimodal benchmark designed to evaluate the scientific reasoning capabilities of Multimodal Large Language Models (MLLMs). It addresses key limitations of existing benchmarks by focusing on multilingual adaptability, comprehensive modality coverage, and fine-grained knowledge point annotation.

arXiv: https://www.arxiv.org/abs/2508.13938

Data: https://huggingface.co/datasets/JCruan/MME-SCI

Thank you!

Collaborator

This file should be moved into your task folder.

Comment on lines 2 to 6
dataset_path: "parquet"
dataset_kwargs:
  data_dir: "~/.cache/huggingface/datasets"
  data_files:
    - "~/.cache/huggingface/datasets/datasets--JCruan--MME-SCI/snapshots/local_snapshot/mmesci_1019_zh.parquet"
Collaborator

You can actually configure your Hub parquet files so that this loads via `load_dataset("JCruan/MME-SCI", split="xxx")`; you can check here for reference. Otherwise people might need to hardcode the local_snapshot path.
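For illustration, a minimal sketch of what loading could look like once the Hub repo is configured that way (the split name "zh" is an assumption based on the mmesci_1019_zh.parquet file name, not a confirmed split):

from datasets import load_dataset

# Sketch: load MME-SCI straight from the Hub instead of a hardcoded
# local snapshot path. The split name "zh" is assumed, not confirmed.
ds = load_dataset("JCruan/MME-SCI", split="zh")
print(len(ds), ds.column_names)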

Comment on lines +70 to +75
img.save(buffered, format="PNG")
img_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
content.append({
    "type": "image",
    "url": f"data:image/png;base64,{img_b64}"
})
Collaborator

Does this work in most cases when using the chat model? I am not sure our protocol handles the base64 format correctly haha. When I designed it, I was expecting the url value to be a Pillow image.
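For comparison, a minimal sketch of the alternative the reviewer describes, passing the Pillow image object itself as the url value (whether the protocol accepts this is the reviewer's expectation, not confirmed here):

from PIL import Image

# Placeholder image standing in for the dataset image in the diff above.
img = Image.new("RGB", (64, 64))

content = []
# The "url" field holds the PIL.Image.Image object directly,
# with no PNG/base64 round trip.
content.append({
    "type": "image",
    "url": img,
})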

Collaborator

Can you put the two run scripts into your task folder or in the examples folder? Thanks!

Collaborator

Looks like the run-judge SGLang script is mostly hardcoded? The model path, mem fraction, input/output files, etc. I wonder if this can be further improved. I saw that you are using the SGLangLauncher from lmms-eval; is it possible to integrate this into utils.py?
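One possible direction, sketched below with purely hypothetical flag names and defaults: surface the hardcoded values as CLI arguments so the judge script becomes reusable.

import argparse

# Hedged sketch only; every flag name and default here is hypothetical,
# chosen to mirror the values the reviewer says are hardcoded.
parser = argparse.ArgumentParser(description="Run SGLang judge")
parser.add_argument("--model_path", required=True, help="Judge model path")
parser.add_argument("--mem_fraction_static", type=float, default=0.8,
                    help="SGLang memory fraction")
parser.add_argument("--input_file", required=True, help="Predictions to score")
parser.add_argument("--output_file", required=True, help="Where to write scores")
args = parser.parse_args()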

@kcz358
Collaborator

kcz358 commented Nov 3, 2025

Hi, most of it LGTM; just wondering if you can merge the SGLang launcher using the launcher args and put the scoring logic into utils.py. Thanks!

parser.add_argument(
    "--launcher_args",
    default=None,
    help="String arguments for launcher for local llm as judge, e.g. `tp=8`, if None then no launcher will be used.",
)
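A minimal sketch of how that string could be parsed into launcher keyword arguments; parse_launcher_args is a hypothetical helper, not an existing lmms-eval API:

def parse_launcher_args(launcher_args):
    """Parse a string like "tp=8,mem_fraction_static=0.8" into a kwargs dict.

    Hypothetical helper: int and float values are cast, everything else
    stays a string.
    """
    if not launcher_args:
        return {}
    kwargs = {}
    for pair in launcher_args.split(","):
        key, _, raw = pair.partition("=")
        value = raw.strip()
        for cast in (int, float):
            try:
                value = cast(value)
                break
            except ValueError:
                pass
        kwargs[key.strip()] = value
    return kwargs

# Example: parse_launcher_args("tp=8") -> {"tp": 8}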
