[TASK] MME-SCI Benchmark #878
base: main
Conversation
This file should be moved into your task folder.
lmms_eval/tasks/mme_sci/mme_sci.yaml
Outdated
dataset_path: "parquet"
dataset_kwargs:
  data_dir: "~/.cache/huggingface/datasets"
  data_files:
    - "~/.cache/huggingface/datasets/datasets--JCruan--MME-SCI/snapshots/local_snapshot/mmesci_1019_zh.parquet"
You can actually configure your Hub parquet files so this loads via load_dataset("JCruan/MME-SCI", split="xxx"); you can check here for reference. Otherwise people might need to hardcode the local_snapshot path.
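For reference, once the parquet files are configured on the Hub, a quick check like the following should work (the split name "test" here is just an assumption; use whatever splits the Hub repo actually exposes). The yaml could then point dataset_path at "JCruan/MME-SCI" directly, without the data_dir/data_files overrides.

# Sketch: load MME-SCI straight from the Hugging Face Hub instead of a
# hardcoded local snapshot path. The split name "test" is an assumption.
from datasets import load_dataset

ds = load_dataset("JCruan/MME-SCI", split="test")
print(len(ds), ds[0].keys())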
img.save(buffered, format="PNG")
img_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
content.append({
    "type": "image",
    "url": f"data:image/png;base64,{img_b64}"
})
Does this work in most cases when using the chat model? I am not sure our protocol handles the base64 format correctly, haha. When I designed it, I was expecting the url value to be a PIL image.
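If the protocol does expect a PIL image, the image entry could be appended directly, reusing the img and content variables from the snippet above (a sketch, assuming the chat payload accepts a PIL.Image object in the "url" field as described):

# Sketch: pass the PIL image object directly instead of a base64 data URL,
# assuming the protocol accepts a PIL.Image in "url" as the reviewer describes.
content.append({
    "type": "image",
    "url": img,  # the raw PIL.Image, no base64 encoding needed
})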
Can you put the two run scripts into your task folder or into the examples folder? Thanks!
Looks like the run-judge sglang scripts are mostly hardcoded? The model path, mem fraction, input/output files, etc. Wondering if this can be further improved. I saw that you are using the SGLangLauncher from lmms-eval; is it possible to integrate this into utils.py?
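As a rough illustration of the suggestion (all flag names and defaults below are hypothetical, not the script's actual interface), the hardcoded values could be exposed as CLI arguments:

# Sketch: expose the previously hardcoded judge settings as CLI arguments.
# Flag names and defaults are hypothetical illustrations only.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Run the MME-SCI judge")
    parser.add_argument("--model-path", required=True, help="judge model path or HF repo id")
    parser.add_argument("--mem-fraction-static", type=float, default=0.8, help="SGLang memory fraction")
    parser.add_argument("--input-file", required=True, help="path to model predictions")
    parser.add_argument("--output-file", required=True, help="where to write judged results")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # launch SGLang and run judging using args.model_path, args.mem_fraction_static, etc.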
Hi, most of it LGTM; just wondering if you can merge the sglang launcher using the launcher args and put the scoring logic into utils. Thanks! (See lmms-eval/lmms_eval/__main__.py, lines 98 to 102 in a468e57.)
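To sketch where the scoring logic could live (the function names and result layout below follow the usual per-task utils.py conventions in lmms-eval but are assumptions for this task, and the exact-match check is only a placeholder for the actual judge-based scoring):

# Sketch for lmms_eval/tasks/mme_sci/utils.py; names and fields are assumptions.
def mme_sci_process_results(doc, results):
    """Score one document; assumes the dataset exposes an "answer" field."""
    pred = results[0].strip()
    gold = str(doc["answer"]).strip()
    return {"mme_sci_accuracy": {"correct": int(pred == gold)}}

def mme_sci_aggregate_results(results):
    """Aggregate per-document scores into a single accuracy value."""
    if not results:
        return 0.0
    return sum(r["correct"] for r in results) / len(results)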
Hello.
This PR adds a new benchmark, MME-SCI. MME-SCI is a comprehensive multimodal benchmark designed to evaluate the scientific reasoning capabilities of Multimodal Large Language Models (MLLMs). It addresses key limitations of existing benchmarks by focusing on multilingual adaptability, comprehensive modality coverage, and fine-grained knowledge point annotation.
arXiv: https://www.arxiv.org/abs/2508.13938
Data: https://huggingface.co/datasets/JCruan/MME-SCI
Thank you!