Evaluation of InternLM-XComposer2-4KHD is supported in VLMEvalKit.
InternLM-XComposer2-VL Evaluation
For InternLM-XComposer2, we evaluate models on a diverse set of 13 benchmarks using the scripts below. The evaluation is also supported in VLMEvalKit (results may differ slightly).
MathVista
Run the notebook MathVista.ipynb.
MathVista results

| test | testmini |
| --- | --- |
| 57.93 | 57.6 |
MMMU
Run the notebook MMMU/MMMU_Validation.ipynb.
MMMU results

| test | val |
| --- | --- |
| 38.2 | 42.0 |
MME
Download the data following the official instructions here.
Download the images to MME_Benchmark_release_version.
Put the official eval_tool and MME_Benchmark_release_version under ./data/.
Single-GPU inference.
cd MME
CUDA_VISIBLE_DEVICES=0 python -u eval.py
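Before launching inference, it can help to confirm that the two directories named in the setup steps are in place. The following is a minimal sketch, not part of the repository: the helper `missing_mme_dirs` and its `data_root` parameter are hypothetical, and the default `./data` is assumed to be relative to the directory you run the check from, so adjust it if your layout differs.

```python
from pathlib import Path

# Directory names come from the setup steps; the helper itself is hypothetical.
REQUIRED_DIRS = ("eval_tool", "MME_Benchmark_release_version")


def missing_mme_dirs(data_root="./data"):
    """Return the required sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_mme_dirs()
    if missing:
        print("Missing under ./data:", ", ".join(missing))
    else:
        print("MME data layout looks complete.")
```

The check only looks at directory names. It does not validate the contents of the benchmark release or of the official evaluation tool.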
MME results
=========== Perception ===========
total score: 1711.9952981192478
existence score: 195.0
count score: 160.0
position score: 163.33333333333334
color score: 195.0
posters score: 171.08843537414964
celebrity score: 153.8235294117647
scene score: 164.75
landmark score: 176.0
artwork score: 185.5
OCR score: 147.5
=========== Cognition ===========
total score: 530.7142857142858
commonsense_reasoning score: 145.71428571428572
numerical_calculation score: 137.5
text_translation score: 147.5
code_reasoning score: 100.0
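The numbers above are consistent with each section's total being the sum of its subtask scores. The sketch below parses output in this printed format and checks that relationship. The function names `parse_mme_log` and `totals_consistent` are hypothetical, and the parser assumes the exact `=== Section ===` and `name score: value` layout shown here, not any documented format of the official tool.

```python
import math
import re

# Patterns assume the printed layout shown in the results above.
SECTION_RE = re.compile(r"^=+\s*(\w+)\s*=+$")
SCORE_RE = re.compile(r"^(.+?) score: ([0-9.]+)$")

# Cognition block copied from the results above.
SAMPLE = """=========== Cognition ===========
total score: 530.7142857142858
commonsense_reasoning score: 145.71428571428572
numerical_calculation score: 137.5
text_translation score: 147.5
code_reasoning score: 100.0"""


def parse_mme_log(text):
    """Parse printed results into {section: {"total": float, "tasks": {name: float}}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(
                header.group(1), {"total": None, "tasks": {}}
            )
            continue
        score = SCORE_RE.match(line)
        if score and current is not None:
            name, value = score.group(1), float(score.group(2))
            if name == "total":
                current["total"] = value
            else:
                current["tasks"][name] = value
    return sections


def totals_consistent(sections, tol=1e-6):
    """Check that every section total equals the sum of its subtask scores."""
    return all(
        s["total"] is not None
        and math.isclose(s["total"], sum(s["tasks"].values()), abs_tol=tol)
        for s in sections.values()
    )


if __name__ == "__main__":
    print(totals_consistent(parse_mme_log(SAMPLE)))
```

Comparing with a small tolerance rather than exact equality avoids false mismatches from floating-point rounding in the summed scores.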