
add model Slime and Benchmark mme_realworld_lite #409

Merged — 1 commit into EvolvingLMMs-Lab:main on Nov 14, 2024
Conversation

yfzhang114 (Contributor)

PR Summary

This pull request introduces two key updates:

  1. Added the Slime model.
  2. Updated the download link for MME-RealWorld and added a new version, MME-RealWorld-Lite, which samples 50 instances per task from MME-RealWorld to accelerate inference.
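
The per-task subsampling behind the Lite split can be sketched as follows. This is a hypothetical helper written for illustration, not the PR's actual code; the `instances` schema (a list of dicts with a `"task"` key) and the fixed seed are assumptions:

```python
import random
from collections import defaultdict

def sample_lite_split(instances, per_task=50, seed=42):
    """Group benchmark instances by task and sample up to `per_task` from each.

    Hypothetical sketch of MME-RealWorld-Lite-style subsampling: a fixed seed
    keeps the resulting split reproducible across runs.
    """
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for inst in instances:
        by_task[inst["task"]].append(inst)
    lite = []
    # Sort tasks so output order does not depend on dict insertion order.
    for task, items in sorted(by_task.items()):
        k = min(per_task, len(items))  # tasks with fewer items are kept whole
        lite.extend(rng.sample(items, k))
    return lite
```

With 50 instances per task, a benchmark of several thousand items shrinks to a few hundred, which is what makes the Lite split practical for quick inference runs.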

Performance Results on MME-RealWorld-lite

Below are the performance results of the Slime model, along with other models, on the MME-RealWorld-lite dataset.

Column groups: Perception = OCR, RS, DT, MO, AD, Avg (P); Reasoning = OCR, DT, MO, AD, Avg (R).

| Method | LLM | Overall | OCR | RS | DT | MO | AD | Avg (P) | OCR | DT | MO | AD | Avg (R) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | - | 46.4 | 81 | 45 | 65 | 34 | 37 | 49.1 | 72 | 50 | 42 | 33 | 42.1 |
| GPT-4o-mini | - | 37.4 | 70 | 23 | 62 | 19 | 34 | 38.8 | 57 | 39 | 19 | 35 | 35.2 |
| Qwen2-VL | Qwen2-7B | 46.7 | 86 | 40 | 74 | 28 | 36 | 48.2 | 73 | 46 | 47 | 36 | 44.4 |
| LLaVA-OV | Qwen2-7B | 45.8 | 82 | 51 | 64 | 34 | 45 | 52.8 | 71 | 43 | 45 | 35 | 42.7 |
| Slime | Llama3-8B | 37.1 | 58 | 36 | 51 | 29 | 33 | 37.7 | 51 | 27 | 41 | 34 | 36.4 |


@Luodian Luodian merged commit 92b15c1 into EvolvingLMMs-Lab:main Nov 14, 2024
1 check passed
ZhaoCinyu pushed a commit to ZhaoCinyu/lmms-eval that referenced this pull request Dec 9, 2024