By default, we evaluate on the COCO Caption validation set (5,000 samples) and report the following metrics: `BLEU-1`, `BLEU-4`, `CIDEr`, and `ROUGE-L`.
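These metrics can be reproduced with the scorers from the `pycocoevalcap` package. Below is a minimal sketch; the example captions are illustrative only, and the official COCO toolkit additionally runs `PTBTokenizer` (which requires a local Java runtime) before scoring.

```python
# pip install pycocoevalcap
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# gts: image_id -> list of reference captions (COCO provides several per image)
# res: image_id -> single-element list holding the model's caption
# (captions below are illustrative placeholders, not real COCO annotations)
gts = {"184613": ["a child holding a flowered umbrella and petting a yak",
                  "a young man holding an umbrella next to a herd of cattle"]}
res = {"184613": ["a boy with an umbrella standing beside a yak"]}

bleu_scores, _ = Bleu(4).compute_score(gts, res)   # list: BLEU-1 .. BLEU-4
rouge_score, _ = Rouge().compute_score(gts, res)
# CIDEr is corpus-level (IDF over all references), so in practice it should
# be computed on the full validation set, not a single image as shown here.
cider_score, _ = Cider().compute_score(gts, res)

print(f"BLEU-1: {bleu_scores[0]:.3f}  BLEU-4: {bleu_scores[3]:.3f}")
print(f"ROUGE-L: {rouge_score:.3f}  CIDEr: {cider_score:.3f}")
```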
We use the following prompt to evaluate all VLMs; the same prompt is shared across models, with no model-specific variants:

> Please describe this image in general. Directly provide the description, do not include prefix like "This image depicts".
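During evaluation, each model receives this prompt together with one image at a time. A minimal collection loop might look like the sketch below; `model.generate` is a hypothetical placeholder for whatever inference API a given VLM exposes, and the resulting dict feeds directly into the scoring snippet above.

```python
# Hypothetical harness: `model.generate` stands in for each VLM's own
# inference API; only the fixed prompt is taken verbatim from above.
PROMPT = ('Please describe this image in general. Directly provide the '
          'description, do not include prefix like "This image depicts".')

def collect_captions(model, images):
    """Query one VLM with the shared prompt; `images` maps image_id -> file path."""
    res = {}
    for image_id, path in images.items():
        caption = model.generate(image=path, prompt=PROMPT)  # hypothetical call
        res[image_id] = [caption.strip()]  # single-element list, as the scorers expect
    return res
```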
Model | BLEU-4 | BLEU-1 | ROUGE-L | CIDEr | Word Count (mean) | Word Count (std) |
---|---|---|---|---|---|---|
EMU2-Chat | 38.7 | 78.2 | 56.9 | 109.2 | 9.6 | 1.1 |
Qwen-VL-Chat | 34.0 | 75.8 | 54.9 | 98.9 | 10.0 | 1.7 |
IDEFICS-80B-Instruct | 32.5 | 76.1 | 54.1 | 94.9 | 9.7 | 3.2 |
IDEFICS-9B-Instruct | 29.4 | 72.7 | 53.4 | 90.4 | 10.5 | 4.4 |
InstructBLIP-7B | 20.9 | 56.8 | 39.9 | 58.1 | 11.6 | 5.9 |
InstructBLIP-13B | 16.9 | 50.0 | 37.0 | 52.4 | 11.8 | 12.8 |
InternLM-XComposer-VL | 12.4 | 38.3 | 37.9 | 41.0 | 26.3 | 22.2 |
GeminiProVision | 8.4 | 33.2 | 31.2 | 9.7 | 35.2 | 15.7 |
LLaVA-v1.5-7B (QLoRA) | 7.2 | 25.0 | 36.6 | 43.2 | 48.8 | 42.9 |
mPLUG-Owl2 | 7.1 | 25.8 | 33.6 | 35.0 | 45.8 | 32.1 |
LLaVA-v1-7B | 6.7 | 27.3 | 26.7 | 6.1 | 40.9 | 16.1 |
VisualGLM | 5.4 | 28.6 | 23.6 | 0.2 | 41.5 | 11.5 |
LLaVA-v1.5-13B (QLoRA) | 5.3 | 19.6 | 25.8 | 17.8 | 72.2 | 39.4 |
LLaVA-v1.5-13B | 5.1 | 20.7 | 21.2 | 0.3 | 70.6 | 22.3 |
LLaVA-v1.5-7B | 4.6 | 19.6 | 19.9 | 0.1 | 72.5 | 21.7 |
PandaGPT-13B | 4.6 | 19.9 | 19.3 | 0.1 | 65.4 | 16.6 |
MiniGPT-4-v1-13B | 4.4 | 20.0 | 19.8 | 1.3 | 64.4 | 30.5 |
MiniGPT-4-v1-7B | 4.3 | 19.6 | 17.5 | 0.8 | 61.9 | 30.6 |
LLaVA-InternLM-7B (QLoRA) | 4.0 | 17.3 | 17.2 | 0.1 | 82.3 | 21.0 |
LLaVA-InternLM2-20B (QLoRA) | 4.0 | 17.9 | 17.3 | 0.0 | 83.2 | 20.4 |
CogVLM-17B-Chat | 3.6 | 21.3 | 20.0 | 0.1 | 56.2 | 13.7 |
Qwen-VL | 3.5 | 11.6 | 30.0 | 41.1 | 46.6 | 105.2 |
GPT-4V (detail: low) | 3.3 | 18.0 | 18.1 | 0.0 | 77.8 | 20.4 |
TransCore-M | 2.1 | 14.2 | 13.8 | 0.2 | 92.0 | 6.7 |
ShareGPT4V-7B | 1.4 | 9.7 | 10.6 | 0.1 | 147.9 | 45.4 |
MiniGPT-4-v2 | 1.4 | 12.6 | 13.3 | 0.1 | 83.0 | 27.1 |
OpenFlamingo v2 | 1.3 | 6.4 | 15.8 | 14.9 | 60.0 | 81.9 |
ShareCaptioner | 1.0 | 8.8 | 9.2 | 0.0 | 164.2 | 31.6 |
We notice that VLMs generating longer image descriptions tend to achieve lower scores under these caption metrics. This is consistent with the word-count columns above: the top-scoring models produce captions of roughly 10 words, close to the typical length of COCO reference captions, while n-gram-based metrics penalize much longer outputs.
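As a quick sanity check on this trend, one can correlate the table's mean word counts with CIDEr. The sketch below uses a handful of rows (chosen for brevity) and `scipy.stats.spearmanr`:

```python
from scipy.stats import spearmanr

# (mean word count, CIDEr) pairs taken from a few rows of the table above
rows = {
    "EMU2-Chat":       (9.6, 109.2),
    "Qwen-VL-Chat":    (10.0, 98.9),
    "InstructBLIP-7B": (11.6, 58.1),
    "GeminiProVision": (35.2, 9.7),
    "LLaVA-v1.5-13B":  (70.6, 0.3),
    "ShareGPT4V-7B":   (147.9, 0.1),
}
word_counts, ciders = zip(*rows.values())
rho, p = spearmanr(word_counts, ciders)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")  # strongly negative: longer captions, lower CIDEr
```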
TBD.