add vinoground #326

HanSolo9682 · 2024-10-16T04:31:00Z

Hi, I want to add our video benchmark Vinoground to the lmms-eval database. This temporal counterfactual benchmark contains 1000 short, natural video-caption pairs. The best model, GPT-4o, reaches only 35% on one of our metrics, while humans achieve ~90% with ease. I have been able to reproduce our results on LLaVA-Video-7B-Qwen2 with the code provided here. I believe more models should be evaluated on Vinoground to truly test their dense temporal reasoning capabilities, and hence I find lmms-eval a great platform to do so.

Luodian · 2024-10-16T06:34:52Z

Hi, thanks for this PR. Can you also post a result screenshot for a random model?

Also, there are some linting issues; you may need to run pre-commit to resolve them.

HanSolo9682 · 2024-10-16T06:45:51Z

HanSolo9682 · 2024-10-16T06:49:21Z

I have just run pre-commit and fixed the linting.

Co-authored-by: jzhang2427 <jzhang2427@wisc.edu>

add vinoground

2d466f7

HanSolo9682 force-pushed the main branch from 487b15c to 2d466f7 Compare October 16, 2024 06:49

Luodian approved these changes Oct 16, 2024

Luodian merged commit a72a9c0 into EvolvingLMMs-Lab:main Oct 16, 2024
1 check passed

KairuiHu pushed a commit that referenced this pull request Oct 24, 2024

add vinoground (#326)

5978627

Co-authored-by: jzhang2427 <jzhang2427@wisc.edu>

ZhaoCinyu pushed a commit to ZhaoCinyu/lmms-eval that referenced this pull request Dec 9, 2024

add vinoground (EvolvingLMMs-Lab#326)

a3f8b3d

Co-authored-by: jzhang2427 <jzhang2427@wisc.edu>
