🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹
Title | Venue | Date | Code | Frames |
---|---|---|---|---|
T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | arXiv | 2024-12 | Github | - |
Streaming long video understanding with large language models | arXiv | 2024-05 | - | 16(Streaming) |

Title | Venue | Date | Repo | Leaderboard |
---|---|---|---|---|
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - | 2023-12 | Github | - |
TempCompass: Do Video LLMs Really Understand Videos? | ACL | 2024-03 | Github | - |
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | - | 2024-06 | Github | - |
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | NIPS D&B | 2024-06 | Github | - |
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding | arXiv | 2024-06 | Github | - |
HourVideo: 1-Hour Video-Language Understanding | NIPS D&B | 2024-11 | Github | coming soon |