🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

pipixin321/Awesome-Video-MLLMs

Awesome-Video-Multimodal-Large-Language-Models

Stars ⭐, comments 😀, and shares 📈 are welcome!

General Works

| Title | Venue | Date | Code | Frames |
|---|---|---|---|---|
| TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | arXiv | 2024-11 | GitHub | 128 |
| ARIA: An Open Multimodal Native Mixture-of-Experts Model | arXiv | 2024-10 | GitHub | 256 |
| Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution | arXiv | 2024-10 | GitHub | 768 (2 FPS) |
| LLaVA-Video: Video Instruction Tuning With Synthetic Data | arXiv | 2024-10 | GitHub | 64 |
| LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | arXiv | 2024-10 | GitHub | 1 FPS |
| MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone | arXiv | 2024-08 | GitHub | 64 |
| mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | arXiv | 2024-08 | GitHub | 128 |
| InternVL2: Better than the Best - Expanding Performance Boundaries of Open-Source Multimodal Models with the Progressive Scaling Strategy | blog | 2024-07 | GitHub | 16 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | arXiv | 2024-06 | GitHub | 32 |
| ShareGPT4Video: Improving video understanding and generation with better captions | arXiv | 2024-06 | GitHub | 16 |
| LongVA: Long context transfer from language to vision | arXiv | 2024-06 | GitHub | 1 FPS |
| LongVLM: Efficient long video understanding via large language models | ECCV | 2024-04 | GitHub | 100 |
| VILA: On Pre-training for Visual Language Models | CVPR | 2023-12 | GitHub | 8 |
| TimeChat: A time-sensitive multimodal large language model for long video understanding | CVPR | 2023-12 | GitHub | 96 |
| Chat-UniVi: Unified visual representation empowers large language models with image and video understanding | CVPR | 2023-11 | GitHub | 64 |
| VTimeLLM: Empower LLM to Grasp Video Moments | CVPR | 2023-11 | GitHub | 100 |
| LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models | ECCV | 2023-11 | GitHub | 1 FPS |
| Video-LLaVA: Learning united visual representation by alignment before projection | arXiv | 2023-11 | GitHub | 8 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | arXiv | 2023-07 | GitHub | 2048 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ACL | 2023-06 | GitHub | 100 |
| VALLEY: Video Assistant with Large Language model Enhanced ability | arXiv | 2023-06 | GitHub | 0.5 FPS |
| Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | EMNLP | 2023-06 | GitHub | 8 |
| VideoChat: Chat-Centric Video Understanding | arXiv | 2023-05 | GitHub | 4~32 |
| LLaMA-Adapter: Efficient Fine-tuning of LLaMA | ICLR | 2023-03 | GitHub | - |

Interesting Works

| Title | Venue | Date | Code | Frames |
|---|---|---|---|---|
| T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs | arXiv | 2024-12 | GitHub | - |
| Streaming long video understanding with large language models | arXiv | 2024-05 | - | 16 (streaming) |

Benchmarks for Evaluation

General

| Title | Venue | Date | Repo | Leaderboard |
|---|---|---|---|---|
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - | 2023-12 | GitHub | - |
| TempCompass: Do Video LLMs Really Understand Videos? | ACL | 2024-03 | GitHub | - |
| Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | - | 2024-06 | GitHub | - |
| MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | NeurIPS D&B | 2024-06 | GitHub | - |
| MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding | arXiv | 2024-06 | GitHub | - |
| HourVideo: 1-Hour Video-Language Understanding | NeurIPS D&B | 2024-11 | GitHub | coming soon |
