Awesome-Video-Multimodal-Large-Language-Models

🔥 🔥 🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

Stars ⭐, comments 😀, and shares 📈 are welcome!

General Works

Title	Venue	Date	Code	Frames
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability	arXiv	2024-11	Github	128
ARIA: An Open Multimodal Native Mixture-of-Experts Model	arXiv	2024-10	Github	256
Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution	arXiv	2024-10	Github	768(2fps)
LLaVA-Video: Video Instruction Tuning With Synthetic Data	arXiv	2024-10	Github	64
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	arXiv	2024-10	Github	1FPS
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone	arXiv	2024-08	Github	64
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models	arXiv	2024-08	Github	128
InternVL2: Better than the Best—Expanding Performance Boundaries of Open-Source Multimodal Models with the Progressive Scaling Strategy	blog	2024-07	Github	16
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	arXiv	2024-06	Github	32
ShareGPT4Video: Improving video understanding and generation with better captions	arXiv	2024-06	Github	16
LongVA: Long context transfer from language to vision	arXiv	2024-06	Github	1FPS
LongVLM: Efficient long video understanding via large language models	ECCV	2024-04	Github	100
VILA: On Pre-training for Visual Language Models	CVPR	2023-12	Github	8
TimeChat: A time-sensitive multimodal large language model for long video understanding	CVPR	2023-12	Github	96
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	CVPR	2023-11	Github	64
VTimeLLM: Empower LLM to Grasp Video Moments	CVPR	2023-11	Github	100
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models	ECCV	2023-11	Github	1FPS
Video-LLaVA: Learning united visual representation by alignment before projection	arXiv	2023-11	Github	8
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	arXiv	2023-07	Github	2048
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	ACL	2023-06	Github	100
VALLEY: Video Assistant with Large Language model Enhanced ability	arXiv	2023-06	Github	0.5FPS
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	EMNLP	2023-06	Github	8
VideoChat: Chat-Centric Video Understanding	arXiv	2023-05	Github	4~32
LLaMA-Adapter: Efficient Fine-tuning of LLaMA	ICLR	2023-03	Github	-

Interesting Works

Title	Venue	Date	Code	Frames
T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	arXiv	2024-12	Github	-
Streaming long video understanding with large language models	arXiv	2024-05	-	16(Streaming)

Benchmarks for Evaluation

General

📊 OpenCompass Leaderboard

Title	Venue	Date	Repo	Leaderboard
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	-	2023-12	Github	-
TempCompass: Do Video LLMs Really Understand Videos?	ACL	2024-03	Github	-
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis	-	2024-06	Github	-
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	NeurIPS D&B	2024-06	Github	-
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding	arXiv	2024-06	Github	-
HourVideo: 1-Hour Video-Language Understanding	NeurIPS D&B	2024-11	Github	coming soon

pipixin321/Awesome-Video-MLLMs
