VLog

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
Show Lab @ National University of Singapore

	VLog (CVPR'25)	VLog-Agent
TL;DR	Video Narration as Vocabulary	Video as Long Document
Introduction	A novel, efficient GPT-2-based video narrator that produces narrations by generative retrieval over a narration vocabulary.	Given a video, we turn it into a textual document containing visual and audio information. Sending this document to an LLM lets us chat over the video!
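
The "Video as Long Document" idea in the VLog-Agent column can be illustrated with a minimal sketch. This is not the repository's code: the `Segment` fields, the `build_document` and `build_prompt` helpers, and the prompt wording are all hypothetical, and the captions and transcripts are assumed to come from some upstream captioning and speech-recognition step that is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One time span of a video (hypothetical structure, for illustration only)."""
    start: float      # start time in seconds
    end: float        # end time in seconds
    caption: str      # visual description, assumed to come from a captioner
    transcript: str   # speech in this span, assumed to come from an ASR model


def format_time(seconds: float) -> str:
    """Render a time offset as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_document(segments: List[Segment]) -> str:
    """Flatten per-segment visual and audio text into one timestamped document."""
    lines = []
    for seg in segments:
        span = f"[{format_time(seg.start)}-{format_time(seg.end)}]"
        lines.append(f"{span} Visual: {seg.caption}")
        if seg.transcript:  # skip the audio line for silent segments
            lines.append(f"{span} Audio: {seg.transcript}")
    return "\n".join(lines)


def build_prompt(document: str, question: str) -> str:
    """Wrap the document and a user question into a single LLM prompt string."""
    return (
        "The following document describes a video.\n\n"
        f"{document}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` could then be passed to whichever LLM client is in use; that call is left out because it depends on the chosen provider.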
