VLog

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
Show Lab @ National University of Singapore

| | VLog (CVPR'25) | VLog-Agent |
| --- | --- | --- |
| TL;DR | Video Narration as Vocabulary | Video as Long Document |
| Introduction | A novel, efficient video narrator (GPT2-based) with Narration Vocabulary via Generative Retrieval. | Given a video, we turn it into a textual document containing visual + audio info. By sending this doc to LLM, we can chat over the video! |
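
To make the "Video as Long Document" idea concrete, here is a minimal sketch of how timestamped visual captions and audio transcripts could be merged into one text document and wrapped into a prompt for an LLM. This is an illustration only and not the VLog-Agent code: `Segment`, `build_document`, and `build_prompt` are hypothetical names, the line format is an assumption, and the captioning, speech recognition, and LLM call are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One timestamped piece of text extracted from a video (hypothetical type)."""
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    source: str   # "visual" (caption) or "audio" (transcript)
    text: str


def build_document(segments: List[Segment]) -> str:
    """Merge visual and audio segments into one time-ordered text document."""
    # Sort by start time; ties are broken by source name so output is deterministic.
    ordered = sorted(segments, key=lambda s: (s.start, s.source))
    lines = [
        f"[{s.start:.1f}s-{s.end:.1f}s] ({s.source}) {s.text}"
        for s in ordered
    ]
    return "\n".join(lines)


def build_prompt(document: str, question: str) -> str:
    """Wrap the video document and a user question into a single prompt string."""
    return (
        "The following document describes a video, line by line.\n\n"
        f"{document}\n\n"
        f"Question: {question}\n"
        "Answer using only the document above."
    )
```

The string returned by `build_prompt` can be sent to any chat-capable LLM. Keeping the timestamps in each line lets the model refer back to specific moments in the video when it answers.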