Video Captioning

  • VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events
    [Paper][Homepage]
    45,826 videos and their descriptions obtained by harvesting YouTube

  • MSR-VTT: A Large Video Description Dataset for Bridging Video and Language (CVPR 2016)
    [Paper][Homepage]
    10K web video clips, 200K clip-sentence pairs

  • VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research (ICCV 2019)
    [Paper][Homepage]
    41,250 videos, 825,000 captions in both English and Chinese, over 206,000 English-Chinese parallel translation pairs

  • ActivityNet Captions: Dense-Captioning Events in Videos (ICCV 2017)
    [Paper][Homepage]
    20k videos, 100k sentences

  • ActivityNet Entities: Grounded Video Description
    [Paper][Homepage]
    14,281 annotated videos, 52k video segments with at least one noun phrase annotated per segment; augments the ActivityNet Captions dataset with 158k bounding boxes

  • WebVid-2M: Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (2021)
    [Paper][Homepage]
    over two million videos with weak captions scraped from the internet

  • VTW: Title Generation for User Generated Videos (ECCV 2016)
    [Paper][Homepage]
    18,100 video clips with an average duration of 1.5 minutes per clip

  • TGIF: A New Dataset and Benchmark on Animated GIF Description (CVPR 2016)
    [Paper][Homepage]
    100K animated GIFs from Tumblr and 120K natural language descriptions

  • Charades: Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding (ECCV 2016)
    [Paper][Homepage]
    9,848 annotated videos from 267 people, 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes, and 41,104 labels for 46 object classes