-
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis (CVPR 2019)
[Paper][Homepage]
11,827 videos, 180 tasks, 12 domains, 46,354 annotated segments -
VideoLT: Large-scale Long-tailed Video Recognition
[Paper][Homepage]
256,218 untrimmed videos, annotated into 1,004 classes with a long-tailed distribution -
YouTube-8M: A Large-Scale Video Classification Benchmark
[Paper][Homepage]
8,000,000 videos, 4,000 visual entities -
HVU: Large Scale Holistic Video Understanding (ECCV 2020)
[Paper][Homepage]
572k videos in total, with 9 million annotations across the training, validation, and test sets, spanning 3,142 labels; semantic aspects cover scenes, objects, actions, events, attributes, and concepts -
VLOG: From Lifestyle Vlogs to Everyday Interactions (CVPR 2018)
[Paper][Homepage]
114K video clips, 10.7K participants, Annotations: Hand/Semantic Object, Hand Contact State, Scene Classification -
EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video
[Paper][Homepage]
Each video is annotated at 6 Hz with 15 continuous evoked expression labels, 36.7 million annotations of viewer facial reactions to 23,574 videos (1,700 hours) -
OmniSource Web Dataset: Omni-sourced Webly-supervised Learning for Video Recognition (ECCV 2020)
[Paper][Dataset]
Web data related to the 200 classes in the Mini-Kinetics subset: 732,855 Instagram videos, 3,654,650 Instagram images, 3,050,880 Google images