
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Official GitHub repository of LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Tianwei Xiong1,*, Yuqing Wang1,*, Daquan Zhou2,†, Zhijie Lin2, Jiashi Feng2, Xihui Liu1,✉

1The University of Hong Kong, 2ByteDance
*Equal contribution. †Project lead. ✉Corresponding author.

NeurIPS 2024 Track Datasets and Benchmarks

arXiv Project Page

News

[2024/10/15] The dataset, the research paper, and the project page are released!

Introduction

LVD-2M is a dataset featuring:

  1. long videos lasting at least 10 seconds,
  2. long-take videos without cuts,
  3. large motion and diverse content, and
  4. temporally dense captions.

Dataset Statistics

(Figure: statistics of the LVD-2M dataset.)

Dataset Access

Quick Walk Through for 100 Randomly Sampled Videos

We randomly sampled 100 videos (YouTube source) from LVD-2M; users can download the videos and the annotation file.

Note that even a direct, non-cherry-picked random sample already shows decent quality.

If you find any of the video samples inappropriate, we will remove them from our dataset/demonstration. Please contact xiongt20 at gmail dot com with such requests.

File Downloading

We provide three splits of our video dataset according to their sources: YouTube, HDVG, and WebVid.

You can download the three files from the links.

The meta records should be put in the following paths:

  • data/ytb_600k_720p.csv
  • data/hdvg_300k_720p.csv
  • data/webvid_1200k_336_short.csv

Explanations for the Fields of the Meta Files:

Each row in the CSV file corresponds to a video clip. The columns are listed below, followed by a short loading sketch:

  • raw_caption: The captions generated by LLaVA-v1.6-next-34B. For long video clips, multiple captions separated by "Caption x:" markers are provided.
  • refined_caption: The refined captions generated by Claude3-Haiku, which merge the raw_caption into a consistent description of the whole video clip.
  • rewritten_caption: The rewritten captions generated by LLaMA-v3.1-70B, which condense the refined_caption into a more concise, user-input style.
  • key: The id of the video clip.
  • video_id: The id of the YouTube video. Note that a YouTube video can have multiple video clips.
  • url: The URL of the video. For YouTube videos, it is the URL of the video that the clip is from. For WebVid videos, it points directly to the video clip.
  • dataset_src: Where the video clip is from. Values can be [hdvg, panda70m, internvid, webvid].
  • orig_caption: The original caption of the video clip, given by its dataset_src.
  • total score: The average optical flow score of the video clip.
  • span: The starting and ending time of the video clip in the original video, for video clips from YouTube only.
  • video_time: The length of the video clip.
  • orig_span: (Trivial content) Special record for the HDVG data format, a byproduct of HDVG cutting video clips further into smaller clips.
  • scene_cut: (Trivial content) Special record for the HDVG data format.
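
For reference, here is a minimal sketch of loading a meta file with pandas and splitting a multi-caption raw_caption. The exact regex for the "Caption x:" markers is an assumption about the separator format:

import re
import pandas as pd

# Load one of the meta files (adjust the path to your split).
df = pd.read_csv("data/ytb_600k_720p.csv")

row = df.iloc[0]
print(row["key"], row["url"], row["video_time"])

# Long clips carry several captions separated by "Caption x:" markers;
# the pattern "Caption <number>:" below is an assumption.
captions = [c.strip() for c in re.split(r"Caption \d+:", str(row["raw_caption"])) if c.strip()]
print(f"{len(captions)} caption segment(s) in raw_caption")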

Environment

conda create --name lvd2m python=3.9
conda activate lvd2m

# install ffmpeg
sudo apt-get install ffmpeg

pip install -r requirements.txt
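
As a quick sanity check (a trivial sketch, not part of the repo), you can verify from Python that ffmpeg is on the PATH:

import shutil

# ffmpeg must be reachable on PATH for the video processing steps.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found; install it first"
print("ffmpeg found at", shutil.which("ffmpeg"))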

Video Downloading Script

To download videos from a csv file, run the following command:

${PYTHON_PATH} \
download_videos_release.py \
--bsz=96 \
--resolution=720p \
--node_num=1 \
--node_id=0 \
--process_num=96 \
--workdir=cache/download_cache \
--out_dir="dataset/videos" \
--dataset_key="hdvg" \
--multiprocess

Your Google accounts may be banned or suspended for making too many requests, so we suggest using multiple accounts. Set ACCOUNT_NUM in download_videos_release.py to specify how many.
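
For multi-node runs, --node_num/--node_id style sharding typically splits the CSV rows in a strided fashion. The sketch below is a hypothetical illustration; the actual splitting happens inside download_videos_release.py and may differ:

import pandas as pd

# Hypothetical illustration of strided row sharding across nodes;
# the real logic lives in download_videos_release.py and may differ.
node_num, node_id = 2, 0                 # two nodes, this is node 0
df = pd.read_csv("data/hdvg_300k_720p.csv")
shard = df.iloc[node_id::node_num]       # every node_num-th row, offset by node_id
print(f"node {node_id} handles {len(shard)} of {len(df)} clips")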

Details for Video Downloading

We don't provide the video data directly; instead, we provide ways to download the videos from their original sources.

Although the HDVG dataset is also sourced from YouTube, its format differs from that of other YouTube-scraped datasets, so it is treated separately.

Technical suggestions for downloading videos from YouTube

We use a modified version of pytube to download the videos. It supports downloading videos from YouTube in a parallel, fast, and stable way (using multiprocessing and multiple accounts). For more details, see the download_videos_release.py script.

Overall, we suggest preparing multiple Google accounts, running python download_videos_release.py --reset_auth for authorization, and then running the downloading scripts.

We implement a mechanism for dividing the request load across multiple accounts: the processes launched on all nodes are evenly assigned to the different accounts, as sketched below.
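
A minimal sketch of such an even, round-robin assignment, assuming the global process index is derived from node_id and a per-node process index (the function name and exact formula are assumptions; the real logic is in download_videos_release.py):

# Hypothetical round-robin mapping of processes to accounts;
# the actual implementation in download_videos_release.py may differ.
ACCOUNT_NUM = 4  # number of authorized Google accounts

def account_for_process(node_id: int, process_id: int, process_num: int) -> int:
    """Map a process to an account by its global index, round-robin."""
    global_index = node_id * process_num + process_id
    return global_index % ACCOUNT_NUM

# Example: with 96 processes per node, node 1 / process 3 maps to account (96 + 3) % 4.
assert account_for_process(node_id=1, process_id=3, process_num=96) == 3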

Note: the code for downloading videos from YouTube may fail due to changes in YouTube API behavior; check the issues in pytube for updates.

Disclaimer about WebVid

We don't provide code for downloading videos from WebVid (whose videos are from stock footage providers) for two reasons:

  1. Users can directly access these video clips through the provided URLs, which is much simpler than downloading clips from YouTube.
  2. To avoid possible copyright violations.

License

The video data is collected from publicly available resources. The license of this dataset is the same as the license of HD-VILA.

Acknowledgements

Here we list the projects that inspired and helped us build LVD-2M.

Citation

@article{xiong2024lvd2m,
      title={LVD-2M: A Long-take Video Dataset with Temporally Dense Captions}, 
      author={Tianwei Xiong and Yuqing Wang and Daquan Zhou and Zhijie Lin and Jiashi Feng and Xihui Liu},
      year={2024},
      journal={arXiv preprint arXiv:2410.10816}
}
