This repository contains two scripts for building a facial video clip database. The first script (`yt-download.py`) downloads YouTube videos in parallel and splits them into smaller clips for efficient processing. The second script (`face-extraction.py`) processes the clips to extract facial regions, compute embeddings, and save the data in a structured format, including metadata.
The CSV list we used to create our dataset in the paper Anchored Diffusion for Video Face Reenactment is available here.
- `yt-download.py`:
  - Downloads YouTube videos based on a list of video IDs provided in a CSV file.
  - Splits videos into smaller clips of configurable duration.
  - Supports parallel downloads for faster processing.
  - Saves metadata about the downloaded videos.
- `face-extraction.py`:
  - Processes video clips to detect and extract facial regions.
  - Computes CLIP embeddings, IQA scores, and other quality metrics for each clip.
  - Extracts audio tracks and optionally converts facial frames into LMDB format.
  - Saves structured metadata as CSV and pickle files.
- Clone the repository:

  ```bash
  git clone <repository_url>
  cd <repository_directory>
  ```

- Create a conda environment from the provided `environment.yaml` file:

  ```bash
  conda env create -f environment.yaml
  conda activate yt-scraper
  ```

- Ensure the following tools are installed:
  - FFmpeg: Used for audio and video processing.
  - CUDA Toolkit (if using GPU acceleration).
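As a quick sanity check of the environment, you can verify that FFmpeg and a GPU are visible (a minimal sketch; it assumes PyTorch is provided by `environment.yaml`):

```python
import shutil

import torch  # assumed to be installed via environment.yaml

# FFmpeg must be on PATH for audio and video processing.
print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND")

# GPU acceleration is optional; face-extraction.py accepts devices such as cuda:0.
print("CUDA available:", torch.cuda.is_available())
```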
Use the `yt-download.py` script to download and split YouTube videos into smaller clips.

Command:

```bash
python yt-download.py --urls <path_to_csv> --records-dir <output_directory> --clip-duration <clip_duration_in_minutes> --num-videos <number_of_videos_to_download> --num-processes <number_of_parallel_processes>
```

Example:

```bash
python yt-download.py --urls urls/faces/yt-@Oscars.csv --records-dir ./downloads --clip-duration 1 --num-videos 10 --num-processes 4
```
Inputs:
- `--urls`: Path to the CSV file containing YouTube video IDs (must have a `video_id` column; see the example below).
- `--records-dir`: Directory to save downloaded videos and clips.
- `--clip-duration`: Duration of each split clip in minutes.
- `--num-videos`: (Optional) Limit the number of videos to download.
- `--num-processes`: Number of parallel processes for downloading and splitting.
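A minimal input CSV only needs the `video_id` column. The sketch below builds one with pandas; the file name and IDs are placeholders:

```python
import pandas as pd

# Placeholder IDs -- replace them with real YouTube video IDs.
urls = pd.DataFrame({"video_id": ["VIDEO_ID_1", "VIDEO_ID_2", "VIDEO_ID_3"]})
urls.to_csv("my-urls.csv", index=False)  # pass this path to --urls
```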
Outputs:
- Downloaded videos saved in the specified directory.
- Clips saved in subdirectories by video ID.
- Metadata CSV file summarizing the download process.
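To get a quick overview of the result, the clip directories can be enumerated like this (a sketch assuming clips are stored as `.mp4` files under `<records-dir>/<video_id>/`):

```python
from pathlib import Path

# Layout assumption: <records-dir>/<video_id>/ holds the split clips, stored as .mp4.
records_dir = Path("./downloads")
for video_dir in sorted(p for p in records_dir.iterdir() if p.is_dir()):
    clips = list(video_dir.glob("*.mp4"))
    print(f"{video_dir.name}: {len(clips)} clips")
```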
Use the `face-extraction.py` script to process the video clips, extract facial regions, and compute metrics.

Command:

```bash
python face-extraction.py --input-dir <input_directory> --output-dir <output_directory> --cuda-devices <list_of_cuda_devices> --num-processes <number_of_parallel_processes> [--make-lmdb]
```

Example:

```bash
python face-extraction.py --input-dir ./downloads --output-dir ./processed_faces --cuda-devices cuda:0 cuda:1 --num-processes 4 --make-lmdb
```
Inputs:
- `--input-dir`: Directory containing video clips (generated by `yt-download.py`).
- `--output-dir`: Directory to save processed data, including extracted clips and metadata.
- `--cuda-devices`: List of CUDA devices to use for processing (e.g., `cuda:0 cuda:1`; see the snippet below for listing the devices on your machine).
- `--num-processes`: Number of parallel processes.
- `--make-lmdb`: (Optional) Convert extracted facial frames into LMDB format.
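To list the device strings available on a machine in the format expected by `--cuda-devices` (a small helper sketch; it assumes PyTorch is installed in the `yt-scraper` environment):

```python
import torch  # assumed to be available in the yt-scraper environment

# Prints device strings such as "cuda:0 cuda:1" for use with --cuda-devices.
devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
print(" ".join(devices) if devices else "no CUDA devices found")
```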
Outputs:
- Extracted facial clips saved in the output directory.
- LMDB files (if `--make-lmdb` is specified).
- Metadata saved as `metadata.csv` and `metadata.pkl`.
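To verify an LMDB file produced with `--make-lmdb`, a basic read-only inspection can look like this (the path and internal layout are assumptions; the key names and value encoding are defined by the script):

```python
import lmdb

# Path is illustrative; point it at an LMDB produced by face-extraction.py
# (use subdir=False if the script writes a single-file database).
env = lmdb.open("./processed_faces/<clip_name>_lmdb", readonly=True, lock=False)
with env.begin() as txn:
    print("stored entries:", txn.stat()["entries"])
    for key, value in txn.cursor():
        # Key format and value encoding are script-specific.
        print(key, len(value), "bytes")
        break  # inspect only the first record
env.close()
```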
Both scripts generate metadata files summarizing their respective processes.
- `yt-download.py` metadata:
  - `video_id`: YouTube video ID.
  - `downloaded`: Boolean indicating download success.
  - `failed`: Boolean indicating download failure.
  - `path`: Relative path to the downloaded video.
  - `num_clips`: Number of clips generated.
- `face-extraction.py` metadata:
  - `file_original`: Original video file path.
  - `file_relative`: Relative path to the extracted clip.
  - `lmdb_file`: Path to the LMDB file (if created).
  - `audio_file`: Path to the extracted audio file.
  - `clip_score`: CLIP score for consistency.
  - `hyperiqa_score`: HyperIQA score for quality assessment.
  - `clipiqa+_score`: CLIP-IQA+ score for quality assessment.
  - Additional fields for frame ranges, dimensions, duration, and resolution.
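For example, the extraction metadata can be loaded and filtered by the quality metrics above (a sketch; the output path and thresholds are placeholders to adapt to your data):

```python
import pandas as pd

# Path assumption: metadata.csv written to the face-extraction output directory.
meta = pd.read_csv("./processed_faces/metadata.csv")

# Illustrative quality thresholds -- tune them for your dataset.
good = meta[(meta["clip_score"] > 0.8) & (meta["hyperiqa_score"] > 0.5)]
print(f"kept {len(good)} / {len(meta)} clips")
good.to_csv("./processed_faces/metadata_filtered.csv", index=False)
```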
If you find our code useful in your research or applications, please consider citing our paper:
```bibtex
@article{kligvasser2024anchored,
  title={Anchored diffusion for video face reenactment},
  author={Kligvasser, Idan and Cohen, Regev and Leifman, George and Rivlin, Ehud and Elad, Michael},
  journal={arXiv preprint arXiv:2407.15153},
  year={2024}
}
```
This helps us track the impact of our work and motivates us to continue contributing to the community. Thank you for your support!