# Facial Video Clip Database Builder

This repository contains two scripts for building a facial video clip database. The first script, `yt-download.py`, downloads YouTube videos in parallel and splits them into smaller clips for efficient processing. The second script, `face-extraction.py`, processes those clips to extract facial regions, compute embeddings, and save the data in a structured format along with metadata.

The CSV list we used to create our dataset in the paper *Anchored Diffusion for Video Face Reenactment* is available here.

## Sample Output

## Features

1. **`yt-download.py`**
   - Downloads YouTube videos based on a list of video IDs provided in a CSV file.
   - Splits videos into smaller clips of configurable duration.
   - Supports parallel downloads for faster processing.
   - Saves metadata about the downloaded videos.
2. **`face-extraction.py`**
   - Processes video clips to detect and extract facial regions.
   - Computes CLIP embeddings, IQA scores, and other quality metrics for each clip.
   - Extracts audio tracks and optionally converts facial frames into LMDB format.
   - Saves structured metadata as CSV and pickle files.
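The clip-splitting step can be approximated with FFmpeg's segment muxer using stream copy, which avoids re-encoding. The sketch below is illustrative only: the helper name, output pattern, and exact FFmpeg flags are assumptions, not the actual interface of `yt-download.py`.

```python
import subprocess
from pathlib import Path

def build_split_command(src: str, out_dir: str, clip_minutes: int) -> list[str]:
    """Build an ffmpeg command that splits a video into fixed-length
    clips via the segment muxer, copying streams without re-encoding."""
    pattern = str(Path(out_dir) / (Path(src).stem + "_%03d.mp4"))
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",                      # copy streams, no re-encode
        "-f", "segment",                   # split output into segments
        "-segment_time", str(clip_minutes * 60),  # segment length in seconds
        "-reset_timestamps", "1",          # restart timestamps per clip
        pattern,
    ]

if __name__ == "__main__":
    cmd = build_split_command("video.mp4", "clips", 1)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually split
```

Stream copy keeps splitting fast, at the cost of clip boundaries snapping to the nearest keyframe rather than exact timestamps.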

## Installation

1. Clone the repository:

   ```shell
   git clone <repository_url>
   cd <repository_directory>
   ```

2. Create a conda environment from the provided `environment.yaml` file:

   ```shell
   conda env create -f environment.yaml
   conda activate yt-scraper
   ```

3. Ensure the following tools are installed:
   - **FFmpeg**: used for audio and video processing.
   - **CUDA Toolkit** (if using GPU acceleration).
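Before running the scripts, it can help to verify that the external dependencies are actually visible from your environment. This small check is not part of the repository; the `check_tools` helper below is an illustrative sketch.

```python
import shutil

def check_tools() -> dict[str, bool]:
    """Report availability of the external tools the scripts rely on:
    ffmpeg on PATH, and (optionally) a CUDA-capable torch install."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    try:
        import torch  # optional: only needed for GPU-accelerated face extraction
        status["cuda"] = torch.cuda.is_available()
    except ImportError:
        status["cuda"] = False
    return status

if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'available' if ok else 'missing'}")
```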

## Usage

### 1. Download and Split Videos

Use the `yt-download.py` script to download YouTube videos and split them into smaller clips.

Command:

```shell
python yt-download.py \
    --urls <path_to_csv> \
    --records-dir <output_directory> \
    --clip-duration <clip_duration_in_minutes> \
    --num-videos <number_of_videos_to_download> \
    --num-processes <number_of_parallel_processes>
```

Example:

```shell
python yt-download.py \
    --urls urls/faces/yt-@Oscars.csv \
    --records-dir ./downloads \
    --clip-duration 1 \
    --num-videos 10 \
    --num-processes 4
```

Inputs:

- `--urls`: Path to the CSV file containing YouTube video IDs (must have a `video_id` column).
- `--records-dir`: Directory to save downloaded videos and clips.
- `--clip-duration`: Duration of each split clip, in minutes.
- `--num-videos`: (Optional) Limit on the number of videos to download.
- `--num-processes`: Number of parallel processes for downloading and splitting.

Outputs:

- Downloaded videos saved in the specified directory.
- Clips saved in subdirectories by video ID.
- A metadata CSV file summarizing the download process.

### 2. Build the Facial Video Clip Database

Use the `face-extraction.py` script to process the video clips, extract facial regions, and compute quality metrics.

Command:

```shell
python face-extraction.py \
    --input-dir <input_directory> \
    --output-dir <output_directory> \
    --cuda-devices <list_of_cuda_devices> \
    --num-processes <number_of_parallel_processes> \
    [--make-lmdb]
```

Example:

```shell
python face-extraction.py \
    --input-dir ./downloads \
    --output-dir ./processed_faces \
    --cuda-devices cuda:0 cuda:1 \
    --num-processes 4 \
    --make-lmdb
```

Inputs:

- `--input-dir`: Directory containing video clips (generated by `yt-download.py`).
- `--output-dir`: Directory to save processed data, including extracted clips and metadata.
- `--cuda-devices`: List of CUDA devices to use for processing (e.g., `cuda:0 cuda:1`).
- `--num-processes`: Number of parallel processes.
- `--make-lmdb`: (Optional) Convert extracted facial frames into LMDB format.

Outputs:

- Extracted facial clips saved in the output directory.
- LMDB files (if `--make-lmdb` is specified).
- Metadata saved as `metadata.csv` and `metadata.pkl`.

## Metadata

Both scripts generate metadata files summarizing their respective processes.

- **`yt-download.py` metadata:**
  - `video_id`: YouTube video ID.
  - `downloaded`: Boolean indicating download success.
  - `failed`: Boolean indicating download failure.
  - `path`: Relative path to the downloaded video.
  - `num_clips`: Number of clips generated.
- **`face-extraction.py` metadata:**
  - `file_original`: Original video file path.
  - `file_relative`: Relative path to the extracted clip.
  - `lmdb_file`: Path to the LMDB file (if created).
  - `audio_file`: Path to the extracted audio file.
  - `clip_score`: CLIP score for consistency.
  - `hyperiqa_score`: HyperIQA score for quality assessment.
  - `clipiqa+_score`: CLIP-IQA+ score for quality assessment.
  - Additional fields for frame ranges, dimensions, duration, and resolution.
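A typical downstream use of `metadata.csv` is filtering clips by a quality score before training. A hedged sketch, assuming only the column names listed above (the `filter_by_quality` helper and the threshold value are illustrative, not part of the scripts):

```python
import csv

def filter_by_quality(metadata_csv: str, min_hyperiqa: float) -> list[dict]:
    """Keep only clips whose HyperIQA score meets a threshold.
    Column names follow the metadata fields documented above."""
    with open(metadata_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows if float(r["hyperiqa_score"]) >= min_hyperiqa]
```

The same pattern applies to `clip_score` or `clipiqa+_score`; a sensible threshold depends on your downstream task.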

## Citation

If you find our code useful in your research or applications, please consider citing our paper:

```bibtex
@article{kligvasser2024anchored,
  title={Anchored diffusion for video face reenactment},
  author={Kligvasser, Idan and Cohen, Regev and Leifman, George and Rivlin, Ehud and Elad, Michael},
  journal={arXiv preprint arXiv:2407.15153},
  year={2024}
}
```

This helps us track the impact of our work and motivates us to continue contributing to the community. Thank you for your support!