# Facial Video Clip Database Builder

This repository contains two scripts for building a facial video clip database. The first script, `yt-download.py`, downloads YouTube videos in parallel and splits them into smaller clips for efficient processing. The second script, `face-extraction.py`, processes those clips to extract facial regions, compute embeddings, and save the data in a structured format along with metadata.

The CSV list we used to create our dataset in the paper *Anchored Diffusion for Video Face Reenactment* is available here.

## Sample Output

## Features

1. **`yt-download.py`**
   - Downloads YouTube videos based on a list of video IDs provided in a CSV file.
   - Splits videos into smaller clips of configurable duration.
   - Supports parallel downloads for faster processing.
   - Saves metadata about the downloaded videos.
2. **`face-extraction.py`**
   - Processes video clips to detect and extract facial regions.
   - Computes CLIP embeddings, IQA scores, and other quality metrics for each clip.
   - Extracts audio tracks and optionally converts facial frames into LMDB format.
   - Saves structured metadata as CSV and pickle files.
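The clip-splitting step can be approximated with FFmpeg's segment muxer using stream copy, which avoids re-encoding. The sketch below is illustrative only: the helper name, output pattern, and exact FFmpeg flags are assumptions, not the actual interface of `yt-download.py`.

```python
import subprocess
from pathlib import Path

def build_split_command(src: str, out_dir: str, clip_minutes: int) -> list[str]:
    """Build an ffmpeg command that splits a video into fixed-length
    clips via the segment muxer, copying streams without re-encoding."""
    pattern = str(Path(out_dir) / (Path(src).stem + "_%03d.mp4"))
    return [
        "ffmpeg", "-i", src,
        "-c", "copy",                      # copy streams, no re-encode
        "-f", "segment",                   # split output into segments
        "-segment_time", str(clip_minutes * 60),  # segment length in seconds
        "-reset_timestamps", "1",          # restart timestamps per clip
        pattern,
    ]

if __name__ == "__main__":
    cmd = build_split_command("video.mp4", "clips", 1)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually split
```

Stream copy keeps splitting fast, at the cost of clip boundaries snapping to the nearest keyframe rather than exact timestamps.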

## Installation

1. Clone the repository:

   ```shell
   git clone <repository_url>
   cd <repository_directory>
   ```

2. Create a conda environment from the provided `environment.yaml` file:

   ```shell
   conda env create -f environment.yaml
   conda activate yt-scraper
   ```

3. Ensure the following tools are installed:
   - **FFmpeg**: used for audio and video processing.
   - **CUDA Toolkit** (if using GPU acceleration).
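Before running the scripts, it can help to verify that the external dependencies are actually visible from your environment. This small check is not part of the repository; the `check_tools` helper below is an illustrative sketch.

```python
import shutil

def check_tools() -> dict[str, bool]:
    """Report availability of the external tools the scripts rely on:
    ffmpeg on PATH, and (optionally) a CUDA-capable torch install."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    try:
        import torch  # optional: only needed for GPU-accelerated face extraction
        status["cuda"] = torch.cuda.is_available()
    except ImportError:
        status["cuda"] = False
    return status

if __name__ == "__main__":
    for tool, ok in check_tools().items():
        print(f"{tool}: {'available' if ok else 'missing'}")
```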

## Usage

### 1. Download and Split Videos

Use the `yt-download.py` script to download YouTube videos and split them into smaller clips.

Command:

```shell
python yt-download.py \
    --urls <path_to_csv> \
    --records-dir <output_directory> \
    --clip-duration <clip_duration_in_minutes> \
    --num-videos <number_of_videos_to_download> \
    --num-processes <number_of_parallel_processes>
```

Example:

```shell
python yt-download.py \
    --urls urls/faces/yt-@Oscars.csv \
    --records-dir ./downloads \
    --clip-duration 1 \
    --num-videos 10 \
    --num-processes 4
```

Inputs:

- `--urls`: Path to the CSV file containing YouTube video IDs (must have a `video_id` column).
- `--records-dir`: Directory to save downloaded videos and clips.
- `--clip-duration`: Duration of each split clip, in minutes.
- `--num-videos`: (Optional) Limit on the number of videos to download.
- `--num-processes`: Number of parallel processes for downloading and splitting.

Outputs:

- Downloaded videos saved in the specified directory.
- Clips saved in subdirectories by video ID.
- A metadata CSV file summarizing the download process.

### 2. Build the Facial Video Clip Database

Use the `face-extraction.py` script to process the video clips, extract facial regions, and compute quality metrics.

Command:

```shell
python face-extraction.py \
    --input-dir <input_directory> \
    --output-dir <output_directory> \
    --cuda-devices <list_of_cuda_devices> \
    --num-processes <number_of_parallel_processes> \
    [--make-lmdb]
```

Example:

```shell
python face-extraction.py \
    --input-dir ./downloads \
    --output-dir ./processed_faces \
    --cuda-devices cuda:0 cuda:1 \
    --num-processes 4 \
    --make-lmdb
```

Inputs:

- `--input-dir`: Directory containing video clips (generated by `yt-download.py`).
- `--output-dir`: Directory to save processed data, including extracted clips and metadata.
- `--cuda-devices`: List of CUDA devices to use for processing (e.g., `cuda:0 cuda:1`).
- `--num-processes`: Number of parallel processes.
- `--make-lmdb`: (Optional) Convert extracted facial frames into LMDB format.

Outputs:

- Extracted facial clips saved in the output directory.
- LMDB files (if `--make-lmdb` is specified).
- Metadata saved as `metadata.csv` and `metadata.pkl`.

## Metadata

Both scripts generate metadata files summarizing their respective processes.

- **`yt-download.py` metadata:**
  - `video_id`: YouTube video ID.
  - `downloaded`: Boolean indicating download success.
  - `failed`: Boolean indicating download failure.
  - `path`: Relative path to the downloaded video.
  - `num_clips`: Number of clips generated.
- **`face-extraction.py` metadata:**
  - `file_original`: Original video file path.
  - `file_relative`: Relative path to the extracted clip.
  - `lmdb_file`: Path to the LMDB file (if created).
  - `audio_file`: Path to the extracted audio file.
  - `clip_score`: CLIP score for consistency.
  - `hyperiqa_score`: HyperIQA score for quality assessment.
  - `clipiqa+_score`: CLIP-IQA+ score for quality assessment.
  - Additional fields for frame ranges, dimensions, duration, and resolution.
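A typical downstream use of `metadata.csv` is filtering clips by a quality score before training. A hedged sketch, assuming only the column names listed above (the `filter_by_quality` helper and the threshold value are illustrative, not part of the scripts):

```python
import csv

def filter_by_quality(metadata_csv: str, min_hyperiqa: float) -> list[dict]:
    """Keep only clips whose HyperIQA score meets a threshold.
    Column names follow the metadata fields documented above."""
    with open(metadata_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return [r for r in rows if float(r["hyperiqa_score"]) >= min_hyperiqa]
```

The same pattern applies to `clip_score` or `clipiqa+_score`; a sensible threshold depends on your downstream task.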

## Citation

If you find our code useful in your research or applications, please consider citing our paper:

```bibtex
@article{kligvasser2024anchored,
  title={Anchored diffusion for video face reenactment},
  author={Kligvasser, Idan and Cohen, Regev and Leifman, George and Rivlin, Ehud and Elad, Michael},
  journal={arXiv preprint arXiv:2407.15153},
  year={2024}
}
```

This helps us track the impact of our work and motivates us to continue contributing to the community. Thank you for your support!