Modified version of script to download both audio and video #10

Open · wants to merge 4 commits into base: main
19 changes: 13 additions & 6 deletions README.md
@@ -1,11 +1,11 @@

# HDTF
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset
<a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Flow-Guided_One-Shot_Talking_Face_Generation_With_a_High-Resolution_Audio-Visual_Dataset_CVPR_2021_paper.pdf" target="_blank">paper</a> <a href="https://github.com/MRzzm/HDTF/blob/main/Supplementary%20Materials.pdf" target="_blank">supplementary</a>

## Details of HDTF dataset
**./HDTF_dataset** consists of the *youtube video url*, the *video resolution* (used in our method; may not be the best available resolution), the *time stamps of talking face*, the *facial region* (used in our method) and *the zoom scale* of the cropped window.
**xx_video_url.txt:**


```
@@ -29,24 +29,31 @@ format: video name+clip index | min_width | width | min_height | height (in
format: video name+clip index | window zoom scale
```
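
The annotation files above are space-separated lines mapping a clip name to its fields. A minimal sketch of parsing them (hypothetical helper `parse_space_separated`; the sample crop line is made up for illustration):

```python
def parse_space_separated(lines):
    """Map the first column (clip name) to the remaining columns as strings."""
    data = {}
    for line in lines:
        fields = line.strip().split()
        if fields:
            data[fields[0]] = fields[1:]
    return data

# Example with a made-up crop annotation line:
crops = parse_space_separated(["Radio11_0.mp4 0 512 10 512"])
# crops["Radio11_0.mp4"] == ["0", "512", "10", "512"]
```

The fields stay strings here; convert them with `int()` at the point of use, as `download.py` does.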


## Processing of HDTF dataset
When using the HDTF dataset:

- We provide the video name and url in **xx_video_url.txt** (the highest resolution of the videos is 1080P or 720P). Transform each video into **.mp4** format and convert interlaced video to progressive video as well.

- We split each long original video into talking-head clips using the time stamps in **xx_annotion_time.txt**. Name each split clip **video name_clip index.mp4**. For example, the video *Radio11.mp4 00:30-01:00 01:30-02:30* is split into *Radio11_0.mp4* and *Radio11_1.mp4*.

- Our work does not always download videos at the best resolution, so we provide two cropping methods. Thanks @universome and @Feii Yin for pointing out this problem!

1. Download the video at the reference resolution in **xx_resolution.txt** and crop the facial region with the fixed window size in **xx_crop_wh.txt**. (This method is the same as ours, but the downloaded video may not be at the best resolution.)
2. First, download the video at the best resolution. Then, detect the facial landmarks in the split talking-head clips and compute the square window of the face: specifically, compute the facial region in each frame and merge all regions into one square range. Next, enlarge the window size with **xx_crop_ratio.txt**. Finally, crop the facial region.

- We resize all cropped videos into **512 x 512** resolution.
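
Cropping method 2 above can be sketched as follows. This is a minimal illustration with hypothetical per-frame face boxes; the landmark detector itself is not part of this repo, and the frame size and zoom ratio defaults are assumptions:

```python
def merge_boxes_to_square(boxes, ratio=1.3, frame_w=1920, frame_h=1080):
    """Merge per-frame face boxes (x0, y0, x1, y1) into one square crop window,
    enlarge it by the zoom ratio, and clamp it to the frame. Returns (x, y, side)."""
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    # Square side: the larger extent of the merged region, enlarged by the ratio
    side = int(max(x1 - x0, y1 - y0) * ratio)
    # Center the square on the merged region and clamp to frame boundaries
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    x = max(0, min(cx - side // 2, frame_w - side))
    y = max(0, min(cy - side // 2, frame_h - side))
    return x, y, side

# Two slightly shifted face boxes merge into one 286x286 window:
x, y, side = merge_boxes_to_square([(100, 100, 300, 300), (120, 90, 320, 310)])
# (x, y, side) == (67, 57, 286)
```

The resulting `(x, y, side)` window would then be passed to an ffmpeg `crop` filter before resizing to 512 x 512.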


The HDTF dataset is available to download under a <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">Creative Commons Attribution 4.0 International License</a>. If you face any problems when processing HDTF, please contact me.

## Downloading
For convenience, we added the `download.py` script, which downloads, crops and resizes the dataset. You can use it via the following command:
```
python download.py --output_dir /path/to/output/dir --num_workers 8
```

Note: some videos might become unavailable if their authors remove them or make them private.
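
The script selects the download format from the reference resolution. A small sketch of that selection logic, mirroring the selector string built inside `download_video` in `download.py` (the helper name is ours):

```python
def build_format_selector(video_format="mp4", resolution=None):
    """Build a youtube-dl/yt-dlp -f selector for a container format and video height."""
    selector = f"bestvideo[ext={video_format}]"
    if resolution is not None:
        # Pin the height to the reference resolution from xx_resolution.txt
        selector += f"[height={resolution}]"
    return selector

# build_format_selector(resolution=1080) -> "bestvideo[ext=mp4][height=1080]"
# build_format_selector()                -> "bestvideo[ext=mp4]"
```

Pinning the exact height is why some videos fail to download: if YouTube no longer serves that height in mp4, the selector matches nothing.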

## Reference
If you use HDTF, please reference:

60 changes: 60 additions & 0 deletions conda_env.yaml
@@ -0,0 +1,60 @@
name: hdtf
channels:
- conda-forge/label/cf201901
- jmcmurray
- conda-forge
- anaconda
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- brotlipy=0.7.0=py39h27cfd23_1003
- bzip2=1.0.8=h7f98852_4
- ca-certificates=2022.6.15=ha878542_0
- certifi=2022.6.15=py39hf3d152e_0
- cffi=1.15.0=py39hd667e15_1
- cryptography=36.0.0=py39h9ce1e76_0
- ffmpeg=4.3.2=hca11adc_0
- freetype=2.10.4=h0708190_1
- gmp=6.2.1=h58526e2_0
- gnutls=3.6.13=h85f3911_1
- idna=3.3=pyhd3eb1b0_0
- lame=3.100=h7f98852_1001
- ld_impl_linux-64=2.35.1=h7274673_9
- libffi=3.3=he6710b0_2
- libgcc-ng=9.3.0=h5101ec6_17
- libgomp=9.3.0=h5101ec6_17
- libpng=1.6.37=h21135ba_2
- libstdcxx-ng=9.3.0=hd4cf53a_17
- ncurses=6.3=h7f8727e_2
- nettle=3.6=he412f7d_0
- openh264=2.1.1=h780b84a_0
- openssl=1.1.1o=h7f8727e_0
- os=0.1.4=0
- pip=21.2.4=py39h06a4308_0
- pycparser=2.21=pyhd3eb1b0_0
- pyopenssl=22.0.0=pyhd3eb1b0_0
- pysocks=1.7.1=py39h06a4308_0
- python=3.9.12=h12debd9_0
- python_abi=3.9=2_cp39
- qutil=3.2.1=6
- readline=8.1.2=h7f8727e_1
- setuptools=61.2.0=py39h06a4308_0
- sqlite=3.38.2=hc218d9a_0
- tk=8.6.11=h1ccaba5_0
- tqdm=4.29.0=py_0
- tzdata=2022a=hda174b7_0
- urllib3=1.26.9=py39h06a4308_0
- wheel=0.37.1=pyhd3eb1b0_0
- x264=1!161.3030=h7f98852_1
- xz=5.2.5=h7b6447c_0
- zlib=1.2.12=h7f8727e_2
- pip:
- brotli==1.0.9
- ffmpeg-python==0.2.0
- future==0.18.2
- mutagen==1.45.1
- pycryptodomex==3.15.0
- websockets==10.3
- yt-dlp==2022.6.22.1
prefix: /home/leee/anaconda3/envs/hdtf
275 changes: 275 additions & 0 deletions download.py
@@ -0,0 +1,275 @@
"""
This file downloads almost all the videos from the HDTF dataset. Some videos are discarded for the following reasons:
- they do not contain cropping information because they are somewhat noisy (hand moving, background changing, etc.)
- they are not available on youtube anymore (at all or in the specified format)

The discarded videos constitute a small portion of the dataset, so you can try to re-download them manually on your own.

Usage:
```
$ python download.py --output_dir /tmp/data/hdtf --num_workers 8
```

You need the tqdm and youtube-dl libraries installed for this script to work.
"""


import os
import argparse
from typing import List, Dict
from multiprocessing import Pool
import subprocess
from subprocess import Popen, PIPE
from urllib import parse

from tqdm import tqdm


subsets = ["RD", "WDA", "WRA"]


def download_hdtf(source_dir: os.PathLike, output_dir: os.PathLike, num_workers: int, **process_video_kwargs):
os.makedirs(output_dir, exist_ok=True)
os.makedirs(os.path.join(output_dir, '_videos_raw'), exist_ok=True)

download_queue = construct_download_queue(source_dir, output_dir)
task_kwargs = [dict(
video_data=vd,
output_dir=output_dir,
**process_video_kwargs,
) for vd in download_queue]
pool = Pool(processes=num_workers)
tqdm_kwargs = dict(total=len(task_kwargs), desc=f'Downloading videos into {output_dir}')

for _ in tqdm(pool.imap_unordered(task_proxy, task_kwargs), **tqdm_kwargs):
pass

print('Download is finished, you can now (optionally) delete the following directories, since they are not needed anymore and occupy a lot of space:')
print(' -', os.path.join(output_dir, '_videos_raw'))


def construct_download_queue(source_dir: os.PathLike, output_dir: os.PathLike) -> List[Dict]:
download_queue = []

for subset in subsets:
video_urls = read_file_as_space_separated_data(os.path.join(source_dir, f'{subset}_video_url.txt'))
crops = read_file_as_space_separated_data(os.path.join(source_dir, f'{subset}_crop_wh.txt'))
intervals = read_file_as_space_separated_data(os.path.join(source_dir, f'{subset}_annotion_time.txt'))
resolutions = read_file_as_space_separated_data(os.path.join(source_dir, f'{subset}_resolution.txt'))

for video_name, (video_url,) in video_urls.items():
if f'{video_name}.mp4' not in intervals:
print(f'Entire {subset}/{video_name} does not contain any clip intervals, hence is broken. Discarding it.')
continue

if f'{video_name}.mp4' not in resolutions or len(resolutions[f'{video_name}.mp4']) > 1:
print(f'Entire {subset}/{video_name} does not contain the resolution (or it is in a bad format), hence is broken. Discarding it.')
continue

all_clips_intervals = [x.split('-') for x in intervals[f'{video_name}.mp4']]
clips_crops = []
clips_intervals = []

for clip_idx, clip_interval in enumerate(all_clips_intervals):
clip_name = f'{video_name}_{clip_idx}.mp4'
if clip_name not in crops:
print(f'Clip {subset}/{clip_name} is not present in crops, hence is broken. Discarding it.')
continue
clips_crops.append(crops[clip_name])
clips_intervals.append(clip_interval)

clips_crops = [list(map(int, cs)) for cs in clips_crops]

if len(clips_crops) == 0:
print(f'Entire {subset}/{video_name} does not contain any crops, hence is broken. Discarding it.')
continue

assert len(clips_intervals) == len(clips_crops)
assert set([len(vi) for vi in clips_intervals]) == {2}, f"Broken time interval, {clips_intervals}"
assert set([len(vc) for vc in clips_crops]) == {4}, f"Broken crops, {clips_crops}"
assert all([vc[1] == vc[3] for vc in clips_crops]), f'Some crops are not square, {clips_crops}'

download_queue.append({
'name': f'{subset}_{video_name}',
'id': parse.parse_qs(parse.urlparse(video_url).query)['v'][0],
'intervals': clips_intervals,
'crops': clips_crops,
'output_dir': output_dir,
'resolution': resolutions[f'{video_name}.mp4'][0]
})

return download_queue


def task_proxy(kwargs):
return download_and_process_video(**kwargs)


def download_and_process_video(video_data: Dict, output_dir: str):
"""
Downloads the video and cuts/crops it into several ones according to the provided time intervals
"""
raw_download_path = os.path.join(output_dir, '_videos_raw', f"{video_data['name']}.mp4")
raw_download_log_file = os.path.join(output_dir, '_videos_raw', f"{video_data['name']}_download_log.txt")
download_result = download_video(video_data['id'], raw_download_path, resolution=video_data['resolution'], log_file=raw_download_log_file)

if not download_result:
print('Failed to download', video_data)
print(f'See {raw_download_log_file} for details')
return

# We do not know beforehand what the resolution of the downloaded video will be:
# youtube-dl selects the (presumably) highest one available
video_resolution = get_video_resolution(raw_download_path)
if video_resolution != int(video_data['resolution']):
print(f"Downloaded resolution is not correct for {video_data['name']}: {video_resolution} vs {video_data['resolution']}. Discarding this video.")
return

for clip_idx in range(len(video_data['intervals'])):
start, end = video_data['intervals'][clip_idx]
clip_name = f'{video_data["name"]}_{clip_idx:03d}'
clip_path = os.path.join(output_dir, clip_name + '.mp4')
crop_success = cut_and_crop_video(raw_download_path, clip_path, start, end, video_data['crops'][clip_idx])

if not crop_success:
print(f'Failed to cut-and-crop clip #{clip_idx}', video_data)
continue


def read_file_as_space_separated_data(filepath: os.PathLike) -> Dict:
"""
Reads a file as a space-separated dataframe, where the first column is the index
"""
with open(filepath, 'r') as f:
lines = f.read().splitlines()
lines = [[v.strip() for v in l.strip().split(' ')] for l in lines]
data = {l[0]: l[1:] for l in lines}

return data


def download_video(video_id, download_path, resolution: int=None, video_format="mp4", log_file=None):
"""
Download video from YouTube.
:param video_id: YouTube ID of the video.
:param download_path: Where to save the video.
:param resolution: Desired video height (e.g. 720 or 1080); the best available is used if None.
:param video_format: Container format to download.
:param log_file: Path to a log file for youtube-dl.
:return: Whether both the video and the audio download succeeded.

Adapted from https://github.com/ytdl-org/youtube-dl
"""
# if os.path.isfile(download_path): return True # File already exists

if log_file is None:
stderr = subprocess.DEVNULL
else:
stderr = open(log_file, "a")
video_selection = f"bestvideo[ext={video_format}]"
video_selection = video_selection if resolution is None else f"{video_selection}[height={resolution}]"

video_command = [
"youtube-dl",
"https://youtube.com/watch?v={}".format(video_id), "--quiet", "-f",
video_selection,
"--output", download_path,
"--no-continue"
]
video_return_code = subprocess.call(video_command, stderr=stderr)

success = video_return_code == 0

if success:
audio_command = [
"youtube-dl",
"https://youtube.com/watch?v={}".format(video_id), "--quiet",
"--extract-audio",
"--audio-format", "wav",
"--output", f"{download_path[:-4]}.wav",
"--no-continue"
]

audio_return_code = subprocess.call(audio_command, stderr=stderr)

success = audio_return_code == 0

if log_file is not None:
stderr.close()

return success and os.path.isfile(download_path)


def get_video_resolution(video_path: os.PathLike) -> int:
command = ' '.join([
"ffprobe",
"-v", "error",
"-select_streams", "v:0", "-show_entries", "stream=height", "-of", "csv=p=0",
video_path
])

process = Popen(command, stdout=PIPE, shell=True)
(output, err) = process.communicate()
return_code = process.wait()
success = return_code == 0

if not success:
print('Command failed:', command)
return -1

return int(output)


def cut_and_crop_video(raw_video_path, output_path, start, end, crop: List[int]):
# if os.path.isfile(output_path): return True # File already exists

x, out_w, y, out_h = crop

video_command = ' '.join([
"ffmpeg", "-i", raw_video_path,
"-strict", "-2", # Some legacy arguments
"-loglevel", "quiet", # Verbosity arguments
"-qscale", "0", # Preserve the quality
"-y", # Overwrite if the file exists
"-ss", str(start), "-to", str(end), # Cut arguments
"-filter:v", f'"crop={out_w}:{out_h}:{x}:{y}"', # Crop arguments
output_path
])

video_return_code = subprocess.call(video_command, shell=True)
success = video_return_code == 0

if not success:
print('Video command failed:', video_command)
return success

audio_command = ' '.join([
"ffmpeg", "-i", f"{raw_video_path[:-4]}.wav",
"-strict", "-2", # Some legacy arguments
"-loglevel", "quiet", # Verbosity arguments
"-qscale", "0", # Preserve the quality
"-y", # Overwrite if the file exists
"-ss", str(start), "-to", str(end), # Cut arguments
f"{output_path[:-4]}.wav"
])

audio_return_code = subprocess.call(audio_command, shell=True)
success = audio_return_code == 0

if not success:
print('Audio command failed:', audio_command)

return success


if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Download HDTF dataset")
parser.add_argument('-s', '--source_dir', type=str, default='HDTF_dataset', help='Path to the directory with the dataset')
parser.add_argument('-o', '--output_dir', type=str, help='Where to save the videos?')
parser.add_argument('-w', '--num_workers', type=int, default=8, help='Number of workers for downloading')
args = parser.parse_args()

download_hdtf(
args.source_dir,
args.output_dir,
args.num_workers,
)