R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning
This is the official repository for the paper: R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning.
- ☐ Add progressive optimization for hash grid
- ☐ Add landmark generator
- ☑ Add landmark encoder
- ☑ Support methods: R2-Talker, RAD-NeRF, Geneface+instant-ngp
| Method | Driving Features | Feature Encoder |
|---|---|---|
| R2-Talker | 3D facial landmarks | Hash grid encoder |
| RAD-NeRF | Audio features | Audio feature extractor |
| Geneface+instant-ngp | 3D facial landmarks | Audio feature extractor |
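The key difference is that R2-Talker encodes 3D facial landmarks with a multiresolution hash grid (Instant-NGP style) instead of conditioning on raw audio features. Below is a minimal, generic sketch of such a hash grid encoding for 3D points, assuming landmark coordinates normalized to [0, 1]; the class name and hyperparameters are hypothetical and this is not the repository's actual encoder.

```python
# Minimal, generic sketch of an Instant-NGP-style multiresolution hash grid
# encoder for 3D facial landmarks. All names and hyperparameters here are
# hypothetical; this is NOT the repository's actual encoder.
import torch
import torch.nn as nn


class HashGridLandmarkEncoder(nn.Module):
    def __init__(self, num_levels=8, features_per_level=2,
                 log2_hashmap_size=14, base_resolution=4, growth_factor=1.5):
        super().__init__()
        self.hashmap_size = 2 ** log2_hashmap_size
        self.resolutions = [int(base_resolution * growth_factor ** i)
                            for i in range(num_levels)]
        # one learnable feature table per resolution level
        self.tables = nn.ParameterList([
            nn.Parameter(1e-4 * torch.randn(self.hashmap_size, features_per_level))
            for _ in range(num_levels)
        ])
        self.primes = torch.tensor([1, 2654435761, 805459861])

    def hash(self, coords):
        # coords: (N, 3) integer grid coordinates -> (N,) table indices
        h = (coords * self.primes.to(coords.device)).sum(-1)
        return h % self.hashmap_size

    def forward(self, x):
        # x: (N, 3) landmark coordinates normalized to [0, 1]
        feats = []
        for level, res in enumerate(self.resolutions):
            pos = x * res
            lo = pos.floor().long()
            w = pos - lo.float()                      # trilinear weights
            level_feat = 0.0
            for corner in range(8):                   # 8 corners of the voxel
                offset = torch.tensor([(corner >> i) & 1 for i in range(3)],
                                      device=x.device)
                idx = self.hash(lo + offset)
                cw = torch.prod(torch.where(offset.bool(), w, 1.0 - w),
                                dim=-1, keepdim=True)
                level_feat = level_feat + cw * self.tables[level][idx]
            feats.append(level_feat)
        return torch.cat(feats, dim=-1)               # (N, num_levels * features_per_level)


# usage: encode 68 landmarks of one frame into per-landmark features
encoder = HashGridLandmarkEncoder()
landmarks = torch.rand(68, 3)                         # normalized 3D landmarks
features = encoder(landmarks)                         # (68, 16)
```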
Install dependency & Build extension (optional)
Tested on Ubuntu 22.04, PyTorch 1.12, and CUDA 11.6.
git clone git@github.com:KylinYee/R2-Talker-code.git
cd R2-Talker-code
# for ubuntu, portaudio is needed for pyaudio to work.
sudo apt install portaudio19-dev
pip install -r requirements.txt
By default, we use `load` to build the extension at runtime. However, this may be inconvenient sometimes. Therefore, we also provide `setup.py` to build each extension:
# install all extension modules
bash scripts/install_ext.sh
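For reference, the runtime build uses PyTorch's JIT C++/CUDA extension mechanism. The sketch below shows how such a runtime build typically looks with `torch.utils.cpp_extension.load`; the extension name and source paths are hypothetical examples, not necessarily the ones this repository uses.

```python
# Minimal sketch of runtime (JIT) extension building with PyTorch.
# NOTE: the extension name and source files below are hypothetical examples.
from torch.utils.cpp_extension import load

raymarching = load(
    name="raymarching",                          # hypothetical module name
    sources=[
        "raymarching/src/raymarching.cu",        # hypothetical CUDA source
        "raymarching/src/bindings.cpp",          # hypothetical pybind11 bindings
    ],
    extra_cuda_cflags=["-O3"],
    verbose=True,                                # print the compiler output
)
```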
Preparation & Pre-processing of a Custom Training Video
## install pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
## prepare face-parsing model
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth
## prepare basel face model
# 1. download `01_MorphableModel.mat` from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads and put it under `data_utils/face_tracking/3DMM/`
# 2. download other necessary files from AD-NeRF's repository:
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
# 3. run convert_BFM.py
cd data_utils/face_tracking
python convert_BFM.py
cd ../..
## prepare ASR model
# if you want to use DeepSpeech as in AD-NeRF, you should install TensorFlow 1.15 manually.
# otherwise, we also support Wav2Vec in PyTorch.
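As a reference for the Wav2Vec path, the sketch below shows one common way to extract per-frame wav2vec features with HuggingFace `transformers`; the repository's preprocessing may use a different checkpoint and output format, so treat the model name, paths, and shapes as assumptions.

```python
# Hedged sketch: extract wav2vec features for a 16 kHz mono WAV file.
# The checkpoint name and file paths are illustrative assumptions.
import soundfile as sf
import torch
from transformers import Wav2Vec2Model, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").eval()

wav, sr = sf.read("data/obama/aud.wav")               # expects 16 kHz mono audio
inputs = processor(wav, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    feats = model(inputs.input_values).last_hidden_state   # (1, T, 768)
print(feats.shape)
```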
- Put the training video under `data/<ID>/<ID>.mp4`. The video must be 25 FPS, with all frames containing the talking person. The resolution should be about 512x512, and the duration about 1-5 min (see the sanity-check sketch after this list).
# an example training video from AD-NeRF
mkdir -p data/obama
wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4
- Run the script (may take hours depending on the video length):
# run all steps
python data_utils/process.py data/<ID>/<ID>.mp4
# if you want to run a specific step
python data_utils/process.py data/<ID>/<ID>.mp4 --task 1 # extract audio wave
- The 3D facial landmark generator will be added in the future. If you want to process custom data, please refer to GeneFace to obtain `trainval_dataset.npy`, use our `binarizedFile2landmarks.py` to extract landmarks, and put the landmarks into `data/<ID>/`.
- File structure after finishing all steps:
./data/<ID>
├──<ID>.mp4              # original video
├──ori_imgs              # original images from video
│  ├──0.jpg
│  ├──0.lms              # 2D landmarks
│  ├──...
├──gt_imgs               # ground truth images (static background)
│  ├──0.jpg
│  ├──...
├──parsing               # semantic segmentation
│  ├──0.png
│  ├──...
├──torso_imgs            # inpainted torso images
│  ├──0.png
│  ├──...
├──aud.wav               # original audio
├──aud_eo.npy            # audio features (wav2vec)
├──aud.npy               # audio features (deepspeech)
├──bc.jpg                # default background
├──track_params.pt       # raw head tracking results
├──transforms_train.json # head poses (train split)
├──transforms_val.json   # head poses (test split)
├──aud_idexp_train.npy   # head landmarks (train split)
├──aud_idexp_val.npy     # head landmarks (test split)
├──aud_idexp.npy         # head landmarks
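As referenced above, here is a small, optional sanity-check sketch (not part of the repository) for verifying the video requirements and peeking at the produced feature files; the file names follow the structure above, but the printed shapes are only illustrative.

```python
# Hedged sanity check: confirm the 25 FPS / ~512x512 requirement and inspect
# the generated feature files. Paths assume the Obama example; shapes vary.
import cv2
import numpy as np

cap = cv2.VideoCapture("data/obama/obama.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"fps={fps}, resolution={w}x{h}")    # expect 25 FPS, roughly 512x512

aud = np.load("data/obama/aud_eo.npy")
lm3d = np.load("data/obama/aud_idexp_train.npy")
print("audio features:", aud.shape)        # per-frame audio features
print("landmark features:", lm3d.shape)    # per-frame 3D landmark features
```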
For your convenience, we provide some processed results of the Obama video here. You can also download more raw videos from GeneFace here.
Quick Start & Detailed Usage
We have prepared the relevant materials here. Please download them and put them in a new `pretrained` directory (a quick checkpoint sanity check is sketched after the inference steps below).
- File structure after finishing all steps:
./pretrained
├──r2talker_Obama_idexp_torso.pth # pretrained model
├──test_eo.npy          # driving audio features (wav2vec)
├──test_lm3ds.npy       # driving audio features (landmarks)
├──test.wav             # raw driving audio
├──bc.jpg               # default background
├──transforms_val.json  # head poses
├──test.mp4             # raw driving video
- Run inference:
# save video to trail_test/results/*.mp4
sh scripts/test_pretrained.sh
The first run will take some time to compile the CUDA extensions.
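As mentioned above, a quick, generic way to inspect the downloaded checkpoint before running inference; the printed keys depend on the training run and are illustrative.

```python
# Optional sanity check on the pretrained checkpoint; key names vary by run.
import torch

ckpt = torch.load("pretrained/r2talker_Obama_idexp_torso.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```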
# step.1 train (head)
# by default, we load data from disk on the fly.
# we can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slower).
# `--preload 1`: load to CPU, requires ~70G CPU memory (slightly slower)
# `--preload 2`: load to GPU, requires ~24G GPU memory (fast)
python main.py data/Obama/ --workspace trial_r2talker_Obama_idexp/ -O --iters 200000 --method r2talker --cond_type idexp
# step.2 train (finetune lips for another 50000 steps, run after the above command!)
python main.py data/Obama/ --workspace trial_r2talker_Obama_idexp/ -O --finetune_lips --iters 250000 --method r2talker --cond_type idexp
# step.3 train (torso)
# --head_ckpt should point to the latest head checkpoint in trial_r2talker_Obama_idexp
python main.py data/Obama/ --workspace trial_r2talker_Obama_idexp_torso/ -O --torso --iters 200000 --head_ckpt trial_r2talker_Obama_idexp/checkpoints/ngp_ep0035.pth --method r2talker --cond_type idexp
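The `--cond_type idexp` option conditions the NeRF on identity/expression landmark features. The progressive multilayer conditioning named in the title feeds the (hash-encoded) landmark features into several MLP layers rather than only the first one; below is a hedged, generic sketch of that idea, where the layer sizes, the injection scheme, and all names are assumptions rather than the repository's actual network.

```python
# Hedged, generic sketch of progressive multilayer conditioning: the landmark
# feature is injected into every hidden layer instead of only the input layer.
# Layer sizes, injection scheme, and names are assumptions, not the repo's code.
import torch
import torch.nn as nn


class ProgressiveConditionedMLP(nn.Module):
    def __init__(self, in_dim=32, cond_dim=16, hidden_dim=64, num_layers=3, out_dim=4):
        super().__init__()
        self.layers = nn.ModuleList()
        d = in_dim
        for _ in range(num_layers):
            # each hidden layer receives the conditioning feature again
            self.layers.append(nn.Linear(d + cond_dim, hidden_dim))
            d = hidden_dim
        self.out = nn.Linear(d, out_dim)

    def forward(self, x, cond):
        h = x
        for layer in self.layers:
            h = torch.relu(layer(torch.cat([h, cond], dim=-1)))
        return self.out(h)


# x: e.g. a spatial position encoding; cond: e.g. hash-encoded landmark features
mlp = ProgressiveConditionedMLP()
x, cond = torch.randn(1024, 32), torch.randn(1024, 16)
rgb_sigma = mlp(x, cond)     # (1024, 4) = color + density (illustrative)
```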
Check the `scripts` directory for more provided examples.
This code is developed relying heavily on RAD-NeRF, GeneFace, and AD-NeRF. Thanks to these great projects.
@article{zhiling2023r2talker,
title={R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning},
author={Zhiling Ye and Liangguo Zhang and Dingheng Zeng and Quan Lu and Ning Jiang},
journal={arXiv preprint arXiv:2312.05572},
year={2023}
}