This repository contains a PyTorch re-implementation of the paper: Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis (CVPR 2024).
Requires Python 3.6+, CUDA 11.3+, and PyTorch 1.10+.
Tested on Linux with Anaconda3, Python 3.9, and PyTorch 1.10.
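If you are unsure whether an existing environment meets these minimums, a small check like the one below may help. It is only a sketch and not part of this repository: `version_tuple` and `meets_minimum` are hypothetical helper names. Versions are compared numerically because a plain string comparison would rank "1.9" above "1.10".

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.10.0+cu113' into (1, 10, 0)."""
    core = version.split("+", 1)[0]  # drop local build tags like '+cu113'
    numbers = []
    for part in core.split("."):
        match = re.match(r"\d+", part)
        if match is None:  # stop at the first non-numeric component
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, compared numerically."""
    inst, minv = version_tuple(installed), version_tuple(minimum)
    width = max(len(inst), len(minv))
    inst += (0,) * (width - len(inst))  # pad so '1.10' compares like '1.10.0'
    minv += (0,) * (width - len(minv))
    return inst >= minv


if __name__ == "__main__":
    python_version = "%d.%d" % sys.version_info[:2]
    print("Python  >= 3.6 :", meets_minimum(python_version, "3.6"))
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch >= 1.10:", meets_minimum(torch.__version__, "1.10"))
        cuda = torch.version.cuda  # None for CPU-only builds
        print("CUDA    >= 11.3:", cuda is not None and meets_minimum(cuda, "11.3"))
```

Run it inside the activated conda environment; any `False` line points to a component that is older than the stated requirement.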
Please refer to scripts/
conda create -n dyntet python=3.9
conda activate dyntet
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install ninja imageio PyOpenGL glfw xatlas gdown
pip install git+
pip install git+
pip install --global-option="--no-networks" git+
pip install scikit-learn configargparse face_alignment natsort matplotlib dominate tensorboard kornia trimesh open3d imageio-ffmpeg lpips easydict pysdf rich openpyxl gfpgan
The following steps refer to AD-NeRF.
Prepare face-parsing model.
wget -O data_utils/face_parsing/79999_iter.pth
Prepare the 3DMM model for head pose estimation.
wget -O data_utils/face_tracking/3DMM/exp_info.npy
wget -O data_utils/face_tracking/3DMM/keys_info.npy
wget -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget -O data_utils/face_tracking/3DMM/topology_info.npy
Download 3DMM model from Basel Face Model 2009:
# 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
# 2. cd data_utils/face_tracking && python
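Because video processing can take hours, it may be worth confirming that the downloaded files are in place first. The sketch below is not part of this repository; `missing_files` is a hypothetical helper, and the paths are simply the download targets named in the steps above, relative to the repository root.

```python
from pathlib import Path

# Download targets listed in the preparation steps (relative to the repo root).
EXPECTED_FILES = [
    "data_utils/face_parsing/79999_iter.pth",
    "data_utils/face_tracking/3DMM/exp_info.npy",
    "data_utils/face_tracking/3DMM/keys_info.npy",
    "data_utils/face_tracking/3DMM/sub_mesh.obj",
    "data_utils/face_tracking/3DMM/topology_info.npy",
    "data_utils/face_tracking/3DMM/01_MorphableModel.mat",
]


def missing_files(repo_root="."):
    """Return the expected files that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected model files are present.")
```

This only checks that the files exist; it does not verify that a download completed correctly.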
In addition, the following steps refer to Deep3DFace. We use 3DMM coefficients to drive talking heads.
- Download the pre-trained model using this link (Google Drive) and organize the directory into the following structure:
└─── checkpoints
    └─── facerecon
        └─── epoch_20.pth
For evaluation, download the pre-trained ArcFace model and organize the directory into the following structure:
└─── model_ir_se50.pth
Put training video under
- The video must be 25FPS, with all frames containing the talking person.
- Because we use nvdiffrast, the video width and height are processed into integer multiples of 8, such as 448×448 and 512×512.
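To see in advance how a given video relates to these two constraints, something like the following sketch can be used. It is not taken from this repository: `floor_to_multiple` and `check_video` are hypothetical names, and rounding each side down to a multiple of 8 is an assumption about how the size is adjusted (the processing script may crop or resize differently). The frame rate is expected as a fraction string such as "25/1", the form in which ffprobe reports `r_frame_rate`.

```python
from fractions import Fraction


def floor_to_multiple(value, base=8):
    """Round `value` down to the nearest multiple of `base`."""
    return (value // base) * base


def check_video(width, height, frame_rate, base=8, required_fps=25):
    """Summarize whether a video matches the constraints listed above.

    `frame_rate` is a string such as '25/1' or '30000/1001'.
    """
    fps = Fraction(frame_rate)  # exact comparison, no floating-point rounding
    return {
        "fps_ok": fps == required_fps,
        # Assumed target size: each side rounded down to a multiple of `base`.
        "target_size": (floor_to_multiple(width, base), floor_to_multiple(height, base)),
    }


if __name__ == "__main__":
    print(check_video(450, 450, "25/1"))
    print(check_video(512, 512, "30000/1001"))
```

A video whose `fps_ok` is `False` should be re-encoded to 25 FPS before processing.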
We obtained the experiment videos mainly from AD-NeRF, ER-NeRF, GeneFace, and YouTube. Due to copyright restrictions, we cannot distribute all of them, so you may have to download and crop these videos yourself. Here is an example training video (Obama) from AD-NeRF.
mkdir -p data/video
wget -O data/video/obama.mp4
Run the script to process the video (this may take several hours).
python data_utils/ --path "data/video/obama.mp4" --save_dir "data/video/obama" --task -1
To train the model on the Obama video:
python --config configs/obama.json
To evaluate the trained model on the validation dataset:
python evaluate_utils/ --train_dir out/obama
To synthesize the video for the validation dataset:
python --config configs/obama.json
To synthesize a video driven by custom 3DMM coefficients and (optionally) merge the video and audio:
python --config configs/obama.json --drive_3dmm data/test_audio/obama_sing_sadtalker.npy --audio data/test_audio/sing.wav
Note: Given an audio file (e.g., AUDIO.wav), you can try SadTalker to generate the 3DMM coefficient .mat file (e.g., FILE.mat), then run:
python --config configs/obama.json --drive_3dmm FILE.mat --audio AUDIO.wav
- Release Code.
- We are considering uploading a script that fine-tunes GFPGAN on DynTet to enhance the visual quality of the talking head.
Consider citing as below if you find this repository helpful to your project:
title={Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis},
author={Zicheng Zhang and Ruobing Zheng and Ziwen Liu and Congying Han and Tianqi Li and Meng Wang and Tiande Guo and Jingdong Chen and Bonan Li and Ming Yang},
This code relies heavily on AD-NeRF for data processing, nvdiffrec for Marching Tetrahedra, and Deep3DFace for 3DMM extraction. Some of the code is drawn from OTAvatar, RAD-NeRF, and ER-NeRF. Thanks to these great projects. Please follow the licenses of the above open-source code.