CVPR 2023
- 🔥 Video editing: single source video & a driving video & a piece of audio. We transfer pose through the video and transfer expression through the audio with the help of SadTalker.
Source video | Result |
---|---|
full_s.mp4 | dpe.mp4 |
full_s.mp4 | dpe.mp4 |
full_s.mp4 | dpe.mp4 |
- 🔥 Video editing: single source image & a driving video & a piece of audio. We transfer pose through the video and transfer expression through the audio with the help of SadTalker.
demo4_1.mp4
demo5_1.mp4
- 🔥 Video editing: single source image & two driving videos. We transfer pose through the first video and transfer expression through the second video. Some videos are selected from here.
- 2023.07.21 Release code for one-shot driving.
- 2023.05.26 Release code for training.
- 2023.05.06 Support Enhancement.
- 2023.05.05 Support Video editing.
- 2023.04.30 Add some demos.
- 2023.03.18 Support Pose driving, Expression driving, and Pose and Expression driving.
- 2023.03.18 Upload the pre-trained model, which is fine-tuned for the expression generator.
- 2023.03.03 Release the test code!
- 2023.02.28 DPE has been accepted by CVPR 2023!
- Test code for video driving.
- Some demos.
- Gradio/Colab Demo.
- Training code of each component.
- Test code for video editing.
- Test code for one-shot driving.
- Integrate audio driven methods for video editing.
- Integrate GFPGAN for face enhancement.
git clone https://github.com/Carlyx/DPE
cd DPE
conda create -n dpe python=3.8
source activate dpe
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
### install gfpgan for enhancer
pip install git+https://github.com/TencentARC/GFPGAN
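Optionally, you can sanity-check the environment before downloading any models. This is a minimal sketch that only assumes the pip installs above succeeded:

```python
# Quick sanity check for the dpe conda environment.
import torch

print(torch.__version__)           # expect 1.12.1+cu113
print(torch.cuda.is_available())   # expect True on a CUDA 11.3 machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```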
Please download our pre-trained model and put it in ./checkpoints.
Model | Description |
---|---|
checkpoints/dpe.pt | Pre-trained model (V1). |
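To verify that the checkpoint downloaded intact, you can try loading it. The key layout is not documented here, so this sketch only assumes dpe.pt is a standard torch checkpoint:

```python
# Verify that ./checkpoints/dpe.pt loads as a torch checkpoint.
import torch

ckpt = torch.load('./checkpoints/dpe.pt', map_location='cpu')
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # inspect the top-level entries
else:
    print(type(ckpt))
```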
Expression driving:
python run_demo.py --s_path ./data/s.mp4 \
--d_path ./data/d.mp4 \
--model_path ./checkpoints/dpe.pt \
--face exp \
--output_folder ./res
Pose driving:
python run_demo.py --s_path ./data/s.mp4 \
--d_path ./data/d.mp4 \
--model_path ./checkpoints/dpe.pt \
--face pose \
--output_folder ./res
Pose and expression driving:
python run_demo.py --s_path ./data/s.mp4 \
--d_path ./data/d.mp4 \
--model_path ./checkpoints/dpe.pt \
--face both \
--output_folder ./res
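To produce all three variants for the same source/driving pair, you can replay the commands above from a small wrapper. This is a sketch that reuses the same sample paths as the commands above:

```python
# Run run_demo.py once per driving mode on the same source/driving pair.
import subprocess

for face in ('exp', 'pose', 'both'):
    subprocess.run([
        'python', 'run_demo.py',
        '--s_path', './data/s.mp4',
        '--d_path', './data/d.mp4',
        '--model_path', './checkpoints/dpe.pt',
        '--face', face,
        '--output_folder', './res',
    ], check=True)
```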
One-shot driving:
python run_demo_single.py --s_path ./data/s.jpg \
--pose_path ./data/pose.mp4 \
--exp_path ./data/exp.mp4 \
--model_path ./checkpoints/dpe.pt \
--face both \
--output_folder ./res
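run_demo_single.py takes one source image at a time, so batching is easiest with a loop. In this sketch, the ./data/faces folder is a hypothetical location for your source images:

```python
# Apply the same pose/expression driving videos to every image in a folder.
import glob
import subprocess

for img in sorted(glob.glob('./data/faces/*.jpg')):  # hypothetical folder
    subprocess.run([
        'python', 'run_demo_single.py',
        '--s_path', img,
        '--pose_path', './data/pose.mp4',
        '--exp_path', './data/exp.mp4',
        '--model_path', './checkpoints/dpe.pt',
        '--face', 'both',
        '--output_folder', './res',
    ], check=True)
```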
Before video editing, you should run python crop_video.py to process the input full video.
You can download the pre-trained segmentation model from here and put it in ./checkpoints.
(Optional) You can run git clone https://github.com/TencentARC/GFPGAN, download the pre-trained enhancement model from here, and put it in ./checkpoints. Then you can pass --EN to improve the result.
python run_demo_paste.py --s_path <cropped source video> \
--d_path <driving video> \
--box_path <txt after running crop_video.py> \
--model_path ./checkpoints/dpe.pt \
--face exp \
--output_folder ./res \
--EN
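The editing pipeline (crop, edit, paste back) can also be scripted end to end. In the sketch below, the cropped-video and box-file names are assumptions, since crop_video.py's output naming is not documented here; check what it actually writes and adjust:

```python
# End-to-end video editing sketch: crop the full video, then edit and paste back.
import subprocess

# Step 1: crop the input full video (produces a cropped clip and a box txt).
subprocess.run(['python', 'crop_video.py'], check=True)

# Step 2: edit the cropped clip and paste it back using the box coordinates.
subprocess.run([
    'python', 'run_demo_paste.py',
    '--s_path', './data/s_cropped.mp4',  # hypothetical name of the cropped video
    '--d_path', './data/d.mp4',
    '--box_path', './data/s_box.txt',    # hypothetical name of the box txt
    '--model_path', './checkpoints/dpe.pt',
    '--face', 'exp',
    '--output_folder', './res',
    '--EN',                              # requires the GFPGAN setup above
], check=True)
```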
- Data preprocessing. To train DPE, please follow video-preprocessing to download and pre-process the VoxCelebA dataset. We use lmdb to improve I/O efficiency. (Alternatively, you can rewrite the VoxDataset class in dataset.py to load .mp4 files directly; see the sketch after these steps.)
- Train DPE from scratch:
python train.py --data_root <DATA_PATH>
- (Optional) If you want to accelerate convergence, you can download the pre-trained model of LIA and rename it to vox.pt.
python train.py --data_root <DATA_PATH> --resume_ckpt <model_path for vox.pt>
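If you prefer to skip the lmdb conversion, the .mp4-based loader mentioned above could look roughly like the following. This is a hypothetical sketch, not the repository's actual VoxDataset: the real frame sampling, pairing, and augmentation logic in dataset.py is omitted.

```python
# Hypothetical sketch of loading VoxCeleb .mp4 clips directly instead of lmdb.
import glob
import random

import imageio
import numpy as np
import torch
from torch.utils.data import Dataset


class VoxVideoDataset(Dataset):
    """Yields a (source, driving) frame pair sampled from one clip."""

    def __init__(self, data_root):
        self.paths = sorted(glob.glob(f'{data_root}/**/*.mp4', recursive=True))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        reader = imageio.get_reader(self.paths[idx])
        frames = [np.asarray(f) for f in reader]
        reader.close()
        # Pick two distinct frames from the same clip as source and driving.
        src, drv = random.sample(range(len(frames)), 2)
        to_tensor = lambda f: torch.from_numpy(
            f.astype(np.float32) / 255.0
        ).permute(2, 0, 1)  # HWC uint8 -> CHW float in [0, 1]
        return to_tensor(frames[src]), to_tensor(frames[drv])
```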
If you find our work useful in your research, please consider citing:
@InProceedings{Pang_2023_CVPR,
author = {Pang, Youxin and Zhang, Yong and Quan, Weize and Fan, Yanbo and Cun, Xiaodong and Shan, Ying and Yan, Dong-Ming},
title = {DPE: Disentanglement of Pose and Expression for General Video Portrait Editing},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {427-436}
}
Part of the code is adapted from LIA, PIRenderer, and STIT. We thank the authors for their contributions to the community.
- StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)
- CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)
- VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023)
- 3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)
- T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)
This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.