Code for CVPR19 paper "Monocular Total Capture: Posing Face, Body and Hands in the Wild"
Project website: [http://domedb.perception.cs.cmu.edu/mtc.html]
This code is tested on a Ubuntu 16.04 machine with a GTX 1080Ti GPU, with the following dependencies.
- ffmpeg
- Python 3.5 (with TensorFlow 1.5.0, OpenCV, Matplotlib, packages installed with pip3)
- cmake >= 2.8
- OpenCV 2.4.13 (compiled from source with CUDA 9.0, CUDNN 7.0)
- Ceres-Solver 1.13.0 (with SuiteSparse)
- OpenGL, GLUT, GLEW
- libigl https://github.com/libigl/libigl
- wget
- OpenPose
- git clone this repository; suppose the main directory is ${ROOT} on your local machine.
- "cd ${ROOT}"
- "bash download.sh"
- git clone OpenPose https://github.com/CMU-Perceptual-Computing-Lab/openpose and compile. Suppose the main directory of OpenPose is ${openposeDir}, such that the compiled binary is at ${openposeDir}/build/examples/openpose/openpose.bin
- Edit ${ROOT}/run_pipeline.sh: set line 13 to you ${openposeDir}
- Edit ${ROOT}/FitAdam/CMakeLists.txt: set line 13 to the "include" directory of libigl (this is a header only library)
- "cd ${ROOT}/FitAdam/ && mkdir build && cd build"
- "cmake .."
- "make -j12"
- Suppose the video to be tested is named "${seqName}.mp4". Place it in "${ROOT}/${seqName}/${seqName}.mp4".
- If the camera intrinsics is known, put it in "${ROOT}/${seqName}/calib.json" (refer to "POF/calib.json" for example); otherwise, a default camera intrinsics will be used.
- In ${ROOT}, run "bash run_pipeline.sh ${seqName}"; if the subject in the video shows only upper body, run "bash run_pipeline.sh ${seqName} -f".
- Install NVIDIA Docker
- Build the docker image
docker build . --tag mtc
- Running the docker image:
xhost local:root
docker run --gpus 0 -it -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -e XAUTHORITY -e NVIDIA_DRIVER_CAPABILITIES=all mtc
- Once inside (should be in /opt/mtc by default):
bash run_pipeline.sh example_speech -f
Tested on Ubuntu 16.04 and 18.04 with Titan Xp and Titan X Maxwell (External w/Razer Core).
"download.sh" automatically download 2 example videos to test. After successful installation run
bash run_pipeline.sh example_dance
or
bash run_pipeline.sh example_speech -f
This code can only be used for non-commercial research purposes. If you use this code in your research, please cite the following papers.
@inproceedings{xiang2019monocular,
title={Monocular total capture: Posing face, body, and hands in the wild},
author={Xiang, Donglai and Joo, Hanbyul and Sheikh, Yaser},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2019}
}
@inproceedings{joo2018total,
title={Total capture: A 3d deformation model for tracking faces, hands, and bodies},
author={Joo, Hanbyul and Simon, Tomas and Sheikh, Yaser},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
Some part of this code is modified from lmb-freiburg/hand3d.
We use the deformable human model Adam in this code.
The relationship between Adam and SMPL: The body part of Adam is derived from SMPL model by Loper et al. 2015. It follows SMPL's body joint hierarchy, but uses a different joint regressor. Adam does not contain the original SMPL model's shape and pose blendshapes, but uses its own version trained from Panoptic Studio database.
The relationship between Adam and FaceWarehouse: The face part of Adam is derived from FaceWarehouse. In particular, the mesh topology of face of Adam is a modified version of the learned model from FaceWarehouse dataset. Adam does not contain the blendshapes of the original FaceWarehouse data, and facial expression of Adam model is unavailable due to copyright issues.
The Adam model is shared for research purpose only, and cannot be used for commercial purpose. Redistributing the original or modified version of Adam is also not allowed without permissions.
- In our code, the output of ceres::AngleAxisToRotationMatrix is always a RowMajor matrix, while the function is designed for a ColMajor matrix. To account for this, please treat our output pose parameters as the opposite value. In other words, before exporting our pose parameter to other softwares, please multiply them by -1.