Adding adversarial loss and perceptual loss (VGGFace) to deepfakes' (reddit user) auto-encoder architecture.
Date | Update |
---|---|
2018-07-25 | Data preparation: Add a new notebook for video pre-processing in which MTCNN is used for face detection as well as face alignment. |
2018-07-04 | GAN training: Add the relativistic discriminator as an alternative option to the default mixup training method. Set loss_config["gan_training"]="relativistic_avg_LSGAN" in config cells to enable it. |
2018-06-29 | Model architecture: faceswap-GAN v2.2 now supports different output resolutions: 64x64, 128x128, and 256x256. The default RESOLUTION = 64 can be changed in the config cell of the v2.2 notebook. |
2018-06-25 | New version: faceswap-GAN v2.2 has been released. The main improvements of v2.2 model are its capability of generating realistic and consistent eye movements (results are shown below, or Ctrl+F for eyes), as well as higher video quality with face alignment. |
2018-06-06 | Model architecture: Add a self-attention mechanism proposed in SAGAN into the v2 GAN model. (Note: There is still no official code release for SAGAN, so the implementation in this repo could be wrong. We'll keep an eye on it.) |
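For reference, the two options mentioned in the updates above are set in the notebooks' config cells. A minimal sketch (only `RESOLUTION` and `loss_config["gan_training"]` appear in this README; the surrounding structure and the default mode string are assumptions):

```python
# Illustrative config-cell values for the v2.2 notebook (a sketch, not the
# notebook's exact cell).
RESOLUTION = 128  # output resolution: 64, 128, or 256 (default is 64)

loss_config = {}
# The default GAN training mode is the mixup-based method; the relativistic
# average LSGAN discriminator is enabled like this:
loss_config["gan_training"] = "relativistic_avg_LSGAN"
```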
- FaceSwap_GAN_v2.2_train_test.ipynb
  - Notebook for model training of faceswap-GAN model version 2.2.
  - Requires additional training images generated through prep_binary_masks.ipynb.
- FaceSwap_GAN_v2.2_video_conversion.ipynb
  - Notebook for video conversion of faceswap-GAN model version 2.2.
  - Face alignment using 5-point landmarks is applied during video conversion (a minimal alignment sketch follows below).
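  A minimal sketch of 5-point alignment via a similarity transform; the reference template, output size, and function name are illustrative, not the notebook's exact values:

  ```python
  import cv2
  import numpy as np

  # Hypothetical reference positions of the 5 landmarks (eye centers, nose tip,
  # mouth corners) in a 112x112 template; the notebook's template may differ.
  REFERENCE_5PTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                               [41.5, 92.4], [70.7, 92.2]])

  def align_face(frame, landmarks_5pts, out_size=112):
      """Warp a detected face so its 5 landmarks match the reference template."""
      src = np.float32(landmarks_5pts)
      # Similarity transform = rotation + uniform scale + translation.
      M, _ = cv2.estimateAffinePartial2D(src, REFERENCE_5PTS, method=cv2.LMEDS)
      return cv2.warpAffine(frame, M, (out_size, out_size))
  ```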
- prep_binary_masks.ipynb
  - Notebook for training data preprocessing. Output binary masks are saved in the ./binary_masks/faceA_eyes and ./binary_masks/faceB_eyes folders.
  - Requires the face_alignment package (a minimal mask-generation sketch follows below). An alternative method for generating binary masks, which does not require the face_alignment and dlib packages, can be found in MTCNN_video_face_detection_alignment.ipynb.
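  A minimal sketch of eye-mask generation with the face_alignment package; the landmark indices follow the standard 68-point scheme, and the dilation size is illustrative:

  ```python
  import cv2
  import numpy as np
  import face_alignment  # https://github.com/1adrianb/face-alignment

  fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=False)

  def eye_mask(img_bgr):
      """Return a binary mask covering both eye regions of the first detected face."""
      mask = np.zeros(img_bgr.shape[:2], dtype=np.uint8)
      landmarks = fa.get_landmarks(img_bgr[..., ::-1])  # the package expects RGB input
      if not landmarks:
          return mask
      pts = landmarks[0]
      for eye in (range(36, 42), range(42, 48)):        # left eye, right eye
          hull = cv2.convexHull(pts[list(eye)].astype(np.int32))
          cv2.fillConvexPoly(mask, hull, 255)
      return cv2.dilate(mask, np.ones((9, 9), np.uint8))  # slightly enlarge the regions
  ```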
- Usage
  1. Run MTCNN_video_face_detection_alignment.ipynb to extract faces from videos.
  2. Run prep_binary_masks.ipynb to create binary masks of training images.
  3. Run FaceSwap_GAN_v2.2_train_test.ipynb to train a model.
  4. Run FaceSwap_GAN_v2.2_video_conversion.ipynb to produce videos using the trained model in step 3.
- Notebook for training the version 2 GAN model. Video conversion functions are also included.
- FaceSwap_GAN_v2_test_video_MTCNN.ipynb
  - Notebook for generating videos. Uses MTCNN for face detection.
- faceswap_WGAN-GP_keras_github.ipynb
  - This notebook is an independent training script for a WGAN-GP model in which the perceptual loss is discarded for simplicity.
  - Training can be started as follows:

    ```python
    gan = FaceSwapGAN()  # instantiate the class
    gan.train(max_iters=10e4, save_interval=500)  # start training
    ```
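  As a reminder of what the WGAN-GP objective adds, below is a minimal Keras-backend sketch of the gradient penalty term (not the notebook's exact code; `discriminator`, `real`, and `fake` are placeholder names):

  ```python
  from keras import backend as K

  def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
      """WGAN-GP term: push the critic's gradient norm toward 1 on random
      interpolations between real and fake samples."""
      batch_size = K.shape(real)[0]
      alpha = K.random_uniform([batch_size, 1, 1, 1])
      interpolated = alpha * real + (1.0 - alpha) * fake
      score = discriminator(interpolated)
      grads = K.gradients(score, [interpolated])[0]
      grad_norm = K.sqrt(K.sum(K.square(grads), axis=[1, 2, 3]) + K.epsilon())
      return lambda_gp * K.mean(K.square(grad_norm - 1.0))
  ```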
- FaceSwap_GAN_v2_sz128_train.ipynb
  - This notebook is an independent script for a model with 128x128 input/output resolution.
- MTCNN_video_face_detection_alignment.ipynb
  - This notebook performs face detection/alignment on the input video.
  - Detected faces are saved in ./faces/raw_faces and ./faces/aligned_faces for non-aligned and aligned results respectively.
  - Crude binary eye masks are also generated and saved in ./faces/binary_masks_eyes. These binary masks can serve as a suboptimal alternative to the masks generated through prep_binary_masks.ipynb.
- Face images are supposed to be in the ./faceA/ or ./faceB/ folder for each target respectively.
- Images will be resized to 256x256 during training.
- Improved output quality: Adversarial loss improves the reconstruction quality of generated images. (A sketch of how it is combined with the reconstruction loss is given below.)
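  A sketch of how the adversarial term is typically combined with the plain reconstruction loss (LSGAN-style generator term; the weights are hypothetical):

  ```python
  from keras import backend as K

  def generator_loss(d_fake_score, fake, target, w_adv=1.0, w_recon=3.0):
      """Adversarial term (LSGAN) plus L1 reconstruction, with illustrative weights."""
      loss_adv = K.mean(K.square(d_fake_score - 1.0))  # fool the discriminator
      loss_recon = K.mean(K.abs(fake - target))        # stay close to the target face
      return w_adv * loss_adv + w_recon * loss_recon
  ```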
- Additional results: This image shows 160 random results generated by the v2 GAN with the self-attention mechanism (image format: source -> mask -> transformed).
- Consistent eye movements (v2.2 model): Results of the v2.2 model, which specializes in eye directions, are presented below. The v2.2 model generates more realistic eyes within fewer training iterations. (Input GIFs are created using DeepWarp.)
- Evaluations: Evaluations of the output quality on the Trump/Cage dataset can be found here. The Trump/Cage images are obtained from the reddit user deepfakes' project on pastebin.com.
- VGGFace perceptual loss: Perceptual loss makes the direction of the eyeballs more realistic and consistent with the input face. It also smooths out artifacts in the segmentation mask, resulting in higher output quality. (A minimal sketch is given below.)
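  A minimal sketch of a VGGFace perceptual loss built with keras-vggface; the model variant, layer indices, and weighting are illustrative, and input faces are assumed to already be resized to 224x224 and preprocessed:

  ```python
  from keras import backend as K
  from keras.models import Model
  from keras_vggface.vggface import VGGFace

  vggface = VGGFace(include_top=False, model='resnet50', input_shape=(224, 224, 3))
  vggface.trainable = False
  # Tap a few intermediate activations (indices are illustrative).
  feat_extractor = Model(vggface.input,
                         [vggface.layers[i].output for i in (36, 78, 118)])

  def perceptual_loss(y_true, y_pred):
      """Mean absolute difference between VGGFace activations of real and fake faces."""
      loss = 0.0
      for f_true, f_pred in zip(feat_extractor(y_true), feat_extractor(y_pred)):
          loss += K.mean(K.abs(f_true - f_pred))
      return loss
  ```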
- Attention mask: The model predicts an attention mask that helps with handling occlusion, eliminating artifacts, and producing natural skin tone. Below are results transforming Hinako Sano (佐野ひなこ) to Emi Takei (武井咲); a sketch of how the mask is applied follows the examples.
  - From left to right: source face, swapped face (before masking), swapped face (after masking).
  - From left to right: source face, swapped face (after masking), mask heatmap.
  - Source video: 佐野ひなことすごくどうでもいい話?(遊戯王)
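  A sketch of how a predicted attention (alpha) mask is typically applied: the generator outputs a 1-channel mask together with a raw RGB face, and the final result blends the raw output with the input (the channel layout here is an assumption):

  ```python
  def apply_attention_mask(generator_output, input_face):
      """Blend the raw generated face with the input using the predicted alpha mask."""
      alpha = generator_output[..., 0:1]    # attention mask in [0, 1]
      raw_rgb = generator_output[..., 1:4]  # raw generated face
      return alpha * raw_rgb + (1.0 - alpha) * input_face
  ```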
- Configurable input/output resolution (v2.2): The model supports 64x64, 128x128, and 256x256 output resolutions.
- Face tracking/alignment using MTCNN and a Kalman filter during video conversion:
  - MTCNN is introduced for more stable detections and reliable face alignment (FA).
  - The Kalman filter smooths the bounding box positions over frames and eliminates jitter on the swapped face. (A minimal smoothing sketch is given below.)
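  A minimal sketch of bounding-box smoothing with OpenCV's Kalman filter (constant-velocity model per corner; the noise covariances are illustrative and would need tuning):

  ```python
  import cv2
  import numpy as np

  def make_kf():
      kf = cv2.KalmanFilter(4, 2)  # state: x, y, vx, vy; measurement: x, y
      kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                      [0, 1, 0, 1],
                                      [0, 0, 1, 0],
                                      [0, 0, 0, 1]], np.float32)
      kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                       [0, 1, 0, 0]], np.float32)
      kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
      kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
      return kf

  kf_tl, kf_br = make_kf(), make_kf()  # one filter per bounding-box corner

  def smooth_bbox(x0, y0, x1, y1):
      """Return a jitter-reduced bounding box for the current frame."""
      smoothed = []
      for kf, (x, y) in ((kf_tl, (x0, y0)), (kf_br, (x1, y1))):
          kf.predict()
          est = kf.correct(np.array([[x], [y]], np.float32))
          smoothed.extend([float(est[0]), float(est[1])])
      return smoothed
  ```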
- Training schedule: The training notebooks provide a predefined training schedule. The Trump/Cage face swaps shown above were generated by a model trained for 21k iterations using the TOTAL_ITERS = 30000 predefined training schedule.
- Training tricks: Swapping the decoders in the late stage of training reduces artifacts caused by extreme facial expressions. E.g., some of the failure cases above, in which the mouth is wide open, are better transformed using this trick.
- Eyes-aware training: High reconstruction loss and edge loss are introduced around the eye area, which guides the model to generate realistic eyes. (A weighted-loss sketch is given below.)
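  A sketch of the eyes-aware weighting: plain L1 reconstruction everywhere, plus a heavily weighted L1 term restricted to the binary eye masks from prep_binary_masks.ipynb (the weight is illustrative; the edge term can be weighted analogously):

  ```python
  from keras import backend as K

  def eyes_aware_l1(y_true, y_pred, eye_mask, w_eyes=30.0):
      """L1 loss with extra weight inside the eye region (eye_mask in {0, 1})."""
      l1 = K.abs(y_true - y_pred)
      return K.mean(l1) + w_eyes * K.mean(eye_mask * l1)
  ```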
- V2.1 model: An improved architecture is introduced in order to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network. (Note: the V2.1 script is experimental and not well maintained.)
  - The V2.1 model provides three base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of the v2 GAN (default base_model="GAN").
  - FCN8s for face segmentation is introduced to improve masking in video conversion (default use_FCN_mask = True).
    - To enable this feature, a Keras weights file should be generated through the Jupyter notebook provided in this repo.
- The following illustration shows a very high-level and abstract (but not exact) flowchart of the denoising autoencoder algorithm. The objective functions are sketched below.
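  Since the original figure is not reproduced here, a schematic form of the combined training objective, with hypothetical weights λ, is (φ_i denotes VGGFace feature maps):

  ```latex
  \mathcal{L}_{G} =
      \lambda_{adv}\,\mathcal{L}_{adv}
    + \lambda_{recon}\,\lVert G(x) - x \rVert_{1}
    + \lambda_{pl}\,\sum_{i}\lVert \phi_{i}(G(x)) - \phi_{i}(x) \rVert_{1}
    + \lambda_{edge}\,\mathcal{L}_{edge}
  ```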
- The model performs at its full potential when the input images are preprocessed with face alignment methods.
- keras 2.1.5
- TensorFlow 1.6.0
- Python 3.6.4
- OpenCV
- keras-vggface
- moviepy
- prefetch_generator (required for v2.2 model)
- face-alignment (required as preprocessing for v2.2 model)
Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and reddit user deepfakes' project. The generative network is adopted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.