A keras (tensorflow) reimplementation of MUNIT: Multimodal Unsupervised Image-to-Image Translation
Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz
- Use group normalization instead of layer normalization in upscaling blocks (see the decoder sketch after this list).
  - A model using group norm (groups=8) failed to reconstruct edge images of the edges2shoes dataset.
- Use the mixup technique for training (a minimal sketch follows this list).
- Input/output size defaults to 128x128.
- Use only 3 res blocks (instead of 4) by default in the content encoder/decoder in order to reduce training time.
  - However, I worry that this shrinks the receptive field and degrades output quality.
- Upscaling blocks use conv2d with kernel_size = 3 instead of 4.
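For illustration, below is a minimal sketch of a decoder built from the pieces described above: 3 res blocks, then nearest-neighbour upscaling blocks with a kernel_size = 3 conv and group normalization. It assumes tf.keras >= 2.11 for layers.GroupNormalization (the repo may instead rely on keras-contrib), uses illustrative function names and shapes rather than the repo's actual ones, and omits MUNIT's AdaIN style conditioning for brevity.

```python
# Illustrative sketch only; names, shapes and hyper-parameters are assumptions, not the repo's code.
import tensorflow as tf
from tensorflow.keras import layers

def res_block(x, filters):
    """Plain residual block (the real MUNIT decoder conditions these on the style code via AdaIN)."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([x, y])

def upscale_block(x, filters, groups=8):
    """Nearest-neighbour upsampling + 3x3 conv + group norm + ReLU."""
    x = layers.UpSampling2D(size=2)(x)
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)  # kernel_size=3 instead of 4
    x = layers.GroupNormalization(groups=groups)(x)               # group norm instead of layer norm
    return layers.Activation("relu")(x)

def build_decoder(input_shape=(32, 32, 256), n_res_blocks=3, n_upscales=2):
    """Content code -> image; 3 res blocks and 2 upscaling stages (32x32 content -> 128x128 output)."""
    content = layers.Input(shape=input_shape)
    x = content
    for _ in range(n_res_blocks):        # 3 instead of 4 to reduce training time
        x = res_block(x, input_shape[-1])
    filters = input_shape[-1]
    for _ in range(n_upscales):
        filters //= 2
        x = upscale_block(x, filters)
    out = layers.Conv2D(3, 7, padding="same", activation="tanh")(x)
    return tf.keras.Model(content, out, name="decoder")
```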
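And a minimal sketch of how mixup can be applied to discriminator training: real and fake images are blended with a coefficient drawn from Beta(alpha, alpha), and the target is blended the same way. The LSGAN-style squared-error loss and alpha=0.2 here are assumptions, not necessarily the settings used in this repo.

```python
# Illustrative sketch only; the loss form and alpha are assumptions, not the repo's settings.
import numpy as np
import tensorflow as tf

def mixup_discriminator_loss(discriminator, real_images, fake_images, alpha=0.2):
    """Blend real and fake images (and their targets) with the same lambda ~ Beta(alpha, alpha)."""
    lam = float(np.random.beta(alpha, alpha))
    mixed = lam * real_images + (1.0 - lam) * fake_images
    pred = discriminator(mixed, training=True)
    # Target is blended the same way: 1 for real, 0 for fake, so it collapses to lam.
    target = lam * tf.ones_like(pred)
    return tf.reduce_mean(tf.square(pred - target))
```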
- Edges2shoes (config. 1)
  - Cyclic reconstruction loss weight = 1 for the first 80k iters and 0.3 for the rest (see the schedule sketch after these training notes).
  - Input/Output size: 64x64.
  - Training iterations: ~130k.
  - Optimization: Use mixup technique for the first 80k iters.
- Edges2shoes (config. 2)
  - Cyclic reconstruction loss weight = 10.
  - Input/Output size: 64x64.
  - Training iterations: ~70k.
  - Optimization: Use mixup technique for the entire training process.
  - Surprisingly, the model performed better on guided translation (generated more detail and clearer edges) when using the higher reconstruction loss weight.
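For concreteness, here is a rough sketch of the iteration-dependent settings described for config. 1 (loss weight 1 -> 0.3 and mixup on -> off at 80k iterations). The function name and return convention are illustrative, not from the repo.

```python
# Illustrative sketch of the config. 1 schedule; not the repo's actual training loop.
def config1_schedule(iteration):
    """Return (cyclic reconstruction loss weight, use_mixup) for a given training iteration."""
    if iteration < 80_000:
        return 1.0, True   # weight = 1 and mixup for the first 80k iters
    return 0.3, False      # weight = 0.3, no mixup, for the rest
```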
Code is heavily inspired by the official MUNIT PyTorch implementation. It also borrows code from eridgd and tjwei.