A keras (tensorflow) reimplementation of MUNIT: Multimodal Unsupervised Image-to-Image Translation
Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz
- Use group normalization instead of layer normalization in upscaling blocks (see the decoder sketch after this list).
  - A model using group norm (groups=8) failed to reconstruct edge images of the edges2shoes dataset.
- Use the mixup technique for training (a minimal sketch follows this list).
- Input/output size defaults to 128x128.
- Use only 3 res blocks (instead of 4) by default in the content encoder/decoder in order to reduce training time.
  - However, I worry that this shrinks the receptive field and degrades output quality.
- Upscaling blocks use conv2d with kernel_size = 3 instead of 4.
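For illustration, below is a minimal sketch of a decoder built from the pieces described above: 3 res blocks, then nearest-neighbour upscaling blocks with a kernel_size = 3 conv and group normalization. It assumes tf.keras >= 2.11 for layers.GroupNormalization (the repo may instead rely on keras-contrib), uses illustrative function names and shapes rather than the repo's actual ones, and omits MUNIT's AdaIN style conditioning for brevity.

```python
# Illustrative sketch only; names, shapes and hyper-parameters are assumptions, not the repo's code.
import tensorflow as tf
from tensorflow.keras import layers

def res_block(x, filters):
    """Plain residual block (the real MUNIT decoder conditions these on the style code via AdaIN)."""
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([x, y])

def upscale_block(x, filters, groups=8):
    """Nearest-neighbour upsampling + 3x3 conv + group norm + ReLU."""
    x = layers.UpSampling2D(size=2)(x)
    x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)  # kernel_size=3 instead of 4
    x = layers.GroupNormalization(groups=groups)(x)               # group norm instead of layer norm
    return layers.Activation("relu")(x)

def build_decoder(input_shape=(32, 32, 256), n_res_blocks=3, n_upscales=2):
    """Content code -> image; 3 res blocks and 2 upscaling stages (32x32 content -> 128x128 output)."""
    content = layers.Input(shape=input_shape)
    x = content
    for _ in range(n_res_blocks):        # 3 instead of 4 to reduce training time
        x = res_block(x, input_shape[-1])
    filters = input_shape[-1]
    for _ in range(n_upscales):
        filters //= 2
        x = upscale_block(x, filters)
    out = layers.Conv2D(3, 7, padding="same", activation="tanh")(x)
    return tf.keras.Model(content, out, name="decoder")
```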
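And a minimal sketch of how mixup can be applied to discriminator training: real and fake images are blended with a coefficient drawn from Beta(alpha, alpha), and the target is blended the same way. The LSGAN-style squared-error loss and alpha=0.2 here are assumptions, not necessarily the settings used in this repo.

```python
# Illustrative sketch only; the loss form and alpha are assumptions, not the repo's settings.
import numpy as np
import tensorflow as tf

def mixup_discriminator_loss(discriminator, real_images, fake_images, alpha=0.2):
    """Blend real and fake images (and their targets) with the same lambda ~ Beta(alpha, alpha)."""
    lam = float(np.random.beta(alpha, alpha))
    mixed = lam * real_images + (1.0 - lam) * fake_images
    pred = discriminator(mixed, training=True)
    # Target is blended the same way: 1 for real, 0 for fake, so it collapses to lam.
    target = lam * tf.ones_like(pred)
    return tf.reduce_mean(tf.square(pred - target))
```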
- Edges2shoes (config. 1)
  - Cyclic reconstruction loss weight = 1 for the first 80k iters and 0.3 for the rest (see the schedule sketch after these training notes).
  - Input/Output size: 64x64.
  - Training iterations: ~130k.
  - Optimization: Use mixup technique for the first 80k iters.
- Edges2shoes (config. 2)
  - Cyclic reconstruction loss weight = 10.
  - Input/Output size: 64x64.
  - Training iterations: ~70k.
  - Optimization: Use mixup technique for the entire training process.
  - Surprisingly, the model performed better on guided translation (generated more detail and clearer edges) when using the higher reconstruction loss weight.
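For concreteness, here is a rough sketch of the iteration-dependent settings described for config. 1 (loss weight 1 -> 0.3 and mixup on -> off at 80k iterations). The function name and return convention are illustrative, not from the repo.

```python
# Illustrative sketch of the config. 1 schedule; not the repo's actual training loop.
def config1_schedule(iteration):
    """Return (cyclic reconstruction loss weight, use_mixup) for a given training iteration."""
    if iteration < 80_000:
        return 1.0, True   # weight = 1 and mixup for the first 80k iters
    return 0.3, False      # weight = 0.3, no mixup, for the rest
```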
Code is heavily inspired by the official MUNIT PyTorch implementation. It also borrows code from eridgd and tjwei.