Figure panels: (a) Training, (b) Inference, (c) Training (w/ optional properties), (d) Inference (w/ optional properties).

Encoder: 4 WN layers
Decoder: 20 conditional WN layers
Vocoder: From HiFTNet
Discriminator: From HiFTNet
F0 Adjustment: Adjusts the F0 of the source speech to suit the characteristics of the target speaker.
Transform: SR-based augmentation applied on the fly. Set `use_aug: true` in `config_v1_16k.json` to enable this property.
F0 Finetune: For some corner cases, pass `--search` when using `convert_*.py` to enable this property.
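
As an illustration of the F0 Adjustment item, here is a minimal sketch of one common way to fit a source F0 contour to a target speaker: shift the voiced frames so the mean log-F0 matches the target's. This is an assumption for illustration only, not necessarily what this repository does; the function name `adjust_f0` and the convention that unvoiced frames are marked with 0 are hypothetical.

```python
import numpy as np


def adjust_f0(source_f0, target_f0):
    """Shift source F0 so its mean log-F0 matches the target speaker's.

    Hypothetical helper: frames with F0 <= 0 are treated as unvoiced
    and left unchanged.
    """
    source_f0 = np.asarray(source_f0, dtype=float)
    target_f0 = np.asarray(target_f0, dtype=float)

    src_voiced = source_f0 > 0
    tgt_voiced = target_f0 > 0
    # Without voiced frames on either side there is nothing to match.
    if not src_voiced.any() or not tgt_voiced.any():
        return source_f0.copy()

    # Difference of mean log-F0 is a constant ratio in the linear domain.
    shift = np.log(target_f0[tgt_voiced]).mean() - np.log(source_f0[src_voiced]).mean()

    adjusted = source_f0.copy()
    adjusted[src_voiced] *= np.exp(shift)
    return adjusted
```

Working in the log domain keeps the shape of the contour (its intervals in semitones) intact while moving it into the target speaker's register.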
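
For the Transform item, the sketch below assumes that "SR" stands for spectrogram-resize, as in FreeVC-style augmentation; the source does not spell this out. It stretches or compresses a mel-spectrogram along the frequency axis by a ratio, then crops or pads back to the original number of bins. The function name `resize_spectrogram_vertically` is hypothetical, and padding by repeating the top bin is a simplification chosen here, not taken from this repository.

```python
import numpy as np


def resize_spectrogram_vertically(mel, ratio):
    """Resize a (n_bins, n_frames) spectrogram along the frequency axis.

    ratio > 1 stretches the spectrum and crops the top bins;
    ratio < 1 compresses it and pads the top by repeating the last bin.
    The output always has the same shape as the input.
    """
    mel = np.asarray(mel, dtype=float)
    n_bins, n_frames = mel.shape
    new_bins = max(1, int(round(n_bins * ratio)))

    # Positions on the original bin axis sampled by the resized spectrogram.
    src_pos = np.linspace(0, n_bins - 1, new_bins)
    bins = np.arange(n_bins)
    resized = np.stack(
        [np.interp(src_pos, bins, mel[:, t]) for t in range(n_frames)],
        axis=1,
    )  # shape: (new_bins, n_frames)

    if new_bins >= n_bins:
        return resized[:n_bins]  # crop the extra high-frequency bins

    pad = np.repeat(resized[-1:], n_bins - new_bins, axis=0)
    return np.concatenate([resized, pad], axis=0)
```

In an on-the-fly setup, a ratio would typically be drawn at random for each training sample (for example with `np.random.uniform`), so the model sees a differently distorted version of the same utterance each time.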