Description

[Figure: (a) Training, (b) Inference]
[Figure: (c) Training (w/ optional properties), (d) Inference (w/ optional properties)]

Modules

Encoder: 4 WN layers

Decoder: 20 conditional WN layers

Vocoder: From HiFTNet

Discriminator: From HiFTNet

F0 Adjustment: Adjusts the F0 of the source speech to suit the characteristics of the target speaker

$$ \begin{aligned} lf0_{src} &= \ln(f0_{src}) \\ lf0_{adj} &= lf0_{src} - mean(lf0_{src}) + \ln(mean(f0_{tgt})) \\ f0_{adj} &= e^{lf0_{adj}} \end{aligned} $$
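The adjustment above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the repository's code: the function name `adjust_f0` is hypothetical, and the handling of unvoiced frames (F0 <= 0 excluded from both means and passed through unchanged) is an assumption, since the description does not say how they are treated.

```python
import math
from statistics import fmean


def adjust_f0(f0_src, f0_tgt):
    """Shift source F0 so that mean(lf0_src) moves to ln(mean(f0_tgt)).

    Assumption (not stated in the description): frames with F0 <= 0 are
    unvoiced, are excluded from both means, and are returned unchanged.
    """
    voiced_src = [f for f in f0_src if f > 0]
    voiced_tgt = [f for f in f0_tgt if f > 0]
    if not voiced_src or not voiced_tgt:
        return list(f0_src)  # nothing to adjust

    mean_lf0_src = fmean([math.log(f) for f in voiced_src])  # mean(lf0_src)
    log_mean_f0_tgt = math.log(fmean(voiced_tgt))            # ln(mean(f0_tgt))
    shift = log_mean_f0_tgt - mean_lf0_src

    # f0_adj = exp(lf0_src - mean(lf0_src) + ln(mean(f0_tgt)))
    return [math.exp(math.log(f) + shift) if f > 0 else f for f in f0_src]
```

Because the shift is a constant in the log domain, it amounts to multiplying every voiced frame by one ratio, so the shape of the source contour is preserved.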

Optional

Transform: SR-based augmentation applied on the fly. Set use_aug: true in config_v1_16k.json to enable this property.

F0 Finetune: In some corner cases, $mean(lf0_{src})$ is inaccurate, which makes the converted speech less similar to the target speaker. This property uses ASV model scores to automatically fine-tune the adjusted F0. Pass --search to convert_*.py to enable this property.
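One way to picture a score-driven F0 search is a small grid search over global pitch shifts. The sketch below is an assumption, not the procedure behind --search: `search_f0_shift`, the semitone grid, and `score_fn` are all hypothetical. `score_fn` stands in for "convert with this F0 contour and score the result with an ASV model", with higher scores meaning closer to the target speaker.

```python
import math


def search_f0_shift(f0_adj, score_fn, semitones=(-2, -1, 0, 1, 2)):
    """Return the shifted F0 contour with the highest similarity score.

    Hypothetical: `score_fn` maps an F0 contour to a speaker-similarity
    score (higher is better). The candidate grid is illustrative only.
    """
    best_f0, best_score = None, -math.inf
    for st in semitones:
        ratio = 2.0 ** (st / 12.0)  # semitone shift as a frequency ratio
        # Unvoiced frames (F0 <= 0, an assumption) are left untouched.
        candidate = [f * ratio if f > 0 else f for f in f0_adj]
        score = score_fn(candidate)
        if score > best_score:
            best_f0, best_score = candidate, score
    return best_f0, best_score
```

Each candidate requires one scoring pass, so the size of the grid trades search quality against inference time.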