Original implementation of fast parallel speech signal generation from text with MFCC (2018)
- Text → (Location-based attention mechanism) → MFCC
- MFCC → (parallel recurrent network) → Speech Signal
This model is based on Alex Graves, "Generating Sequences With Recurrent Neural Networks".
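As a rough illustration of the location-based attention step, here is a minimal sketch of the Gaussian-mixture window described in Graves' paper. It assumes the window parameters (`alpha_hat`, `beta_hat`, `kappa_hat`) have already been produced by the network; the function names, the 0-based character index, and the toy inputs are hypothetical and are not taken from this repository.

```python
import math


def attention_window(alpha_hat, beta_hat, kappa_hat, kappa_prev, text_len):
    """One step of a Graves-style location-based attention window.

    Returns the weight phi[u] for every text position u and the updated
    window locations kappa.
    """
    alpha = [math.exp(a) for a in alpha_hat]   # mixture weights (> 0)
    beta = [math.exp(b) for b in beta_hat]     # inverse widths (> 0)
    # exp() keeps every increment positive, so the window only moves forward.
    kappa = [kp + math.exp(k) for kp, k in zip(kappa_prev, kappa_hat)]
    phi = [
        sum(a * math.exp(-b * (k - u) ** 2)
            for a, b, k in zip(alpha, beta, kappa))
        for u in range(text_len)
    ]
    return phi, kappa


def context_vector(phi, char_vectors):
    """Weighted sum of the character vectors under the window weights."""
    dim = len(char_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(phi, char_vectors))
            for d in range(dim)]


# Toy usage: three one-hot characters and a single mixture component.
chars = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
phi, kappa = attention_window([0.0], [0.0], [0.0], [0.0], len(chars))
context = context_vector(phi, chars)
```

Because each update adds a positive amount to `kappa`, the window can only slide forward through the text, which is what makes the attention depend on location rather than on content.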
Parallel speech signal generation vocoder model (based on WaveRNN)
WaveRNN math::
xt = [ct-1, ft-1, ct] # input
ut = σ(Ru ht-1 + Iu*xt + bu) # update gate
rt = σ(Rr ht-1 + Ir*xt + br) # reset gate
et = tanh(rt∘(Re ht-1) + Ie*xt + be) # recurrent unit
ht = ut∘ht-1 + (1-ut)∘et # next hidden state
yc, yf = split(ht) # coarse, fine
P(ct) = softmax(O2 relu(O1 yc)) # coarse distribution
P(ft) = softmax(O4 relu(O3 yf)) # fine distribution
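The equations above can be sketched as a single recurrent step in plain Python. This is an illustrative sketch under stated assumptions, not this repository's implementation: the helper names, the random weights, and the toy sizes are hypothetical; samples are assumed to be scaled to [-1, 1]; and `I*` is read as the masked input matrix from the WaveRNN paper, where the current coarse sample `ct` is only connected to the fine half of the state. The current coarse sample is passed in directly (teacher forcing) rather than sampled.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors):
    return [sum(parts) for parts in zip(*vectors)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-a)) for a in z]


def relu(z):
    return [max(0.0, a) for a in z]


def softmax(z):
    top = max(z)                      # subtract the max for stability
    exps = [math.exp(a - top) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(hidden, n_classes, seed=0):
    """Random toy weights (hypothetical; a real model would load trained ones)."""
    rng = random.Random(seed)
    half = hidden // 2
    p = {}
    for gate in "ure":
        p["R" + gate] = random_matrix(hidden, hidden, rng)
        I = random_matrix(hidden, 3, rng)
        for row in I[:half]:
            row[2] = 0.0              # mask: ct must not reach the coarse half
        p["I" + gate] = I
        p["b" + gate] = [0.0] * hidden
    p["O1"] = random_matrix(half, half, rng)
    p["O2"] = random_matrix(n_classes, half, rng)
    p["O3"] = random_matrix(half, half, rng)
    p["O4"] = random_matrix(n_classes, half, rng)
    return p


def wavernn_step(p, h_prev, c_prev, f_prev, c_cur):
    x = [c_prev, f_prev, c_cur]                                   # input
    u = sigmoid(add(matvec(p["Ru"], h_prev), matvec(p["Iu"], x), p["bu"]))
    r = sigmoid(add(matvec(p["Rr"], h_prev), matvec(p["Ir"], x), p["br"]))
    gated = [ri * a for ri, a in zip(r, matvec(p["Re"], h_prev))]
    e = [math.tanh(a) for a in add(gated, matvec(p["Ie"], x), p["be"])]
    # ht = ut∘ht-1 + (1-ut)∘et
    h = [ui * hp + (1.0 - ui) * ei for ui, hp, ei in zip(u, h_prev, e)]
    half = len(h) // 2
    yc, yf = h[:half], h[half:]                                   # coarse, fine
    p_coarse = softmax(matvec(p["O2"], relu(matvec(p["O1"], yc))))
    p_fine = softmax(matvec(p["O4"], relu(matvec(p["O3"], yf))))
    return h, p_coarse, p_fine


# Toy usage: 8 hidden units, 4 output classes per half.
params = make_params(hidden=8, n_classes=4)
h0 = [0.0] * 8
h1, p_c, p_f = wavernn_step(params, h0, 0.1, -0.2, 0.3)
```

With the mask in place, the coarse distribution does not depend on `c_cur`, so at generation time the coarse sample can be drawn first and then fed back in to compute the fine distribution.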