kazukiotsuka/Alternative

Alternative (Neura Voice)

Original implementation of fast parallel speech signal generation from text with MFCC (2018).

Idea & Architecture

  • Text → (Location-based attention mechanism) → MFCC
  • MFCC → (parallel recurrent network) → Speech Signal

Text to Mel

[Figure: FFTNet architecture]

This model is based on Alex Graves, "Generating Sequences With Recurrent Neural Networks".
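The location-based attention in Graves' paper is a mixture of Gaussian windows whose centres only move forward over the text. The following is a minimal pure-Python sketch of that window, not this repository's code: the function names (`gaussian_window`, `attend`, `advance`) are hypothetical, and the network that would predict `alpha`, `beta` and `kappa_hat` at each step is left out.

```python
import math


def gaussian_window(alpha, beta, kappa, text_len):
    """Window weights phi[u] = sum_k alpha[k] * exp(-beta[k] * (kappa[k] - u)**2)."""
    return [
        sum(a * math.exp(-b * (k - u) ** 2) for a, b, k in zip(alpha, beta, kappa))
        for u in range(text_len)
    ]


def attend(phi, char_vectors):
    """Context vector: the window-weighted sum of the character vectors."""
    dim = len(char_vectors[0])
    return [sum(p * c[d] for p, c in zip(phi, char_vectors)) for d in range(dim)]


def advance(kappa_prev, kappa_hat):
    """Monotonic location update: kappa_t = kappa_{t-1} + exp(kappa_hat_t)."""
    return [kp + math.exp(kh) for kp, kh in zip(kappa_prev, kappa_hat)]


# Illustrative usage: one mixture component centred on the middle of a
# three-character text encoded as one-hot vectors.
one_hot = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
phi = gaussian_window(alpha=[1.0], beta=[1.0], kappa=[1.0], text_len=3)
context = attend(phi, one_hot)
```

Because `exp(kappa_hat)` is always positive, the window centre can only move forward, which is what keeps the alignment between text and acoustic frames monotonic.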

MFCC to Speech Signal

Parallel speech signal generation vocoder model (based on WaveRNN)
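In WaveRNN, each 16-bit audio sample is split into a coarse part (the high 8 bits) and a fine part (the low 8 bits), and each part is predicted with its own 256-way softmax. The sketch below shows that split in plain Python. It assumes this repository follows the same convention, and the helper names are hypothetical.

```python
def to_unsigned(pcm):
    """Map a signed 16-bit PCM value (-32768..32767) to 0..65535."""
    return pcm + 32768


def split_sample(sample16):
    """Split an unsigned 16-bit sample into (coarse, fine) 8-bit parts."""
    if not 0 <= sample16 <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit value")
    return sample16 >> 8, sample16 & 0xFF


def join_sample(coarse, fine):
    """Inverse of split_sample: rebuild the unsigned 16-bit sample."""
    return (coarse << 8) | fine


# Illustrative usage: digital silence (PCM 0) maps to the middle of the range.
coarse, fine = split_sample(to_unsigned(0))
```

Predicting two 8-bit values instead of one 16-bit value replaces a single 65536-way softmax with two 256-way softmaxes.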

    WaveRNN math::
        x_t = [c_{t-1}, f_{t-1}, c_t]  # input
        u_t = σ(R_u h_{t-1} + I_u*x_t + b_u)  # update gate
        r_t = σ(R_r h_{t-1} + I_r*x_t + b_r)  # reset gate
        e_t = tanh(r_t∘(R_e h_{t-1}) + I_e*x_t + b_e)  # recurrent unit
        h_t = u_t∘h_{t-1} + (1-u_t)∘e_t  # next hidden state
        y_c, y_f = split(h_t)  # coarse, fine
        P(c_t) = softmax(O_2 relu(O_1 y_c))  # coarse distribution
        P(f_t) = softmax(O_4 relu(O_3 y_f))  # fine distribution
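The equations above can be read as a single GRU-style step followed by two small output heads. Here is a minimal pure-Python sketch of one such step, for illustration only. The parameter dictionary, the tiny sizes and the random initialisation are assumptions, not values from this repository. The sketch also omits the masking WaveRNN applies to the input matrices so that `c_t` only influences the fine half of the state.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def wavernn_step(h_prev, x, p):
    """One step of the cell; x = [c_{t-1}, f_{t-1}, c_t], p holds the weights."""
    Ru_h, Rr_h, Re_h = (matvec(p[k], h_prev) for k in ("Ru", "Rr", "Re"))
    Iu_x, Ir_x, Ie_x = (matvec(p[k], x) for k in ("Iu", "Ir", "Ie"))
    u = [sigmoid(a + b + c) for a, b, c in zip(Ru_h, Iu_x, p["bu"])]  # update gate
    r = [sigmoid(a + b + c) for a, b, c in zip(Rr_h, Ir_x, p["br"])]  # reset gate
    e = [math.tanh(rt * a + b + c)
         for rt, a, b, c in zip(r, Re_h, Ie_x, p["be"])]
    h = [ut * hp + (1.0 - ut) * et for ut, hp, et in zip(u, h_prev, e)]
    half = len(h) // 2
    yc, yf = h[:half], h[half:]  # coarse and fine halves of the state
    p_coarse = softmax(matvec(p["O2"], relu(matvec(p["O1"], yc))))
    p_fine = softmax(matvec(p["O4"], relu(matvec(p["O3"], yf))))
    return h, p_coarse, p_fine


def random_params(hidden=8, classes=4, seed=0):
    """Hypothetical toy parameters; a real model would use 256 classes."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    half = hidden // 2
    p = {k: mat(hidden, hidden) for k in ("Ru", "Rr", "Re")}
    p.update({k: mat(hidden, 3) for k in ("Iu", "Ir", "Ie")})
    p.update({k: [0.0] * hidden for k in ("bu", "br", "be")})
    p.update(O1=mat(half, half), O2=mat(classes, half),
             O3=mat(half, half), O4=mat(classes, half))
    return p


# Illustrative usage with a zero initial state and made-up scaled inputs.
params = random_params()
h, p_coarse, p_fine = wavernn_step([0.0] * 8, [0.1, -0.2, 0.3], params)
```

At generation time the coarse value would be sampled from `p_coarse` first and fed back in as `c_t` before the fine half is evaluated. The sketch computes both heads in one call only to keep the mapping to the equations visible.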
