Hi @sunnnnnnnny, the samples on the webpage are generated from the actual mel spectrograms. I haven't had the chance to experiment with something like Tacotron yet, but the model does seem to work reasonably well on "smoothed" spectrograms. For example, the following spec was reconstructed with a vector-quantized autoencoder (VQ-VAE) trained with an L2 loss:
Compared to the original:
I've attached the audio generated from the reconstructed spectrogram. It's a little noisier than the original but not too bad (the VQ-VAE may also be causing some loss of quality). sample.zip
The audio corresponds to the first sample from speaker V002 on the webpage.
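For reference, here is a minimal sketch of the kind of setup described above: a small VQ-VAE over mel spectrograms trained with an L2 reconstruction loss. The architecture, class names (`MelVQVAE`, `VectorQuantizer`), and hyperparameters (80 mel bins, 64-d codes, a 512-entry codebook) are illustrative assumptions, not the actual model used for the sample.

```python
# Minimal sketch (illustrative, not the exact model used above): a VQ-VAE that
# reconstructs mel spectrograms, trained with an L2 reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                      # z: (B, T, dim)
        flat = z.reshape(-1, z.size(-1))       # (B*T, dim)
        # Squared distance from each latent vector to every codebook entry.
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        q = self.codebook(d.argmin(1)).view_as(z)
        # Codebook + commitment losses; straight-through gradient to encoder.
        vq_loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()
        return q, vq_loss

class MelVQVAE(nn.Module):
    def __init__(self, n_mels=80, dim=64, num_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, dim, 3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, 3, padding=1),
        )
        self.vq = VectorQuantizer(num_codes, dim)
        self.decoder = nn.Sequential(
            nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, n_mels, 3, padding=1),
        )

    def forward(self, mel):                    # mel: (B, n_mels, T)
        z = self.encoder(mel).transpose(1, 2)  # (B, T, dim)
        q, vq_loss = self.vq(z)
        recon = self.decoder(q.transpose(1, 2))
        return recon, vq_loss

model = MelVQVAE()
mel = torch.randn(4, 80, 100)                  # fake batch of mel spectrograms
recon, vq_loss = model(mel)
loss = F.mse_loss(recon, mel) + vq_loss        # L2 reconstruction + VQ terms
loss.backward()
```

The quantization bottleneck is what produces the "smoothed" reconstructions mentioned above; the reconstructed mel would then be fed to the vocoder in place of the ground-truth one.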
"A PyTorch implementation of Robust Universal Neural Vocoding. Audio samples can be found here."
For the samples at the link you gave: were they generated from the actual (ground-truth) spectrograms, or from spectrograms predicted by an acoustic model?