Any plan for WORLD vocoder? #7

Closed
lsq357 opened this issue Dec 4, 2017 · 15 comments

@lsq357

lsq357 commented Dec 4, 2017

Is there any plan to support the WORLD vocoder for multi-speaker TTS?

@r9y9
Owner

r9y9 commented Dec 4, 2017

Not currently planned. I wish I had more time.

@lsq357 lsq357 closed this as completed Dec 4, 2017
@r9y9
Owner

r9y9 commented Dec 13, 2017

I will leave this open to track progress on it. Not currently planned, though.

@r9y9 r9y9 reopened this Dec 13, 2017
@r9y9
Owner

r9y9 commented Dec 16, 2017

It seems someone is trying to support the WORLD vocoder: https://github.com/geneing/deepvoice3_pytorch

@DarkDefender

@r9y9 Thanks for the heads up!

I'm actually really interested in how this turns out, as the WORLD vocoder is used in the "UTAU" music software. If someone manages to train the network successfully with it, I think we might be able to get rid of the "sound compression" artifacts that are present in most of the current deepvoice/tacotron implementations...

An example of the sound quality possible with UTAU (and therefore WORLD):
https://www.youtube.com/watch?v=Es_5kvVtiNA

@geneing would you mind keeping us updated on your progress? Even if the results are not good.

@geneing

geneing commented Dec 18, 2017

Replacing Griffin-Lim with the WORLD vocoder seems fairly straightforward. The full WORLD output for a 22 kHz signal has length 1027, versus 80 for the mel output. The WORLD vocoder includes an encoder for aperiodicity and spectrogram, which reduces the output to a length of 131.
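
A rough check of these lengths, as a sketch only. It assumes WORLD's default F0 floor of 71 Hz, its CheapTrick FFT-size rule, and a 3 kHz band spacing (capped at 15 kHz) for coded aperiodicity, all recalled from the WORLD reference implementation rather than stated in this thread. The function names are hypothetical, and the 128-dimensional coded spectral envelope is a guess chosen so that the total comes out to 131.

```python
import math

F0_FLOOR_HZ = 71.0  # assumption: WORLD's default F0 floor


def cheaptrick_fft_size(fs, f0_floor=F0_FLOOR_HZ):
    """FFT size WORLD would pick for sampling rate fs (assumed rule)."""
    return 2 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1)))


def full_frame_length(fs):
    """Per-frame length of f0 + spectral envelope + aperiodicity."""
    bins = cheaptrick_fft_size(fs) // 2 + 1
    return 1 + bins + bins


def num_coded_aperiodicities(fs):
    """Number of band aperiodicities after coding (assumed rule)."""
    return int(min(15000.0, fs / 2.0 - 3000.0) / 3000.0)


def coded_frame_length(fs, coded_sp_dim):
    """Per-frame length after coding; coded_sp_dim is a free choice."""
    return 1 + coded_sp_dim + num_coded_aperiodicities(fs)


fs = 22050
print(cheaptrick_fft_size(fs))       # 1024
print(full_frame_length(fs))         # 1027 = 1 + 513 + 513
print(coded_frame_length(fs, 128))   # 131 = 1 + 128 + 2 (128 is a guess)
```

Under these assumptions the network's per-frame target shrinks from 1027 to 131 values, which is closer in size to the 80-bin mel output.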

@lsq357
Author

lsq357 commented Dec 18, 2017

In my view, with the WORLD vocoder the network only needs to change its output shape and become multi-output, since the WORLD vocoder needs at least three parameters (f0, aperiodicity, spectrogram).
Moreover, the WORLD parameters (f0, aperiodicity, spectrogram) and the mel outputs could both be added to the loss function, which might speed up convergence. (This idea is just my guess!)
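
The multi-output loss idea above could be sketched roughly as follows. This is a plain-Python illustration, not code from any of the repositories mentioned here: the stream names, the L1 distance, and the per-stream weights are all assumptions, and a real model would compute this on tensors.

```python
def l1(pred, target):
    """Mean absolute error between two equal-length sequences."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred, target, weights=None):
    """Weighted sum of per-stream L1 losses.

    pred and target are dicts keyed by stream name (hypothetical names:
    "f0", "aperiodicity", "spectrogram", "mel"). Streams without an
    explicit weight default to 1.0.
    """
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * l1(pred[name], target[name])
        for name in target
    )


# Toy single-stream-per-key example with made-up numbers.
pred = {
    "f0": [100.0, 110.0],
    "aperiodicity": [0.5, 0.5],
    "spectrogram": [1.0, 2.0],
    "mel": [0.0, 1.0],
}
target = {
    "f0": [100.0, 120.0],
    "aperiodicity": [0.5, 0.0],
    "spectrogram": [1.0, 2.0],
    "mel": [0.0, 0.0],
}
# f0 is in Hz, so it is down-weighted to keep the terms comparable.
loss = combined_loss(pred, target, weights={"f0": 0.1})
print(loss)
```

Whether the extra mel term actually speeds up convergence is untested here; the sketch only shows how the terms would be combined.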

@DarkDefender

DarkDefender commented Dec 20, 2017

BTW, if anyone is interested in singing neural networks, I just found this:
http://www.dtic.upf.edu/~mblaauw/NPSS/

The Spanish output sounds really awesome, I think. The English and Japanese sound a little too stilted. But I guess that depends on what kind of dataset and music you throw at it.

Edit: forgot to mention that it seems to use the WORLD vocoder

@geneing

geneing commented Dec 22, 2017

In light of the Tacotron 2 paper, it appears that WaveNet may be a better choice. Looking into it.

@lsq357
Author

lsq357 commented Dec 23, 2017

Training WaveNet needs many more GPUs than I have (Tacotron 2 used 32 GPUs).
The WORLD vocoder, by contrast, can run on CPU only.

@r9y9
Owner

r9y9 commented Dec 23, 2017

Does anybody have experience working on WaveNet? Is it impossible to train WaveNet with only 1 GPU in practice?

@lsq357
Author

lsq357 commented Dec 23, 2017

I have tried WaveNet on two 1080 Ti GPUs; it only trains 3k+ steps per day (asynchronous updates) with batch size 32.

I tried QuasiRNN + WaveNet as in DeepVoice2 or DeepVoice, but my TensorFlow QuasiRNN code gave no speedup!
I only trained for a week, without success.

@r9y9
Owner

r9y9 commented Dec 31, 2017

I started to implement the WaveNet vocoder. Check out r9y9/wavenet_vocoder#1 (comment) if you are interested.

@MlWoo

MlWoo commented Jul 10, 2018

@geneing Have you trained your model with "world"? Could you provide some audio samples?

@stale

stale bot commented May 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 30, 2019
@stale stale bot closed this as completed Jun 6, 2019
@hash2430

I made one myself.
https://github.com/hash2430/dv3_world
Anyone who needs it is welcome to use it.
I will upload audio samples soon.

6 participants