Did you use this repo to train a vocoder? #2

Open
syang1993 opened this issue Jul 20, 2018 · 3 comments

@syang1993

@fatchord Hi, happy to see you again! I'm also working on FFTNet, but in my experiments I can't get results similar to those on the paper's demo page, mainly with regard to conditional sampling and post-denoising. Have you tried to reproduce their results? Thanks.

@fatchord
Owner

@syang1993 Hi, how's it going? Yeah, I'm having similar problems - here's what my conditioned model sounds like after 300k steps: (used 80-band mel-spectrograms)

300k_steps.wav.tar.gz

I haven't implemented the noise reduction, is that algorithm publicly available? I had a quick look around and couldn't find it.

As for conditional sampling - I was going to implement a simple threshold, or perhaps an exponential moving average, over the summed values in the conditioning frames - and use that to differentiate between a voiced/unvoiced state. But I haven't got around to it yet, so perhaps that's why it doesn't sound so good.
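A rough sketch of the threshold / moving-average idea described above, not anything taken from this repo. The function name `voiced_mask`, the smoothing factor `alpha`, and the default threshold (the mean of the smoothed energy) are all assumptions for illustration, and it assumes the conditioning frames are non-negative (e.g. linear-scale mels rather than log-scale ones).

```python
import numpy as np

def voiced_mask(mel, threshold=None, alpha=0.3):
    """Guess voiced/unvoiced per conditioning frame.

    mel: array of shape (n_frames, n_bands), assumed non-negative.
    alpha: EMA smoothing factor (hypothetical default).
    threshold: cut-off on the smoothed energy; if None, the mean
               of the smoothed energy is used (an assumption).
    """
    mel = np.asarray(mel, dtype=float)
    energy = mel.sum(axis=1)            # summed values per frame

    # Exponential moving average over the per-frame energy.
    smoothed = np.empty_like(energy)
    acc = energy[0]
    for i, e in enumerate(energy):
        acc = alpha * e + (1.0 - alpha) * acc
        smoothed[i] = acc

    if threshold is None:
        threshold = smoothed.mean()
    return smoothed > threshold         # True = treated as voiced


# Toy input: 10 silent frames followed by 10 loud frames, 80 bands.
mel = np.vstack([np.zeros((10, 80)), np.ones((10, 80))])
mask = voiced_mask(mel)
```

The EMA delays the switch to "voiced" by a frame or so after an onset, which is the trade-off against a plain per-frame threshold: less flicker, slightly later transitions.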

I'm curious what your implementation sounds like - any chance you could post a sample?

@syang1993
Author

@fatchord I also used 80-band mel-spectrograms to train my model. Since the authors only cited a book for the noise reduction, I don't know which specific method they used. Maybe Wiener filtering?

Since I'm on summer vacation, I can't send you my samples. But you can listen to generated-model.ckpt-200000.ema.pt.wav in syang1993/FFTNet#2; my results are similar to that (without conditional sampling and noise reduction). It contains strident audio at some positions. When I tried random sampling rather than argmax, the generated speech got noisy.
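For reference, the two decoding strategies compared above can be sketched like this. It is only an illustration: the 256-way output size, the function names, and the `sharpen` factor applied to the logits in voiced regions are assumptions, not code from either repo.

```python
import numpy as np

def greedy(logits):
    """Argmax decoding: always pick the most likely class."""
    return int(np.argmax(logits))

def pick_sample(logits, voiced, rng, sharpen=2.0):
    """Random sampling from the softmax posterior.

    If `voiced` is True the logits are scaled by `sharpen`
    (hypothetical value) before the softmax, which makes the
    distribution peakier and the sampling less noisy.
    """
    logits = np.asarray(logits, dtype=float)
    if voiced:
        logits = logits * sharpen
    z = logits - logits.max()           # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
logits = np.zeros(256)                  # assumed 256 quantization classes
logits[100] = 50.0                      # one overwhelmingly likely class
a = greedy(logits)
b = pick_sample(logits, voiced=True, rng=rng)
```

With a flat posterior, plain random sampling spreads over many classes, which is one plausible source of the noise; sharpening only in voiced frames is a way to sit between argmax and pure sampling.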

@alirezag

@fatchord this is not bad at all, although I know the goal is to replicate the paper-quality results.

Repository owner deleted a comment from ha0min Feb 23, 2024