
Training on a more diverse dataset #45

Open

nickhward opened this issue Aug 25, 2024 · 4 comments

@nickhward

Thank you for your paper!

I have been applying your model to a more diverse dataset of approximately 3,000 speakers and around 1,000 hours of audio. However, I have observed that the model's performance degrades noticeably on this data. Do you have any recommendations or best practices for training the model to improve its generalization, particularly across a wide variety of speakers and acoustic conditions?

I appreciate any advice or insights you could share.

Thank you!

@yxlu-0102
Owner

My understanding is that this diverse dataset probably includes not just noise, but also reverberation or bandwidth limitations.

First, I recommend replacing the bounded LSigmoid with an unbounded PReLU as the activation function for the mask.
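For concreteness, here is a minimal sketch of that swap, assuming the LSigmoid has the form β · σ(α · x) with a learnable per-bin slope; the `MaskActivation` wrapper and its argument names are illustrative, not the exact code in this repo:

```python
import torch
import torch.nn as nn

class LearnableSigmoid(nn.Module):
    """Bounded mask activation beta * sigmoid(alpha * x) (the LSigmoid form)."""
    def __init__(self, in_features, beta=2.0):
        super().__init__()
        self.beta = beta
        # one learnable slope per frequency bin, broadcast over (batch, freq, time)
        self.alpha = nn.Parameter(torch.ones(in_features, 1))

    def forward(self, x):
        return self.beta * torch.sigmoid(self.alpha * x)

class MaskActivation(nn.Module):
    """Illustrative wrapper: swap the mask decoder's final activation between
    the bounded LSigmoid and an unbounded PReLU."""
    def __init__(self, in_features, kind="prelu"):
        super().__init__()
        if kind == "prelu":
            # unbounded above, so the mask is no longer capped at beta
            self.act = nn.PReLU(in_features)
        else:
            self.act = LearnableSigmoid(in_features, beta=2.0)

    def forward(self, x):  # x: (batch, freq_bins, time_frames)
        return self.act(x)
```

The point of removing the bound is that reverberant or band-limited inputs can require mask values larger than the LSigmoid's ceiling.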

If there are band limitations, you might also consider adding a waveform discriminator.
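For example, a MelGAN/HiFi-GAN-style discriminator operating on the raw waveform could look like the following sketch; the layer sizes are illustrative, not something released with this repo:

```python
import torch
import torch.nn as nn

class WaveformDiscriminator(nn.Module):
    """Sketch of a MelGAN/HiFi-GAN-style discriminator on the raw waveform:
    strided 1-D convolutions that judge realism directly in the time domain,
    where band-limited high frequencies are otherwise weakly penalized."""
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(1, 16, 15, stride=1, padding=7),
            nn.Conv1d(16, 64, 41, stride=4, groups=4, padding=20),
            nn.Conv1d(64, 256, 41, stride=4, groups=16, padding=20),
            nn.Conv1d(256, 512, 41, stride=4, groups=16, padding=20),
            nn.Conv1d(512, 512, 5, stride=1, padding=2),
        ])
        self.out = nn.Conv1d(512, 1, 3, stride=1, padding=1)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, wav):  # wav: (batch, 1, samples)
        feats = []
        x = wav
        for conv in self.convs:
            x = self.act(conv(x))
            feats.append(x)  # intermediate maps for an optional feature-matching loss
        return self.out(x), feats
```

Its adversarial (and optionally feature-matching) loss would simply be added to the existing loss terms.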

@nickhward
Author

Thank you for the fast response.

For your first recommendation of using an unbounded PReLU in place of the bounded LSigmoid, wouldn't the negative values produced by the unbounded PReLU be problematic for the magnitude spectrogram?

@yxlu-0102
Owner

> Thank you for the fast response.
>
> For your first recommendation of using an unbounded PReLU in place of the bounded LSigmoid, wouldn't the negative values produced by the unbounded PReLU be problematic for the magnitude spectrogram?

It would be more reasonable to use a ReLU activation, but in my implementation I found that PReLU also worked.
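The reasoning is that the mask multiplies a non-negative magnitude spectrogram, so a ReLU keeps the estimate non-negative by construction, while PReLU relies on downstream handling of negative values. A tiny sketch, with illustrative shapes:

```python
import torch

noisy_mag = torch.rand(1, 201, 100)     # (batch, freq_bins, frames), always >= 0
mask_logits = torch.randn(1, 201, 100)  # raw decoder output

mask = torch.relu(mask_logits)          # >= 0 everywhere
est_mag = noisy_mag * mask              # hence est_mag >= 0 by construction
assert (est_mag >= 0).all()
```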

@nickhward
Author

Did you clip the negative values of `mag` to zero in the `mag_pha_istft` function to make PReLU work, or did you set the compress factor to 1 to avoid NaN issues?
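For reference, this is the failure mode I'm thinking of, as a sketch: the inverse compression `mag ** (1 / compress_factor)` turns any negative magnitude into NaN unless it is clamped first or the compress factor is 1. The signature below approximates the repo's `mag_pha_istft`; the details are my assumptions:

```python
import torch

def mag_pha_istft(mag, pha, n_fft, hop_size, win_size, compress_factor=0.3):
    """Approximation of the repo's utility, shown with a negative-magnitude guard."""
    # Workaround 1: clamp, since a fractional power of a negative value is NaN.
    mag = torch.clamp(mag, min=0.0)
    # Workaround 2 would be training with compress_factor = 1.0, so this
    # pow() is the identity and negative values pass through without NaN.
    mag = torch.pow(mag, 1.0 / compress_factor)
    com = torch.complex(mag * torch.cos(pha), mag * torch.sin(pha))
    window = torch.hann_window(win_size, device=mag.device)
    return torch.istft(com, n_fft, hop_length=hop_size, win_length=win_size,
                       window=window, center=True)
```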
