Training on a more diverse dataset #45
Comments
My understanding is that this diverse dataset likely includes not only noise but also reverberation or bandwidth limitations. First, I recommend replacing the bounded LSigmoid with an unbounded PReLU as the activation function for the mask. If there are band limitations, you might also consider adding a waveform discriminator.
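For context, here is a minimal NumPy sketch contrasting the two activations. The function names and the `beta`/`alpha` values are illustrative assumptions, not taken from this repo's code:

```python
import numpy as np

def lsigmoid(x, beta=2.0):
    # Learnable sigmoid used for mask estimation: output bounded in (0, beta).
    # beta=2.0 is an illustrative value, not the repo's actual parameter.
    return beta / (1.0 + np.exp(-x))

def prelu(x, alpha=0.2):
    # Parametric ReLU: unbounded above, so the mask can exceed the LSigmoid
    # bound where amplification is needed (e.g. band-limited inputs).
    # alpha is the (normally learnable) negative slope, fixed here.
    return np.where(x >= 0.0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
bounded = lsigmoid(x)   # stays strictly below beta = 2.0
unbounded = prelu(x)    # 3.0 passes through unchanged; -2.0 becomes -0.4
```

The trade-off discussed below follows directly from the last line: the unbounded branch can produce negative mask values.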
Thank you for the fast response. Regarding your first recommendation of using an unbounded PReLU in place of the bounded LSigmoid: wouldn't the negative values produced by the unbounded PReLU be problematic for the magnitude spectrogram?
Using a ReLU activation is more reasonable, but in my implementations I found that PReLU also worked.
Did you clip the negative values of mag to zero in the mag_pha_istft function to make PReLU work? Or did you set the compress factor to 1 to avoid NaN issues?
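A sketch of why these two workarounds matter: with power-law magnitude compression, decompression raises the masked magnitude to a fractional power `1 / c`, and a negative base then yields NaN, unless values are clipped first or `c` is 1. The function name and `compress_factor=0.3` below are illustrative assumptions, not the repo's actual `mag_pha_istft` implementation:

```python
import numpy as np

def decompress_mag(mag, compress_factor=0.3):
    # Inverse of power-law compression: mag ** (1 / c).
    # A negative magnitude (possible with an unbounded PReLU mask) raised
    # to a fractional power is NaN, so clip to zero before decompressing.
    # compress_factor=0.3 is an illustrative value, not the repo's setting.
    mag = np.clip(mag, 0.0, None)
    return mag ** (1.0 / compress_factor)

masked = np.array([-0.1, 0.0, 0.5])  # one bin driven negative by the mask
out = decompress_mag(masked)         # negative bin clipped to 0; no NaNs
```

With `compress_factor=1` the exponent is 1 and negative values pass through without producing NaN, which is why that setting also sidesteps the issue.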
Thank you for your paper!
I have been applying your model to a more diverse dataset consisting of approximately 3,000 speakers and around 1,000 hours of audio data. However, I have observed that the model's performance diminishes with such a diverse dataset. I am reaching out to ask if you have any recommendations or best practices for training the model to enhance its generalization capabilities, particularly when dealing with a wide variety of speakers and audio conditions.
I appreciate any advice or insights you could share.
Thank you!