Training on a more diverse dataset #45
Comments
My understanding is that this diverse dataset likely includes not only noise but also reverberation or bandwidth limitations. First, I recommend replacing the bounded LSigmoid with an unbounded PReLU as the activation function for the mask. If there are band limitations, you might also consider adding a waveform discriminator.
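For context, here is a minimal NumPy sketch contrasting the two activations. The function names and the `beta`/`alpha` values are illustrative assumptions, not taken from this repo's code:

```python
import numpy as np

def lsigmoid(x, beta=2.0):
    # Learnable sigmoid used for mask estimation: output bounded in (0, beta).
    # beta=2.0 is an illustrative value, not the repo's actual parameter.
    return beta / (1.0 + np.exp(-x))

def prelu(x, alpha=0.2):
    # Parametric ReLU: unbounded above, so the mask can exceed the LSigmoid
    # bound where amplification is needed (e.g. band-limited inputs).
    # alpha is the (normally learnable) negative slope, fixed here.
    return np.where(x >= 0.0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
bounded = lsigmoid(x)   # stays strictly below beta = 2.0
unbounded = prelu(x)    # 3.0 passes through unchanged; -2.0 becomes -0.4
```

The trade-off discussed below follows directly from the last line: the unbounded branch can produce negative mask values.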
Thank you for the fast response. Regarding your first recommendation of using an unbounded PReLU in place of the bounded LSigmoid: wouldn't the negative values produced by the unbounded PReLU be problematic for the magnitude spectrogram?
Using a ReLU activation is more reasonable, but in my implementations I found that PReLU also worked.
Did you clip the negative values of mag to zero in the mag_pha_istft function to make PReLU work? Or did you set the compress factor to 1 to avoid NaN issues?
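A sketch of why these two workarounds matter: with power-law magnitude compression, decompression raises the masked magnitude to a fractional power `1 / c`, and a negative base then yields NaN, unless values are clipped first or `c` is 1. The function name and `compress_factor=0.3` below are illustrative assumptions, not the repo's actual `mag_pha_istft` implementation:

```python
import numpy as np

def decompress_mag(mag, compress_factor=0.3):
    # Inverse of power-law compression: mag ** (1 / c).
    # A negative magnitude (possible with an unbounded PReLU mask) raised
    # to a fractional power is NaN, so clip to zero before decompressing.
    # compress_factor=0.3 is an illustrative value, not the repo's setting.
    mag = np.clip(mag, 0.0, None)
    return mag ** (1.0 / compress_factor)

masked = np.array([-0.1, 0.0, 0.5])  # one bin driven negative by the mask
out = decompress_mag(masked)         # negative bin clipped to 0; no NaNs
```

With `compress_factor=1` the exponent is 1 and negative values pass through without producing NaN, which is why that setting also sidesteps the issue.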
Thank you for your paper!
I have been applying your model to a more diverse dataset consisting of approximately 3,000 speakers and around 1,000 hours of audio data. However, I have observed that the model's performance diminishes with such a diverse dataset. I am reaching out to ask if you have any recommendations or best practices for training the model to enhance its generalization capabilities, particularly when dealing with a wide variety of speakers and audio conditions.
I appreciate any advice or insights you could share.
Thank you!