VoiceFilter realization problem #5
Actually, the resulting SDR strongly depends on the test data that we use.
I used
Hi @va-volokhov, I also faced a similar situation; the chosen test dataset caused this difference. As you can see from @seungwonpark's evaluation code, there is only one validation set, so if your test data were selected from a different folder, you would get a different result. Even so, it will still be difficult to match the score in the paper. In the original paper, the median and mean SDR are already high even before separation, which means that when two audio files are mixed, the interference audio does not fully overlap the clean audio (perhaps because it is shorter), so the rear part of the mixed audio may simply be clean audio. The easiest way to assess your performance is therefore, as @seungwonpark said, to compare against the published samples, including those of the original paper (https://google.github.io/speaker-id/publications/VoiceFilter/). By the way, thank you for the excellent implementation @seungwonpark, including all the preprocessing.
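(For illustration, the "SDR before separation" baseline mentioned above can be measured directly with `mir_eval`. A minimal sketch, assuming 16 kHz mono WAV files; the file names are placeholders, not paths from the repo:)

```python
import librosa
from mir_eval.separation import bss_eval_sources

def sdr_db(reference, estimated):
    """SDR (dB) of one estimated source against its clean reference."""
    n = min(len(reference), len(estimated))  # align lengths before scoring
    sdr, _, _, _ = bss_eval_sources(reference[None, :n], estimated[None, :n])
    return sdr[0]

# Placeholder file names for illustration.
clean, _ = librosa.load("target-clean.wav", sr=16000)
mixed, _ = librosa.load("mixed.wav", sr=16000)
enhanced, _ = librosa.load("voicefilter-output.wav", sr=16000)

print("SDR before separation: %.2f dB" % sdr_db(clean, mixed))
print("SDR after separation:  %.2f dB" % sdr_db(clean, enhanced))
```

If the "before separation" number is already high, the mixture was only weakly interfered to begin with, which inflates the apparent difficulty of matching the paper's medians.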
Hi @thejungwon, thank you for your answer. Yes, after excluding the train-other-500 subset from training and testing on data from dev-clean, the SDR behavior becomes similar to @seungwonpark's. Thank you for the help @thejungwon, and thank you for the excellent implementation @seungwonpark!
Hi @thejungwon, thanks for pointing out that excluding train-other-500 explains the gap. I would also like to thank @va-volokhov for kindly sharing this issue here.
@va-volokhov @seungwonpark Can you please suggest how to use 2 GPUs to train with this code? Training on a single GPU is too slow. Is there any degradation in performance when moving from 1 to 2 GPUs? @va-volokhov It seems you used even more GPUs; would you mind sharing the code snippets that allow this code to run on 2 GPUs? Thanks!
Do you mean this?
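(For reference, a minimal sketch of two-GPU training with `torch.nn.DataParallel`; the toy model and hyperparameters are placeholders, not the repo's actual trainer. Note that DataParallel splits each batch across the GPUs, so the per-GPU batch size shrinks unless the total batch size is raised accordingly:)

```python
import torch
import torch.nn as nn

# Toy network standing in for the VoiceFilter model.
model = nn.Sequential(nn.Linear(601, 600), nn.ReLU(), nn.Linear(600, 601))

if torch.cuda.device_count() > 1:
    # Replicate the model on GPUs 0 and 1; each forward pass splits the
    # batch along dim 0 and gathers the outputs on the default device.
    model = nn.DataParallel(model, device_ids=[0, 1])
model = model.cuda()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One dummy training step for illustration.
batch = torch.randn(8, 601).cuda()
target = torch.randn(8, 601).cuda()
loss = nn.functional.mse_loss(model(batch), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Gradients are accumulated on GPU 0, so the math is equivalent to single-GPU training with the same total batch size; any quality difference usually comes from changing the effective batch size, not from the parallelism itself.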
I have encountered the same problem. Have you found a way to solve it?
Seungwon, hello.
My name is Vladimir. I am a researcher at Speech Technology Center, St. Petersburg, Russia. Your implementation of the VoiceFilter algorithm (https://github.com/mindslab-ai/voicefilter) is very interesting to me and my colleagues. Unfortunately, using your code with the standard settings in the default.yaml file, we could not reproduce your SDR dynamics: SDR converged to 4.5 dB after 200k iterations (see figure below), rather than to 10 dB after 65k iterations as in your results. Could you tell us the training settings, as well as the neural network architecture, that you used to obtain your result?
Our Python environment:
We use four Nvidia GeForce GTX 1080 Ti GPUs to train a single VoiceFilter model. The train-clean-100, train-clean-360, and train-other-500 subsets of the LibriSpeech dataset are used for training, and dev-clean is used for testing. We use the pretrained d-vector model to encode the target speaker.
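(For illustration, a rough sketch of how a pretrained speaker encoder is typically applied to a reference utterance to obtain a d-vector. The checkpoint name, feature parameters, and encoder call signature below are assumptions for the example, not the repo's exact interface:)

```python
import numpy as np
import torch
import librosa

# Hypothetical checkpoint name; assumes the encoder was saved as a
# whole module rather than a bare state_dict.
embedder = torch.load("embedder.pt", map_location="cpu")
embedder.eval()

# One reference utterance of the target speaker, 16 kHz mono assumed.
wav, _ = librosa.load("target-reference.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=wav, sr=16000, n_fft=400,
                                     hop_length=160, n_mels=40)
log_mel = torch.from_numpy(np.log(mel.T + 1e-6)).float()  # (frames, n_mels)

with torch.no_grad():
    dvec = embedder(log_mel)  # fixed-length speaker embedding (d-vector)
```

The resulting embedding is then concatenated with the mixture's spectrogram features at every frame so the network knows which speaker to keep.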
We used your default configuration file:
The neural network architecture was standard and followed your implementation.