Controllable and Interpretable Singing Voice Decomposition via Assem-VC #27
Are there any details about the speaker embedding? For example, what model is used to generate it, whether it is pre-trained, and what dataset is used?
@AK391 @980202006
Thank you!
I have tried to reproduce this paper. After training the 'Decoder' model, I used it to do GTA fine-tuning on the HiFi-GAN model you provided. After that, I tried to control speaker identity by simply switching the speaker embedding to the target speaker's embedding. I found that the lyrics are hard to hear clearly.
My dataset config:
My speaker embedding dimension is 256 (perhaps 256 is too large?). What could be the problem with my model?
@iehppp2010 |
@wookladin Do I need to fine-tune the 'Cotatron' model on a singing dataset to get better alignment results?
@iehppp2010 |
@wookladin I used that checkpoint to train the Decoder model and to fine-tune the HiFi-GAN vocoder. I found that when I test with audio that the fine-tuned Cotatron model has never seen, I can't get good sample quality. How can I get the Cotatron model to produce better alignments on unseen singing audio?
@iehppp2010, I am also trying to reproduce the results of this paper. I have one question about the dataset preparation: how did you split the files? The paper says that "all singing voices are split between 1-12 seconds". Did you do it manually for both CSD and NUS-48E, or some other way? Thanks!
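For reference, one naive way to get clips in the 1-12 s range is to cut each recording into the fewest equal-length windows that are no longer than 12 s. This is only a sketch under that assumption, not the procedure described in the paper: it ignores silences and phrase boundaries, so cuts may land mid-word, and the helper name `split_evenly` and the 22,050 Hz sample rate are hypothetical.

```python
from typing import List, Tuple


def split_evenly(num_samples: int, sample_rate: int,
                 min_sec: float = 1.0, max_sec: float = 12.0) -> List[Tuple[int, int]]:
    """Hypothetical helper: cut a recording into the fewest equal-length
    windows no longer than max_sec. Returns (start, end) sample indices."""
    min_len = int(round(min_sec * sample_rate))
    max_len = int(round(max_sec * sample_rate))
    # Recordings shorter than min_sec are skipped entirely.
    if num_samples < min_len:
        return []
    # Fewest windows such that each one fits within max_len (integer ceiling).
    n = (num_samples + max_len - 1) // max_len
    # Integer boundaries spread the samples as evenly as possible.
    bounds = [i * num_samples // n for i in range(n + 1)]
    # With n >= 2 every window is roughly max_sec / 2 or longer, so it also
    # stays above min_sec whenever max_sec >= 2 * min_sec (true for 1-12 s).
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    sr = 22050  # assumed sample rate
    for start, end in split_evenly(30 * sr, sr):
        print(f"{start / sr:.2f}s - {end / sr:.2f}s")
```

For a 30 s recording this yields three 10 s windows. Splitting at silences, or at lyric/phoneme annotation boundaries where a dataset provides them, would probably give cleaner clips than fixed windows.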
Just saw this paper: https://arxiv.org/abs/2110.12676. When will the repo be updated for it? Thanks!