
Whisper-large instead of whisper small? #131

Open
sleepingcat4 opened this issue Feb 15, 2025 · 7 comments

Comments

@sleepingcat4

I read the code and it is very clear. If I change Whisper-small to Whisper-large, which output dim should I change? @Plachtaa, do you have any hints or directions?

@Plachtaa
Owner

Plachtaa commented Feb 15, 2025

hi @sleepingcat4, simply changing model_params.length_regulator.in_channels to 1280 in the config file, to match the whisper-large encoder output dim, should work. Don't forget to fine-tune the model after you make that change.
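A minimal sketch of that config change, assuming the YAML config has already been loaded into a plain Python dict (for example with PyYAML's `yaml.safe_load`). `WHISPER_ENCODER_DIM` lists the encoder hidden sizes of the public Whisper checkpoints, which is where 1280 comes from; the `set_by_path` helper and the config excerpt (including its starting value of 768 for Whisper-small) are hypothetical illustrations, not code from this repo.

```python
from typing import Any

# Encoder hidden size (d_model) of the public OpenAI Whisper checkpoints.
WHISPER_ENCODER_DIM = {
    "tiny": 384,
    "base": 512,
    "small": 768,
    "medium": 1024,
    "large": 1280,  # same for large-v2 and large-v3
}


def set_by_path(config: dict, dotted_key: str, value: Any) -> None:
    """Set a nested entry such as 'model_params.length_regulator.in_channels'.

    Every key on the path must already exist, so a typo raises KeyError
    instead of silently creating a new, unused entry.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node[key]
    if leaf not in node:
        raise KeyError(leaf)
    node[leaf] = value


# Hypothetical excerpt of a config set up for Whisper-small.
config = {"model_params": {"length_regulator": {"in_channels": 768}}}
set_by_path(config, "model_params.length_regulator.in_channels", WHISPER_ENCODER_DIM["large"])
print(config["model_params"]["length_regulator"]["in_channels"])  # 1280
```

The helper refuses to create missing keys, so a mistyped path fails loudly rather than leaving the real setting at its old value. The weights that consume this input still need fine-tuning after the change.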

@sleepingcat4
Author

sleepingcat4 commented Feb 15, 2025

thanks @Plachtaa for the quick answer! btw, if I want to increase the parameter count to up to 1B, what changes to the DiT architecture should be made? Do you have any advice?

@Plachtaa
Owner

I don't really suggest doing so, as the merit of a VC model should be that it is real-time and lightweight. It is not a difficult enough task to be worth scaling up to 1B.

@sleepingcat4
Author

@Plachtaa I wanted to experiment and see how it might behave, since I had some spare compute. I was thinking of increasing the hidden dim of the DiT, but if you could give some advice, for experimentation only, that would be nice.
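For a rough sense of what increasing the hidden dim does to the parameter count: a standard transformer block with a 4x MLP has about 12·d² weights (4·d² for the Q/K/V/output projections, 8·d² for the MLP), so width grows the count quadratically and depth linearly. The sketch below uses only that approximation. It ignores embeddings, norms, biases, and any DiT-specific conditioning layers (such as adaLN modulation), and the depth of 24 is a hypothetical value, not this repo's DiT config.

```python
def block_params(hidden_dim: int, mlp_ratio: float = 4.0) -> int:
    """Approximate weight count of one transformer block (biases and norms ignored).

    Attention Q, K, V and output projections: 4 * d^2
    MLP d -> mlp_ratio*d -> d:               2 * mlp_ratio * d^2
    """
    d = hidden_dim
    return int(4 * d * d + 2 * mlp_ratio * d * d)


def estimate_params(depth: int, hidden_dim: int, mlp_ratio: float = 4.0) -> int:
    """Rough total for a stack of identical blocks (embeddings and conditioning ignored)."""
    return depth * block_params(hidden_dim, mlp_ratio)


def hidden_dim_for_target(target: int, depth: int, mlp_ratio: float = 4.0, multiple: int = 64) -> int:
    """Smallest hidden dim, stepped in `multiple` (e.g. to split evenly into 64-dim heads), reaching `target`."""
    d = multiple
    while estimate_params(depth, d, mlp_ratio) < target:
        d += multiple
    return d


depth = 24  # hypothetical depth, not taken from this repo's config
d = hidden_dim_for_target(1_000_000_000, depth)
print(d, estimate_params(depth, d))  # 1920 1061683200
```

Under these assumptions a 24-block stack first reaches 1B weights at d = 1920 (about 1.06B), while d = 1856 stays just under (about 0.99B).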

@Plachtaa
Owner

for your reference

(image not preserved)

@sleepingcat4
Author

@Plachtaa thank you for being so helpful. Another question: if I change the vocoder model from "nvidia/bigvgan_v2_22khz_80band_256x" to "https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x", which params should I change in the config?
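The BigVGAN checkpoint names themselves encode the settings that differ: the sample rate, the number of mel bands, and the upsampling factor. The upsampling factor equals the mel hop length, because the vocoder emits that many audio samples per mel frame. The sketch below parses and compares them. Mapping "22khz" to 22050 Hz and "44khz" to 44100 Hz is an assumption, and which keys in this repo's config hold these values is not confirmed here; `VocoderSpec`, `parse_bigvgan_name`, and `changed_fields` are hypothetical helpers.

```python
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VocoderSpec:
    sample_rate: int  # audio samples per second
    n_mels: int  # mel bands the vocoder expects as input
    hop_length: int  # audio samples generated per mel frame (= upsampling factor)

    @property
    def frames_per_second(self) -> float:
        return self.sample_rate / self.hop_length


# Assumption: the "<N>khz" tag maps to these exact sample rates.
KHZ_TO_HZ = {22: 22050, 24: 24000, 44: 44100}


def parse_bigvgan_name(name: str) -> VocoderSpec:
    """Read sample rate, mel bands and upsampling factor from a BigVGAN v2 repo id or URL."""
    match = re.search(r"(\d+)khz_(\d+)band_(\d+)x", name)
    if match is None:
        raise ValueError(f"unrecognized BigVGAN checkpoint name: {name}")
    khz, bands, upsample = (int(g) for g in match.groups())
    return VocoderSpec(sample_rate=KHZ_TO_HZ[khz], n_mels=bands, hop_length=upsample)


def changed_fields(old: VocoderSpec, new: VocoderSpec) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    a, b = asdict(old), asdict(new)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


old = parse_bigvgan_name("nvidia/bigvgan_v2_22khz_80band_256x")
new = parse_bigvgan_name("https://huggingface.co/nvidia/bigvgan_v2_44khz_128band_512x")
print(changed_fields(old, new))
print(old.frames_per_second, new.frames_per_second)
```

All three fields change, but both checkpoints work out to the same 86.13 mel frames per second (22050 / 256 = 44100 / 512). So frame timing is unchanged, while the mel channel count (80 to 128) and the audio sample rate differ, and any layer sized to the mel band count would need to match.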

@Plachtaa
Owner
