How to train a multilingual model, is there a script for it? #1656
w2v-conformer doesn't use any text information to compute the pretraining loss. But in order not to change the WeNet training pipeline, you can fill in any placeholder text unit, such as 'A', for the multilingual wavs.
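The suggestion above can be sketched as a small preprocessing step: since the pretraining loss ignores transcripts, every utterance in the data list gets the same dummy label so the standard pipeline still finds a `text` entry. This is a minimal sketch assuming Kaldi-style `wav.scp`/`text` files as used by WeNet; the function name and the dummy unit `'A'` are illustrative, not part of any official script.

```python
# Sketch: write a placeholder Kaldi-style `text` file for self-supervised
# pretraining, so WeNet's data pipeline accepts multilingual wavs that
# have no real transcripts. Each utterance ID from wav.scp is paired
# with a single dummy unit (here 'A'), which the pretraining loss ignores.

def write_dummy_text(wav_scp_path: str, text_path: str,
                     dummy_unit: str = "A") -> None:
    """For every utterance in wav.scp, emit `<utt-id> <dummy_unit>`."""
    with open(wav_scp_path, encoding="utf-8") as f_in, \
         open(text_path, "w", encoding="utf-8") as f_out:
        for line in f_in:
            line = line.strip()
            if not line:
                continue
            # wav.scp lines look like: "<utt-id> <path-or-command>"
            utt_id = line.split(maxsplit=1)[0]
            f_out.write(f"{utt_id} {dummy_unit}\n")
```

For example, a `wav.scp` containing `utt1 /data/a.wav` and `utt2 /data/b.wav` yields a `text` file with `utt1 A` and `utt2 A`, which keeps the pipeline happy without injecting any real language information into pretraining.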
I see that the w2v-conformer pre-trained model is trained on a multilingual dataset, but I have not found a corresponding multilingual training recipe or script.
One problem I have run into is how to choose the text modeling unit: should it be BPE, char, or something else?