Training corpus of LASER? #150

Open
ever4244 opened this issue Aug 14, 2020 · 0 comments


@ever4244

Good Morning!
Would you release the script, or the links, needed to obtain the LASER training corpus?
The corpus is a combination of several corpora, and some parts are cut from the original corpora, so it is very difficult to replicate the original LASER training corpus.

I would therefore like to ask for a script or a detailed guide on how to cut, combine, and process all of the original corpora the way LASER does. If a ready-made combined corpus is available, that would be even better. I actually only want to build a subset of the LASER training corpus containing 'en, es, fr, de, it'.
Regards!
Wei
