This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

Is it possible to use this training script with a dataset in another language, to train from scratch in a new language? #49

Open
nikkon3 opened this issue Apr 3, 2020 · 2 comments

Comments


nikkon3 commented Apr 3, 2020

No description provided.


nikkon3 commented Apr 3, 2020

I am interested in training this model on my own dataset of Greek text. Can anyone tell me whether they have tried something similar?
How much data would I need?
What changes should I make to this training script to get started with my dataset?

nikkon3 changed the title from "is possible to use this train script for another language dataset? in order to train it from start in a new language" to "Is possible to use this train script for another language dataset? In order to train it from start in a new language." Apr 3, 2020

ZheMann commented Apr 4, 2020

@nikkon3 I tried something similar last year for the Dutch language. Perhaps you should take a look at my GitHub repo. Back then, the 117M and 345M versions of GPT-2 were the only ones available, but I think you'll get the idea of how to train it for other languages.
