[model_cards] Add a new model for Irish #6544
Codecov Report
@@ Coverage Diff @@
## master #6544 +/- ##
==========================================
- Coverage 80.52% 79.39% -1.14%
==========================================
Files 156 156
Lines 28108 28129 +21
==========================================
- Hits 22633 22332 -301
- Misses 5475 5797 +322
**Training data:**
* [PARSEME 1.2](https://gitlab.com/parseme/parseme_corpus_ga/-/blob/master/README.md)
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
How large was the total dataset?
2,125,804 sentences and 47,419,062 tokens (as counted by `wc`). Pretty small, but a lot bigger than the one from Turku.
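For anyone wanting to reproduce figures like these on their own corpus, here is a minimal sketch of a `wc`-style count in Python. It assumes a plain-text corpus with one sentence per line and whitespace-separated tokens; the function name `wc_counts` is hypothetical and is not taken from this PR, and the exact procedure used for the numbers above is not stated beyond `wc`.

```python
import io


def wc_counts(stream):
    """Count lines and whitespace-separated tokens, roughly like `wc -l` and `wc -w`.

    Assumption: one sentence per line. Unlike `wc -l`, a final line without a
    trailing newline is still counted here.
    """
    sentences = 0
    tokens = 0
    for line in stream:
        sentences += 1
        # str.split() with no arguments splits on any run of whitespace.
        tokens += len(line.split())
    return sentences, tokens


if __name__ == "__main__":
    # Tiny in-memory stand-in for a corpus file.
    sample = io.StringIO("Dia duit\nConas atá tú?\n")
    print(wc_counts(sample))
```

For a real corpus, the same function can be passed a file handle opened with `open(path, encoding="utf-8")`.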
Feel free to add to the model card! (we'll make it way easier to update in the next few weeks)
@@ -0,0 +1,21 @@
---
language: ga
Did you get a chance to test the Opus-MT translation models for Irish? https://huggingface.co/models?filter=ga
Are they good?
I haven't looked. I worked with MT for several years, did a master's on it and burned myself out completely writing the dissertation, so I wasn't really inclined to check.
I get 'Can't load config' on both models from the website; the English->Irish model seems good enough for the sort of short sentences I can confidently rate.
Understood, thanks!
cc @sshleifer and @mfuntowicz for the inference API issue
Really cool! If you get a chance, do you think you could add sample inputs for Irish to https://github.com/huggingface/widgets-server/blob/master/DefaultWidget.ts?
This reverts commit 48d352d.