[model_cards] Add a new model for Irish #6544

Merged (1 commit) on Aug 17, 2020


jimregan (Contributor): No description provided.

julien-c added the "model card" label (Related to pretrained model cards) on Aug 17, 2020

codecov bot commented on Aug 17, 2020:

Codecov Report

Merging #6544 into master will decrease coverage by 1.13%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6544      +/-   ##
==========================================
- Coverage   80.52%   79.39%   -1.14%     
==========================================
  Files         156      156              
  Lines       28108    28129      +21     
==========================================
- Hits        22633    22332     -301     
- Misses       5475     5797     +322     

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/transformers/tokenization_t5.py | `96.73% <100.00%> (+0.96%)` | ⬆️ |
| src/transformers/modeling_tf_openai.py | `22.58% <0.00%> (-72.26%)` | ⬇️ |
| src/transformers/tokenization_xlm.py | `16.26% <0.00%> (-66.67%)` | ⬇️ |
| src/transformers/tokenization_mbart.py | `56.25% <0.00%> (-39.07%)` | ⬇️ |
| src/transformers/tokenization_transfo_xl.py | `33.56% <0.00%> (-8.93%)` | ⬇️ |
| src/transformers/tokenization_auto.py | `95.55% <0.00%> (-2.23%)` | ⬇️ |
| src/transformers/modeling_openai.py | `80.96% <0.00%> (-1.30%)` | ⬇️ |
| src/transformers/modeling_tf_utils.py | `86.31% <0.00%> (-0.98%)` | ⬇️ |
| src/transformers/configuration_utils.py | `95.91% <0.00%> (-0.69%)` | ⬇️ |
| src/transformers/generation_tf_utils.py | `86.46% <0.00%> (ø)` | |
| ... and 3 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 407da12...d9608f8.
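As a sanity check on the report, line coverage is covered (hit) lines divided by tracked lines. The following minimal Python sketch assumes that definition, and assumes no partially covered lines since the report lists none, then recomputes both percentages from the totals in the diff:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage in percent: covered (hit) lines over tracked lines."""
    return 100.0 * hits / lines


# Totals copied from the Codecov diff for master and for PR #6544.
base = {"lines": 28108, "hits": 22633, "misses": 5475}
head = {"lines": 28129, "hits": 22332, "misses": 5797}

for name, report in (("master", base), ("#6544", head)):
    # With no partial lines reported, every tracked line is either a hit or a miss.
    assert report["hits"] + report["misses"] == report["lines"]
    print(f"{name}: {coverage_pct(report['hits'], report['lines']):.2f}%")

# Change in coverage, in percentage points (head minus base).
delta = coverage_pct(head["hits"], head["lines"]) - coverage_pct(base["hits"], base["lines"])
print(f"change: {delta:+.2f} percentage points")
```

This prints 80.52% for master and 79.39% for the PR, a change of -1.13 percentage points, matching the summary line. The -1.14% in the diff row presumably reflects a different rounding step on Codecov's side (an assumption, not something the report explains).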

**Training data:**
* [PARSEME 1.2](https://gitlab.com/parseme/parseme_corpus_ga/-/blob/master/README.md)
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
Member: How large was the total dataset?

Contributor Author: 2,125,804 sentences, 47,419,062 tokens (counted with wc). Pretty small, but a lot bigger than the one from Turku.
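The token figure was counted with `wc`, which counts whitespace-separated words. As a rough illustration only, here is a Python approximation of `wc -l` and `wc -w`. It assumes a plain-text corpus with one sentence per line, which the thread does not actually state, and the two sample sentences are made up:

```python
import io
from typing import Iterable, Tuple


def wc_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Approximate `wc -l` and `wc -w` over an iterable of text lines.

    `wc -l` counts newline characters, so a final line without a trailing
    newline is not counted. Words are maximal runs of non-whitespace, as
    str.split() defines them; Unicode whitespace handling may differ from wc.
    """
    n_lines = 0
    n_words = 0
    for line in lines:
        if line.endswith("\n"):
            n_lines += 1
        n_words += len(line.split())
    return n_lines, n_words


# Hypothetical two-sentence corpus, one sentence per line.
corpus = io.StringIO("Is maith liom tae.\nTá sé fuar inniu.\n")
sentences, tokens = wc_counts(corpus)
print(f"{sentences} sentences, {tokens} tokens")  # 2 sentences, 8 tokens
```

Punctuation attached to a word (the "tae." above) is not split off, so these are whitespace tokens rather than linguistic tokens; a subword tokenizer would report a different, usually larger, number. For a real corpus, pass an open file such as `open("corpus.txt", encoding="utf-8")` (hypothetical filename) instead of the `StringIO`.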

Member: Feel free to add that to the model card! (We'll make it much easier to update in the next few weeks.)

@@ -0,0 +1,21 @@
---
language: ga
Member: Did you get a chance to test the Opus-MT translation models for Irish (https://huggingface.co/models?filter=ga)? Are they good?

Contributor Author: I haven't looked. I worked in MT for several years, did a master's on it, and burned myself out completely writing the dissertation, so I wasn't really inclined to check.
I get "Can't load config" on both models from the website; the English-to-Irish model seems good enough for the sort of short sentences I can confidently rate.

Member: Understood, thanks!

cc @sshleifer and @mfuntowicz for the inference API issue

julien-c (Member): Really cool! If you get a chance, do you think you could add sample inputs for Irish to https://github.com/huggingface/widgets-server/blob/master/DefaultWidget.ts?

julien-c merged commit 3a30290 into huggingface:master on Aug 17, 2020
jimregan deleted the patch-1 branch on August 17, 2020 at 20:35
Zigur pushed a commit to Zigur/transformers that referenced this pull request on Oct 26, 2020
fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request on Nov 15, 2020
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request on Nov 15, 2020
Labels: model card (Related to pretrained model cards)

2 participants