[WIP] Ner pipeline grouped_entities fixes #5970

Merged
LysandreJik merged 23 commits into huggingface:master on Nov 3, 2020

Contributor

@cceyda cceyda commented Jul 22, 2020

The NER pipeline has several open issues when used with grouped_entities=True:
#5077
#4816
#5730
#5609
#6514
#5541

  • [Bug Fix] Add an ignore_subwords option that ignores the predictions for subsequent ##wordpieces. Some models are trained only on the first token of a word and not on the following wordpieces (the BERT NER default), so it makes sense to do the same at inference time.

    • The simplest fix is to group the subwords with the first wordpiece.
      • [TODO] How should the ignored scores be handled? Set them to 0 and compute a mean that ignores zeros?
      • [TODO] How should wordpiece prefixes other than ## be handled? Possible approaches:
        get the prefix from the tokenizer (but most tokenizers currently don't have a wordpiece_prefix property), or
        have an _is_subword(token) helper.
  • [Feature add] Added a skip_special_tokens option, because special tokens were harder to remove after grouping.

  • [Additional Changes] Remove the B/I prefix from the returned grouped_entities.

Edit: The scores of ignored subwords are now excluded as well, by setting them to nan and averaging with nanmean.
Edit: B entities of different type are separated (as per the BIO tag definition).
Edit: skip_special_tokens is now the default behavior.
Edit: ignore_subwords is now the default behavior.
Edit: Custom non-standard tokenizers get more flexibility through tokenizer.is_subword_fn and tokenizer.convert_tokens_to_string.
Edit: [UNK]-related bugs are fixed by mapping [UNK] tokens back to the correct original string. This requires a fast tokenizer or an explicit offset_mapping.
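
To illustrate the first edit note above, here is a minimal sketch of a NaN-ignoring mean. It is written in plain Python so it runs without dependencies; for a one-dimensional input it computes what numpy.nanmean does. The helper name nan_mean and the example scores are assumptions made for illustration, not code from this PR.

```python
import math

def nan_mean(scores):
    """Mean of the scores that are not NaN (hypothetical helper)."""
    kept = [s for s in scores if not math.isnan(s)]
    if not kept:
        # Nothing left to average: propagate NaN instead of dividing by zero.
        return float("nan")
    return sum(kept) / len(kept)

# Assumed example: "Consuelo" is split into "Cons" + "##uelo" and the
# subword's score has been masked out with NaN before averaging.
token_scores = [0.9995, float("nan")]
word_score = nan_mean(token_scores)
```

Masking with NaN rather than 0 keeps the ignored subwords from pulling the averaged score down.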

Usage

pipeline('ner', model=model, tokenizer=tokenizer, ignore_labels=[], grouped_entities=True, ignore_subwords=True)

Ceyda Cinarel added 3 commits July 22, 2020 19:35
… same type

	(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions. Because some models train on only the first token of a word and not on the subsequent wordpieces (BERT NER default). So it makes sense doing the same thing at inference time.
	The simplest fix is to just group the subwords with the first wordpiece.
	[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
	[TODO] handle different wordpiece_prefix ## ? possible approaches:
		get it from tokenizer? but currently most tokenizers dont have a wordpiece_prefix property?
		have an _is_subword(token)
[Feature add] added option to `skip_special_tokens`. Cause It was harder to remove them after grouping.
[Additional Changes] remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO]  can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
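
The rule (B-type1 B-type1) != (B-type1 I-type1) from the commit message above can be sketched as follows. This is a simplified illustration under the usual BIO convention, not the pipeline's actual group_entities code; the function name group_bio and the input format are assumptions.

```python
def group_bio(tagged_tokens):
    """Group (word, tag) pairs into entities following BIO rules (hypothetical helper)."""
    groups = []
    current = None
    for word, tag in tagged_tokens:
        if tag == "O":
            current = None
            continue
        prefix, _, entity_type = tag.partition("-")
        # Only an I- tag of the same type continues the open entity.
        # A B- tag always opens a new one, even when the type matches.
        if prefix == "I" and current is not None and current["entity_group"] == entity_type:
            current["words"].append(word)
        else:
            current = {"entity_group": entity_type, "words": [word]}
            groups.append(current)
    return [{"entity_group": g["entity_group"], "word": " ".join(g["words"])} for g in groups]

grouped = group_bio([("New", "B-LOC"), ("York", "I-LOC"), ("Paris", "B-LOC")])
```

Here "New York" and "Paris" stay separate although both are LOC, because "Paris" carries a B- tag.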
Contributor

@enzoampil enzoampil left a comment

Thanks for this, @cceyda. This generally looks good to me; I have just a few comments:

  1. I think we can hard-code skip_special_tokens=True, similar to the approach in TextGenerationPipeline.
  2. I'm wondering why we shouldn't group B-type entities. Could you elaborate on your rationale here?
  3. Please make sure to add the test cases to NerPipelineTests in test_pipelines. These should be similar test cases to the ones you referenced in this PR (that used to fail, but now pass with these changes).
  4. Please update the tests to account for these changes. The current failures seem to be due to removal of the prefixes in "entity type".

Thanks!

@HHoofs

HHoofs commented Aug 3, 2020

I'm wondering whether the B/I part should be separated from the entity-type part, in the sense that you average over the entity types (disregarding the B/I part) and vice versa. With the current approach I have the feeling that only the first subtoken decides whether the complete word is a B or an I.
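
One possible reading of this suggestion, sketched under assumptions: for every word, sum the label probabilities of its subtokens once per entity type (ignoring B/I) and once per prefix (ignoring the type), then pick the best of each independently. The function names and the probability format are hypothetical; nothing like this is part of the PR.

```python
from collections import defaultdict

def split_label(label):
    """'B-PER' -> ('B', 'PER'); 'O' -> ('O', 'O')."""
    prefix, sep, entity_type = label.partition("-")
    return (prefix, entity_type) if sep else ("O", "O")

def word_label(subtoken_probs):
    """subtoken_probs: one {label: probability} dict per subtoken of a single word."""
    type_mass = defaultdict(float)
    prefix_mass = defaultdict(float)
    for probs in subtoken_probs:
        for label, p in probs.items():
            prefix, entity_type = split_label(label)
            type_mass[entity_type] += p   # B-PER and I-PER both count towards PER
            prefix_mass[prefix] += p      # B-PER and B-ORG both count towards B
    best_type = max(type_mass, key=type_mass.get)
    best_prefix = max(prefix_mass, key=prefix_mass.get)
    # Average probability mass of the winning type over the word's subtokens.
    return best_prefix, best_type, type_mass[best_type] / len(subtoken_probs)

example = [
    {"B-PER": 0.7, "I-PER": 0.2, "O": 0.1},
    {"B-PER": 0.1, "I-PER": 0.5, "O": 0.4},
]
prefix, entity_type, score = word_label(example)
```

Because the two decisions are made independently, this sketch can return inconsistent pairs (for example prefix "O" with type "PER"); a real implementation would need a rule for resolving that.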

@cceyda
Contributor Author

cceyda commented Sep 1, 2020

I want to complete this but ran into another issue while working on it:

All [UNK] tokens are returned as [UNK] in the output instead of the actual input token (because the code goes from ids to tokens). [UNK] tokens also get lost when using skip_special_tokens (#6863).
This is a simple token alignment issue that can be solved with offset_mappings, but offset_mappings is only available with fast tokenizers. I'm wondering what a more general approach to solving this would be.
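
A small sketch of the alignment idea: when character offsets are available, the original text of each token can be sliced out of the input instead of converting ids back to tokens. The sentence, tokens and offsets below are made up for illustration. With a fast tokenizer such offsets can be requested with return_offsets_mapping=True; here they are written out by hand so the snippet runs without transformers.

```python
sentence = "Visit Zurich ☃ today"

# Assumed tokenizer output: the snowman is out of vocabulary and becomes [UNK].
tokens = ["Visit", "Zurich", "[UNK]", "today"]
offsets = [(0, 5), (6, 12), (13, 14), (15, 20)]

# Slicing the input by offsets recovers the text behind every token,
# including the one the vocabulary could not represent.
words = [sentence[start:end] for start, end in offsets]
```

The third entry of words is the original character rather than "[UNK]", which an ids-to-tokens lookup cannot provide.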

@Monique497

Monique497 commented Sep 13, 2020

Dear @cceyda,

Over the last couple of days I started working with Hugging Face's transformers, in particular NER classification. I ran into problems that have already been reported in the issues you mention at the beginning, especially that subtokens classified as 'O' are not properly merged into the full token.

For example (Dutch):
sentence = "Als we Volkswagens OR-voorzitter Bernd Osterloh moeten geloven, dan moet dat binnen drie jaar het geval zijn."

Gives me as group entities:
[{'entity_group': 'B-per', 'score': 0.9999980926513672, 'word': 'Bern'},
{'entity_group': 'I-per', 'score': 0.9999990463256836, 'word': 'Ost'}]

I expect:
[{'entity_group': 'B-per', 'score': 0.9999980926513672, 'word': 'Bernd'},
{'entity_group': 'I-per', 'score': 0.9999990463256836, 'word': 'Osterloh'}]

However, the considered subtokens are classified as 'O':

{'word': '[CLS]', 'score': 0.9999999403953552, 'entity': 'O', 'index': 0}
{'word': 'Als', 'score': 0.9999999403953552, 'entity': 'O', 'index': 1}
{'word': 'we', 'score': 0.9999999403953552, 'entity': 'O', 'index': 2}
{'word': 'Volkswagen', 'score': 0.9999955296516418, 'entity': 'B-misc', 'index': 3}
{'word': '##s', 'score': 0.9999999403953552, 'entity': 'O', 'index': 4}
{'word': 'O', 'score': 0.9981945157051086, 'entity': 'I-misc', 'index': 5}
{'word': '##R', 'score': 0.9999998807907104, 'entity': 'O', 'index': 6}
{'word': '-', 'score': 0.9999999403953552, 'entity': 'O', 'index': 7}
{'word': 'voorzitter', 'score': 0.9999998807907104, 'entity': 'O', 'index': 8}
{'word': 'Bern', 'score': 0.9999980926513672, 'entity': 'B-per', 'index': 9}
{'word': '##d', 'score': 0.9999998807907104, 'entity': 'O', 'index': 10}
{'word': 'Ost', 'score': 0.9999990463256836, 'entity': 'I-per', 'index': 11}
{'word': '##er', 'score': 0.9999998807907104, 'entity': 'O', 'index': 12}
{'word': '##lo', 'score': 0.9999997615814209, 'entity': 'O', 'index': 13}
{'word': '##h', 'score': 0.9999998807907104, 'entity': 'O', 'index': 14}

{'word': 'moeten', 'score': 0.9999999403953552, 'entity': 'O', 'index': 15}
{'word': 'geloven', 'score': 0.9999998807907104, 'entity': 'O', 'index': 16}
{'word': ',', 'score': 0.9999999403953552, 'entity': 'O', 'index': 17}
{'word': 'dan', 'score': 0.9999999403953552, 'entity': 'O', 'index': 18}
{'word': 'moet', 'score': 0.9999999403953552, 'entity': 'O', 'index': 19}
{'word': 'dat', 'score': 0.9999999403953552, 'entity': 'O', 'index': 20}
{'word': 'binnen', 'score': 0.9999999403953552, 'entity': 'O', 'index': 21}
{'word': 'drie', 'score': 0.9999999403953552, 'entity': 'O', 'index': 22}
{'word': 'jaar', 'score': 0.9999999403953552, 'entity': 'O', 'index': 23}
{'word': 'het', 'score': 0.9999999403953552, 'entity': 'O', 'index': 24}
{'word': 'geval', 'score': 0.9999999403953552, 'entity': 'O', 'index': 25}
{'word': 'zijn', 'score': 0.9999999403953552, 'entity': 'O', 'index': 26}
{'word': '.', 'score': 0.9999999403953552, 'entity': 'O', 'index': 27}
{'word': '[SEP]', 'score': 0.9999999403953552, 'entity': 'O', 'index': 28}
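
The behaviour asked for here can be sketched on an excerpt of the listing above: every '##' continuation piece is attached to the preceding token, which keeps its own label and score. This is an illustration of the idea behind ignore_subwords, not the pipeline's implementation; merge_subwords is a hypothetical helper and the default '##' prefix is an assumption that only fits WordPiece-style tokenizers.

```python
def merge_subwords(entities, prefix="##"):
    """Attach continuation pieces to the previous token, keeping that token's label and score."""
    merged = []
    for ent in entities:
        if ent["word"].startswith(prefix) and merged:
            merged[-1]["word"] += ent["word"][len(prefix):]
        else:
            merged.append(dict(ent))  # copy so the input list is left untouched
    return merged

tokens = [
    {"word": "Bern", "score": 0.9999980926513672, "entity": "B-per", "index": 9},
    {"word": "##d", "score": 0.9999998807907104, "entity": "O", "index": 10},
    {"word": "Ost", "score": 0.9999990463256836, "entity": "I-per", "index": 11},
    {"word": "##er", "score": 0.9999998807907104, "entity": "O", "index": 12},
    {"word": "##lo", "score": 0.9999997615814209, "entity": "O", "index": 13},
    {"word": "##h", "score": 0.9999998807907104, "entity": "O", "index": 14},
]
words = [t for t in merge_subwords(tokens) if t["entity"] != "O"]
```

On this excerpt the two remaining entries read "Bernd" and "Osterloh", matching the expected output above.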

I believe your pull request addresses these issues properly.
However, I saw the merge did not complete since it failed on some tasks.

I was wondering if there is still the intention to solve these issues.

Disclaimer: I am a total newbie to git (just set up an account), so please be mild, haha.
Any help is much appreciated!

Thank you in advance,

Monique

@enzoampil
Contributor

@cceyda I actually want this PR to move forward. Are you okay collaborating on your fork (can add me as collaborator)? I can help out with some of the issues failing so we can get this merged 😄

@cceyda
Contributor Author

cceyda commented Sep 14, 2020

@enzoampil I have added you as a collaborator.
Also pushed some additional changes addressing the [UNK] token mapping problem I mentioned before.
Still there are some things I'm not very satisfied with:

  1. The subword prefix was previously hard-coded to '##'. With the latest change I added a check for whether the tokenizer defines is_subword_fn (I still don't like handling it this way). I know some tokenizers have a subword_prefix, but most don't, and this was the most flexible solution for now.
  2. offset_mappings is needed to resolve [UNK] tokens but is only available with fast tokenizers. Fast tokenizers don't have convert_ids_to_tokens, so I had to implement a hacky solution for those as well.
  3. skip_special_tokens also dropped [UNK] tokens, so I had to change things to rely on special_tokens_mask.
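
The check described in point 1 could look roughly like the sketch below: use the tokenizer's is_subword_fn if one is defined, otherwise fall back to the '##' prefix. The helper name is_subword and the stand-in tokenizer objects are assumptions; only the attribute name is_subword_fn comes from this thread.

```python
from types import SimpleNamespace

def is_subword(tokenizer, token, default_prefix="##"):
    """Prefer a user-supplied tokenizer.is_subword_fn, else test for the default prefix."""
    custom = getattr(tokenizer, "is_subword_fn", None)
    if callable(custom):
        return bool(custom(token))
    return token.startswith(default_prefix)

# Stand-ins for real tokenizers, used only to exercise both branches.
plain_tokenizer = SimpleNamespace()
custom_tokenizer = SimpleNamespace(is_subword_fn=lambda tok: tok.startswith("%%"))

checks = (
    is_subword(plain_tokenizer, "##uelo"),
    is_subword(plain_tokenizer, "Cons"),
    is_subword(custom_tokenizer, "%%uelo"),
)
```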

It is not optimal but it worked for my use cases.
Haven't had a chance to look at the failing tests yet :/

@cceyda
Contributor Author

cceyda commented Sep 16, 2020

I have changed the ignore_subwords default to True which covers cases like

[
{'word': 'Cons', 'score': 0.9994944930076599, 'entity': 'B-PER', 'index': 1},
{'word': '##uelo', 'score': 0.802545428276062, 'entity': 'B-PER', 'index': 2}
]

And honestly I don't see why subwords shouldn't be ignored in most cases (unless some custom logic is needed to determine a word's tag, e.g. averaging over the wordpieces, in which case grouped_entities shouldn't be used 🤔).
IMO mid-word inconsistencies made by the model when ignore_subwords=False shouldn't affect the pipeline's output logic.

[todo]

  • torch tests are passing for now, but more cases should probably be added. (I can't see why the TF tests are failing though; I don't have a dev env for that.)
  • The new parameters should be added to the docstrings.

@codecov

codecov bot commented Sep 21, 2020

Codecov Report

Merging #5970 into master will increase coverage by 26.30%.
The diff coverage is 71.87%.

@@             Coverage Diff             @@
##           master    #5970       +/-   ##
===========================================
+ Coverage   52.05%   78.36%   +26.30%     
===========================================
  Files         236      168       -68     
  Lines       43336    32338    -10998     
===========================================
+ Hits        22560    25341     +2781     
+ Misses      20776     6997    -13779     
Impacted Files                                Coverage Δ
src/transformers/pipelines.py                 80.59% <71.87%> (+61.46%) ⬆️
src/transformers/modeling_tf_xlm.py           18.94% <0.00%> (-60.01%) ⬇️
src/transformers/modeling_tf_flaubert.py      24.53% <0.00%> (-56.29%) ⬇️
src/transformers/tokenization_camembert.py    37.03% <0.00%> (-29.23%) ⬇️
src/transformers/modeling_tf_gpt2.py          71.84% <0.00%> (-11.00%) ⬇️
src/transformers/data/__init__.py             100.00% <0.00%> (ø)
src/transformers/modeling_mmbt.py             23.47% <0.00%> (ø)
src/transformers/modeling_mbart.py            100.00% <0.00%> (ø)
src/transformers/modeling_outputs.py          100.00% <0.00%> (ø)
src/transformers/modeling_pegasus.py          100.00% <0.00%> (ø)
... and 217 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7087d9b...47797d1. Read the comment docs.

@Monique497

Dear @cceyda,

For the last two days I have been working with your branch to see how it performs on my own input texts.
However, I came across the following issue that I would like to point out to you:

When I use the following line of code (as you suggest under 'Usage' above):

pipeline('ner', model=model, tokenizer=tokenizer, ignore_labels=[], grouped_entities=True, skip_special_tokens=True, ignore_subwords=True)

I get the error:

TypeError: __init__() got an unexpected keyword argument 'skip_special_tokens'.

When I look at TokenClassificationPipeline in transformers/pipelines.py, it seems that this argument is not implemented yet. Or am I missing something?

Best,

Monique

@cceyda
Contributor Author

cceyda commented Sep 28, 2020

@Monique497 sorry for the delay
A couple of things have changed since I first wrote that example:

  • Special tokens ([CLS], [PAD], [SEP]) are always skipped (per the comments above), so you don't need that kwarg. This also applies when grouped_entities=False.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    pipeline,
)

# model_name is the name or path of your token-classification checkpoint
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)  # note the fast tokenizer
# ignore_subwords=True by default
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True)
inputs = "test sentence"
output = nlp(inputs)
  • Another important point: you have to use a fast tokenizer OR pass offset_mapping as a parameter, because the [UNK] token resolution depends on it (maybe I should rename this to offset_mappings). This also applies when grouped_entities=False.
# you can pass it like this
nlp(inputs, offset_mapping=mappings_you_calculate)
  • If you are using a custom tokenizer that marks subwords differently (i.e. not with a leading '##'), you can supply your custom logic through tokenizer.is_subword_fn and tokenizer.convert_tokens_to_string.
    I don't know if this is the best way to handle non-standard tokenization, but I use some custom non-standard tokenizers for Korean and this solution gave me enough flexibility.

something like this:

def sub_fn(token):
    # tokens carrying the custom "%%" prefix are subwords
    return token.startswith("%%")
tokenizer.is_subword_fn = sub_fn

def convert_tokens_to_string(tokens):
    # plain function assigned on the instance, so it takes no `self`
    out_string = " ".join(tokens).replace(" %%", "").strip()
    return out_string
tokenizer.convert_tokens_to_string = convert_tokens_to_string
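
For the offset_mapping route mentioned above, here is a minimal sketch of how character offsets might be computed by hand. It assumes a tokenizer that splits on whitespace only and produces one token per word; whitespace_offsets is a hypothetical helper. The exact structure the pipeline expects for offset_mapping (for instance whether special tokens need their own entries) is not spelled out in this thread, so treat the output format as an assumption.

```python
import re

def whitespace_offsets(text):
    """(start, end) character offsets of every whitespace-delimited token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

text = "My name is Alex"
offsets = whitespace_offsets(text)

# Each offset pair slices the corresponding token back out of the input.
recovered = [text[start:end] for start, end in offsets]
```

A tokenizer that splits words into several pieces would need one offset pair per piece, which this sketch does not attempt.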

@enzoampil what are your thoughts on this?

@@ -771,8 +785,15 @@ def _test_ner_pipeline(
for key in output_keys:
self.assertIn(key, result)

for ungrouped_input, grouped_result in zip(ungrouped_ner_inputs, expected_grouped_ner_results):
self.assertEqual(nlp.group_entities(ungrouped_input), grouped_result)
if nlp.grouped_entities:
Contributor Author

conditioned so that grouped_entities=False tests won't fail because of grouped_entities=True

Contributor

LGTM

for ungrouped_input, grouped_result in zip(ungrouped_ner_inputs, expected_grouped_ner_results):
self.assertEqual(nlp.group_entities(ungrouped_input), grouped_result)
if nlp.grouped_entities:
if nlp.ignore_subwords:
Contributor Author

added case for ignore_subwords=True and False

Contributor Author

@cceyda cceyda left a comment

What else can I do to get this PR merged? It has been a while.

{"entity_group": "B-PER", "score": 0.9997273534536362, "word": "Andrés Pastrana"},
{"entity_group": "B-ORG", "score": 0.8589080572128296, "word": "Farc"},
{"entity_group": "PER", "score": 0.999369223912557, "word": "Consuelo Araújo Noguera"},
{"entity_group": "PER", "score": 0.9997771680355072, "word": "Andrés Pastrana"},

A test for hyphenated names (ex. Juantia Gomez-Cortez) would be useful, especially given that the fast and slow tokenizers have different codepaths for reconstructing the original text. I had to implement grouping of named entities myself recently and was tripped up by that corner case.

start_ind, end_ind = offset_mapping[idx]
word_ref = sentence[start_ind:end_ind]
word = self.tokenizer.convert_ids_to_tokens([int(input_ids[idx])])[0]
is_subword = len(word_ref) != len(word)
Contributor Author

Changed the is_subword detection logic: it now compares the length of the token (e.g. ##token) with the length of the original text span given by the offset mapping (assuming subword pieces get prefixed by something).
In case the user wants some other logic, they can first get the ungrouped entities, add an is_subword: bool field to each entity, and call pipeline.group_entities themselves.
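
The length comparison can be tried in isolation with a standalone version of the snippet above. The function name and the example offsets are assumptions; the comparison itself mirrors the quoted lines.

```python
def is_subword_by_length(sentence, offset, token):
    """A token is treated as a subword if it is not the same length as the text it covers."""
    start, end = offset
    word_ref = sentence[start:end]
    return len(word_ref) != len(token)

sentence = "Consuelo"
# Assumed WordPiece split: "Cons" covers characters 0-4, "##uelo" covers 4-8.
first_piece = is_subword_by_length(sentence, (0, 4), "Cons")
second_piece = is_subword_by_length(sentence, (4, 8), "##uelo")
```

Note that, taken on its own, this heuristic also flags a token such as "[UNK]" whenever the span it covers is not five characters long; whether the merged code guards against that case is not visible in this excerpt.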

if self.tokenizer.is_fast:
    word = self.tokenizer.decode(self.tokenizer.convert_tokens_to_ids(tokens))
else:
    word = self.tokenizer.convert_tokens_to_string(tokens)
Contributor Author

@cceyda cceyda Oct 26, 2020

fixed as suggested! I agree it is much cleaner this way. Umm it looks like fast tokenizers have a convert_tokens_to_string method now? 😕

@@ -1299,6 +1299,29 @@ def __call__(self, *args, targets=None, **kwargs):
return results


class TokenClassificationArgumentHandler(ArgumentHandler):
Contributor Author

added this to check offset_mapping if provided. (does a simple batch_size check)
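
A rough sketch of what such a batch-size check could look like. This is not the handler's actual code: check_offset_mapping is a hypothetical free function, and the rule that a flat list of tuples counts as a single unbatched mapping is an assumption.

```python
def check_offset_mapping(inputs, offset_mapping=None):
    """Normalise inputs to a list and require one offset list per input sentence."""
    if isinstance(inputs, str):
        inputs = [inputs]
    if offset_mapping is not None:
        # A single sentence's offsets may arrive unbatched as [(start, end), ...].
        if offset_mapping and isinstance(offset_mapping[0], tuple):
            offset_mapping = [offset_mapping]
        if len(offset_mapping) != len(inputs):
            raise ValueError("offset_mapping should have the same batch size as the input")
    return inputs, offset_mapping

batch, mapping = check_offset_mapping("a b", [(0, 1), (2, 3)])
```

Passing two sentences with only one offset list raises the ValueError, which surfaces the mismatch before any tokens are aligned.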

for model_name in self.small_models:
nlp = pipeline(
task="ner", model=model_name, tokenizer=model_name, grouped_entities=True, ignore_subwords=False
task="ner", model=model_name, tokenizer=tokenizer, grouped_entities=True, ignore_subwords=True
Contributor Author

@cceyda cceyda Oct 26, 2020

Don't know why the tests are failing; I get normal results when running them outside of the test suite 😕? Update: I was just being careless 🤦. I should still add cases for non-fast tokenizers.

@LysandreJik
Member

Thanks for iterating! I'll check this today.

Member

@LysandreJik LysandreJik left a comment

Played around with it, works well, and the implementation seems robust. Looks good, LGTM! Thanks for iterating.

@stefan-it, do you want to take a quick look?

@LysandreJik
Member

Merging this as soon as it's green, thank you for iterating on the PR! Sorry this took so long to merge.

@LysandreJik LysandreJik merged commit 29b536a into huggingface:master Nov 3, 2020
@enzoampil
Contributor

Thanks @LysandreJik and congrats @cceyda !! 😄

@LysandreJik
Member

LysandreJik commented Nov 6, 2020

FYI this broke the NER pipeline:

from transformers import pipeline

nlp = pipeline("ner")

nlp("My name is Alex and I live in New York")

crashes with the following error:

    raise Exception("To decode [UNK] tokens use a fast tokenizer or provide offset_mapping parameter")
Exception: To decode [UNK] tokens use a fast tokenizer or provide offset_mapping parameter

Trying to see if this can be quickly patched, otherwise we'll revert the PR while we patch this.

@cceyda
Contributor Author

cceyda commented Nov 6, 2020

Oops! Although returning [UNK] tokens with slow tokenizers is not ideal, I agree that not forcing a fast tokenizer when ignore_subwords=True is the default is better for compatibility. I noticed a bit late that the _args_parser line was mis-merged when this PR was merged, and I see it is fixed and improved in the patch. I wasn't sure how to test the offset_mapping argument with the new test structure (which looks good in the patch). Sorry for the trouble 😅 @LysandreJik

@LysandreJik
Member

No worries, thanks for taking a look at the patch!

fabiocapsouza pushed a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
* Bug fix: NER pipeline shouldn't group separate entities of same type

* style fix

* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type
	(B-type1 B-type1) != (B-type1 I-type1)

* use offset_mapping to fix [UNK] token problem

* ignore score for subwords

* modify ner_pipeline test

* modify ner_pipeline test

* modify ner_pipeline test

* ner_pipeline change ignore_subwords default to true

* add ner_pipeline ignore_subword=False test case

* fix offset_mapping index

* fix style again duh

* change is_subword and convert_tokens_to_string logic

* merge tests with new test structure

* change test names

* remove old tests

* ner tests for fast tokenizer

* fast tokenizers have convert_tokens_to_string

* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
fabiocapsouza added a commit to fabiocapsouza/transformers that referenced this pull request Nov 15, 2020
bdalal added a commit to lucidworks/transformers that referenced this pull request Nov 19, 2020
* Rename add_start_docstrings_to_callable (huggingface#8120)

* Update CI cache (huggingface#8126)

* Upgrade PyTorch Lightning to 1.0.2 (huggingface#7852)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Fix typo in `AutoModelForMaskedLM` docs (huggingface#8129)

* [s2s test] cleanup (huggingface#8131)

* add tags (huggingface#8147)

* Add model_cards for DynaBERT (huggingface#8012)

* Update README.md

* Add dynabert_overview.png

* Update README.md

* Create README.md

* Add dynabert_overview.png

* Update README.md

* Update README.md

* Delete dynabert_overview.png

* Update README.md

* Delete dynabert_overview.png

* Update README.md

* Create README.md (huggingface#8015)

* Create README.md (huggingface#8017)

* Model Card for Gujarati-XLM-R-Base (huggingface#8038)

* Add model card for Gujarati-XLM-R-Base

* Update README.md

Add the model card for the Gujarati-XLM-R-Base.

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Add two model_cards: ethanyt/guwenbert-base and ethanyt/guwenbert-large (huggingface#8041)

* Create README.md (huggingface#8075)

* Create README.md

* Update model_cards/gurkan08/bert-turkish-text-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8088)

* Create README.md

* metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8089)

* Add model_cards (huggingface#7969)

* add readme

* add readmes

* Add metadata

* Update README.md (huggingface#8090)

* Update widget examples. (huggingface#8149)

Co-authored-by: yantan <yantan@effyic.com>

* Fix doc errors and typos across the board (huggingface#8139)

* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes

* Document tokenizer_class in configurations (huggingface#8152)

* Smarter prediction loop and no- -> no_ in console args (huggingface#8151)

* Smarter prediction loop and no- -> no_ in console args

* Fix test

* [s2s] distillBART docs for paper replication (huggingface#8150)

* Add a template for examples and apply it for mlm and plm examples (huggingface#8153)

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling

* improve error checking (huggingface#8157)

* Fix typo: indinces -> indices (huggingface#8159)

* Fix typo: indinces -> indices

* Fix some more

* Fix some more

* Fix some more

* Fix CI

* Fix eval ref miss in Chinese WWM. (huggingface#8115)

* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

* fix eval ref file miss bug

* format file

* MOD: move ref code to contrib

* MOD: add delimeter check

* reformat code

* refomat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* [CI] Better reports #2 (huggingface#8163)

* Fixing some warnings in DeBerta (huggingface#8176)

* Fixing some warnings in DeBerta

* Fixing docs with their rewritten version.

* Ci test tf super slow (huggingface#8007)

* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model

* Fix typo: s/languaged/language/ (huggingface#8165)

* TFMarian, TFMbart, TFPegasus, TFBlenderbot (huggingface#7987)

* Start plumbing

* Marian close

* Small stubs for all children

* Fixed bart

* marian working

* pegasus test is good, but failing

* Checkin tests

* More model files

* Subtle marian, pegasus integration test failures

* Works well

* rm print

* boom boom

* Still failing model2doc

* merge master

* Equivalence test failing, all others fixed

* cleanup

* Fix embed_scale

* Cleanup marian pipeline test

* Undo extra changes

* Smaller delta

* Cleanup model testers

* undo delta

* fix tests import structure

* cross test decorator

* Cleaner set_weights

* Respect authorized_unexpected_keys

* No warnings

* No warnings

* style

* Nest tf import

* black

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* functional dropout

* fixup

* Fixup

* style_doc

* embs

* shape list

* delete slow force_token_id_to_be_generated func

* fixup

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc fixes and filter warning in wandb (huggingface#8189)

* Finalize lm examples (huggingface#8188)

* Finish the cleanup of the language-modeling examples

* Update main README

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Propagate changes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Replace swish with silu (huggingface#8166)

* Replace swish with silu

* revert nn.silu to nn.swish due to older version

* simplify optimized silu conditional and fix format

* Update activations.py

* Update activations_tf.py

* Update modeling_flax_utils.py

* Update modeling_openai.py

* add swish testcase

* add pytorch swish testcase

* Add more robust python version check

* more formatting fixes

Co-authored-by: TFUsers <TFUsers@gmail.com>

* Remove deprecated arguments from new run_clm (huggingface#8197)

* Minor style improvements for the Flax BERT and RoBERTa examples (huggingface#8178)

* Minor style improvements:

1. Use `@nn.compact` rather than `@compact` (as to not make it seem
   like compact is a standard Python decorator.
2. Move attribute docstrings from two `__call__` methods to comments
   on the attributes themselves. (This was probably a remnant from
   the pre-Linen version where the attributes were arguments to
   `call`.)

* Use black on the Flax modeling code

* Fix two bugs with --logging_first_step (huggingface#8193)

* make sure that logging_first_step evaluates

* fix bug with incorrect loss on logging_first_step

* fix style

* logging_first_step only logs, not evals

* [Bug fix] Fixed value for BlenderBot pad token (huggingface#8205)

* [Seq2SeqTrainer] Move import to init to make file self-contained (huggingface#8194)

* boom boom

* reverse order

* Added 12 model cards for Indian Language Models (huggingface#8198)

* Create README.md

* added model cards

* DynaBERT model cards update (huggingface#8192)

* Update README.md

* Update README.md

* Fix the behaviour of DefaultArgumentHandler (removing it). (huggingface#8180)

* Some work to fix the behaviour of DefaultArgumentHandler by removing it.

* Fixing specific pipelines argument checking.

* Fix ignore list behavior in doctests (huggingface#8213)

* doc: fix typo (huggingface#8235)

* Patch reports (huggingface#8238)

* Fix bad import with PyTorch <= 1.4.1 (huggingface#8237)

* Fix TensorBoardCallback for older versions of PyTorch (huggingface#8239)

* Create README.md

* Add line by line option to mlm/plm scripts (huggingface#8240)

* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md

* Add XLMProphetNetTokenizer to tokenization auto (huggingface#8245)

* fix encoder decoder bug (huggingface#8243)

* add new notebooks (huggingface#8246)

* 2 SinusoidalPositionalEmbedding fixes (huggingface#8226)

* [Seq2Seq] Correct import in Seq2Seq Trainer (huggingface#8254)

* Skip tatoeba tests if Tatoeba-Challenge not cloned (huggingface#8260)

* Refactoring the generate() function (huggingface#6949)

* first draft

* show design proposition for new generate method

* up

* make more readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save intermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests to separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer

* [FIX] TextGenerationPipeline is currently broken. (huggingface#8256)

* [FIX] TextGenerationPipeline is currently broken.

It's most likely due to huggingface#8180.
What's missing is a multi vs single string handler at the beginning of
the pipe.
And also there was no testing of this pipeline.

* Fixing Conversational tests too.

* Updated ConversationalPipeline to work with encoder-decoder models (huggingface#8207)

* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)

* Addition of integration test for EncoderDecoder conversation model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix Tatoeba skip

* forward the worker stderr to the parent process (huggingface#8262)

* [examples] minimal version requirement run-time check in PL (huggingface#8133)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* make files independent (huggingface#8267)

* Clean Trainer tests and datasets dep (huggingface#8268)

* improve documentation of training_args.py (huggingface#8270)

* improve documentation of training_args.py

- do_train
- do_eval
- do_predict

* fix line too long

* fix style with black on training_args.py

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix line length with utils/style_doc

* black reformatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Data collator for token classification (huggingface#8274)

* Add DataCollatorForTokenClassification and clean tests

* Make quality

* [CIs] Better reports everywhere (huggingface#8275)

* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix

* [WIP] Ner pipeline grouped_entities fixes (huggingface#5970)

* Bug fix: NER pipeline shouldn't group separate entities of same type

* style fix

* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type
	(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] Add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train only on the first token of a word and not on the subsequent wordpieces (BERT NER default), so it makes sense to do the same thing at inference time.
	The simplest fix is to group the subwords with the first wordpiece.
	[TODO] How to handle ignored scores? Set them to 0 and calculate a zero-invariant mean?
	[TODO] Handle a wordpiece prefix other than ##? Possible approaches:
		get it from the tokenizer (but currently most tokenizers don't have a wordpiece_prefix property), or
		have an _is_subword(token)
[Feature add] Added option `skip_special_tokens`, because it was harder to remove them after grouping.
[Additional Changes] Remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO] Can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')

* use offset_mapping to fix [UNK] token problem

* ignore score for subwords

* modify ner_pipeline test

* modify ner_pipeline test

* modify ner_pipeline test

* ner_pipeline change ignore_subwords default to true

* add ner_pipeline ignore_subword=False test case

* fix offset_mapping index

* fix style again duh

* change is_subword and convert_tokens_to_string logic

* merge tests with new test structure

* change test names

* remove old tests

* ner tests for fast tokenizer

* fast tokenizers have convert_tokens_to_string

* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
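
The grouping rules described in this commit can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual code: `is_subword`, `nanmean`, `merge_group` and `group_entities` are hypothetical helpers, the `##` prefix is assumed, and the input is assumed to contain only non-`O` predictions. Subword scores are set to NaN and skipped when averaging, and a `B-` tag always starts a new group.

```python
import math
from typing import Dict, List

SUBWORD_PREFIX = "##"  # assumption: WordPiece-style continuation marker


def is_subword(word: str) -> bool:
    """Hypothetical check for a continuation wordpiece."""
    return word.startswith(SUBWORD_PREFIX)


def nanmean(values: List[float]) -> float:
    """Mean that skips NaN entries (stdlib stand-in for numpy.nanmean)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def merge_group(group: List[Dict]) -> Dict:
    """Collapse one group of tokens into a single entity dict."""
    word = ""
    for tok in group:
        if is_subword(tok["word"]):
            # Glue the continuation piece onto the previous text.
            word += tok["word"][len(SUBWORD_PREFIX):]
        else:
            word += (" " if word else "") + tok["word"]
    return {
        # The B-/I- prefix is dropped from the returned label.
        "entity_group": group[0]["entity"].partition("-")[2],
        "score": nanmean([tok["score"] for tok in group]),
        "word": word,
    }


def group_entities(tokens: List[Dict]) -> List[Dict]:
    """Group token predictions ({"word", "entity", "score"}) into entities."""
    groups: List[List[Dict]] = []
    for tok in tokens:
        tok = dict(tok)  # copy so the caller's data is untouched
        if groups and is_subword(tok["word"]):
            # Follow the first wordpiece; ignore the subword's own prediction.
            tok["score"] = float("nan")
            groups[-1].append(tok)
            continue
        prefix, _, tag = tok["entity"].partition("-")
        prev_tag = groups[-1][0]["entity"].partition("-")[2] if groups else None
        if prefix == "I" and tag == prev_tag:
            groups[-1].append(tok)  # continuation of the current entity
        else:
            groups.append([tok])  # "B-" (or a new type) starts a new entity
    return [merge_group(group) for group in groups]
```

With this sketch, two adjacent `B-LOC` tokens stay separate entities, while `B-ORG` followed by `I-ORG` is merged into one.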

* [blenderbot] regex fix (huggingface#8282)

Fixing:

```
src/transformers/tokenization_blenderbot.py:163: DeprecationWarning: invalid escape sequence \s
    token = re.sub("\s{2,}", " ", token)
```
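
A raw string is one way to avoid this warning, since `\s` then reaches the regex engine unchanged. A minimal sketch, where `collapse_whitespace` is a hypothetical wrapper around the line quoted in the warning:

```python
import re


def collapse_whitespace(token: str) -> str:
    """Replace every run of two or more whitespace characters with one space."""
    # The r"" prefix keeps the backslash literal, so Python does not try to
    # interpret "\s" as a string escape sequence.
    return re.sub(r"\s{2,}", " ", token)
```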

* Fix typo in language-modeling README.md (huggingface#8287)

* [Generate Test] fix greedy generate test (huggingface#8293)

* fix greedy generate test

* delete ipdb

* Fix validation file loading in scripts (huggingface#8298)

* Upgrade resource for doc building

* Revert size change as it doesn't change anything

* Model card: T5-base fine-tuned on QASC (huggingface#8299)

* Update model cards of deepset/roberta-base-squad2 v1 and v2 (huggingface#8241)

* update deepset/roberta-base-squad2 to v2

* Update model_cards/deepset/roberta-base-squad2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Improve QA pipeline error handling (huggingface#8286)

- The issue is that with previous code we would have the following:

```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

The goal here is to improve this to actually return a ValueError
wherever possible.

While at it, I tried to simplify QuestionArgumentHandler's code to
make it smaller and more compact while keeping backward compatibility.
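
The kind of up-front check that turns the `IndexError` into a `ValueError` might look like the sketch below. `validate_qa_input` is a hypothetical helper written for illustration; it is not the pipeline's actual argument handler.

```python
from typing import Dict


def validate_qa_input(question: str, context: str) -> Dict[str, str]:
    """Reject unusable inputs early with a descriptive ValueError."""
    if not isinstance(question, str) or not question.strip():
        raise ValueError("`question` must be a non-empty string")
    if not isinstance(context, str) or not context.strip():
        raise ValueError("`context` must be a non-empty string")
    return {"question": question, "context": context}
```

Calling `validate_qa_input("Where was he born ?", "")` raises `ValueError` before any tensor is built.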

* adding model cards for distilled models (huggingface#8300)

* adding model cards for distil models

* forgot the languages

* Speedup doc build (huggingface#8301)

* Try -j option

* Try other thing

* Bigger machine

* Test lower sphinx version

* Remove trailing space

* Fix path to old run_language_modeling.py script (huggingface#8302)

* Clean up data collators and datasets (huggingface#8308)

* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md (huggingface#8223)

* Create README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update bug-report.md

* Update PULL_REQUEST_TEMPLATE.md

* Corrected typo in readme (huggingface#8320)

* change TokenClassificationTask class methods to static methods (huggingface#7902)

* change TokenClassificationTask class methods to static methods

Since we do not require self in the class methods of TokenClassificationTask we should probably switch to static methods. Also, since the class TokenClassificationTask does not contain a constructor it is currently unusable as is. By switching to static methods this fixes the issue of having to document the intent of the broken class.

Also, the get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static methods are inherited unchanged, so they can still be overridden, like other class methods.
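
A minimal sketch of the pattern, assuming a simplified `get_labels(path)` signature and a hypothetical `NER` subclass with a made-up label list:

```python
from typing import List


class TokenClassificationTask:
    @staticmethod
    def get_labels(path: str) -> List[str]:
        # No `self` needed: the base class only declares the interface.
        raise NotImplementedError


class NER(TokenClassificationTask):
    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Overrides the inherited static method; `path` is unused in this toy.
        return ["O", "B-PER", "I-PER"]
```

Both `NER.get_labels("")` and `NER().get_labels("")` resolve to the override, so no constructor is required to use the task.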

* Trigger Build

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* no warn (huggingface#8329)

* Output global_attentions in Longformer models (huggingface#7562)

* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Make Trainer evaluation handle dynamic seq_length (huggingface#8336)

* Make Trainer evaluation handle dynamic seq_length

* Document behavior.

* Fix test

* Better fix

* Fixes for realsies this time

* Address review comments

* Without forgetting to save...

* [s2s] test_distributed_eval (huggingface#8315)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Docs bart training ref (huggingface#8330)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* [s2s] test_bash_script.py - actually learn something (huggingface#8318)

* use decorator

* remove hardcoded paths

* make the test use more data and do real quality tests

* shave off 10 secs

* add --eval_beams 2, reformat

* reduce train size, use smaller custom dataset

* Model card: T5-base fine-tuned on QuaRel (huggingface#8334)

* Model card: CodeBERT fine-tuned for Insecure Code Detection (huggingface#8247)

* Model card: CodeBERT fine-tuned for Insecure Code Detection

* Update model_cards/mrm8488/codebert-base-finetuned-detect-insecure-code/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Model card: GPT-2 fine-tuned on CommonGen (huggingface#8248)

* [model_cards] Update Italian BERT models and introduce new Italian XXL ELECTRA model 🎉 (huggingface#8343)

* Create README.md (huggingface#8258)

* german medbert model details (huggingface#8266)

* model details

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8327)

* Create README.md (huggingface#8167)

* Create README.md

Telugu BERTU Readme file

* Update model_cards/kuppuluri/telugu_bertu/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8168)

* Create README.md

* Update README.md

* Create README.md (huggingface#8170)

* Create README.md (huggingface#8169)

* Create README.md (huggingface#8255)

* Create README.md

Initial commit

* Updated Read me

Updated

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8312)

* Create README.md

* Update model_cards/ktrapeznikov/gpt2-medium-topic-news/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8338)

fixes

* Fix typo (huggingface#8351)

* Update README.md (huggingface#8360)

Fix website address

* [All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (huggingface#8071)

* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix encoder outputs (huggingface#8368)

* [make] rewrite modified_py_files in python to be cross-platform (huggingface#8371)

* rewrite modified_py_files in python to be cross-platform

* try a different way to test for variable not being ""

* improve comment

* Fix DataCollatorForWholeWordMask (huggingface#8379)

* Fix DataCollatorForWholeWordMask

* Replace all tensorize_batch in data_collator.py

* fix md table (huggingface#8395)

* Add gpt2-medium-chinese model card (huggingface#8402)

* Create README.md

* Update model_cards/mymusise/gpt2-medium-chinese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* fixed default labels for QA model (huggingface#8399)

* Fix DataCollatorForWholeWordMask again (huggingface#8397)

* [s2s examples test] fix data path (huggingface#8398)

* [s2s test_finetune_trainer] failing multigpu test (huggingface#8400)

* [s2s/distill] remove run_distiller.sh, fix xsum script (huggingface#8412)

* comet_ml temporary fix(huggingface#8410)

* updating tag for exbert viz (huggingface#8408)

* Update README.md (huggingface#8406)

* Fix some tooling for windows (huggingface#8359)

* Fix some tooling for windows

* Fix conflict

* Trigger CI

* examples/docs: caveat that PL examples don't work on TPU (huggingface#8309)

* add evaluate doc - trainer.evaluate returns 'epoch' from training (huggingface#8273)

* add evaluate doc

* fix style with utils/style_doc

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Bug fix for permutation language modelling (huggingface#8409)

* [fsmt tokenizer] support lowercase tokenizer (huggingface#8389)

* support lowercase tokenizer

* fix arg pos

* Bump tokenizers (huggingface#8419)

* Add new token classification example (huggingface#8340)

* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Fix typo

* [fsmt convert script] fairseq broke chkpt data - fixing that (huggingface#8377)

* fairseq broke chkpt data - fixing that

* style

* support older bpecodes filenames - specifically "code" in iwslt14

* Deprecate old data/metrics functions (huggingface#8420)

* [Tests] Add Common Test for Training + Fix a couple of bugs (huggingface#8415)

* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert

* [docs] remove sshleifer from issue-template :( (huggingface#8418)

* Fix bart shape comment (huggingface#8423)

* [docs] [testing] gpu decorators table (huggingface#8422)

* gpu decorators table

* whitespace

* Update docs/source/testing.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* whitespace

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Check all models are in an auto class (huggingface#8425)

* [github CI] add a multi-gpu job for all example tests (huggingface#8341)

* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Changing XLNet default from not using memories to 512 context size following paper (huggingface#8417)

* Move XLNet memory length FutureWarning

* isort

* style

* Changed default XLNet memory length

* Model versioning (huggingface#8324)

* fix typo

* rm use_cdn & references, and implement new hf_bucket_url

* I'm pretty sure we don't need to `read` this file

* same here

* [BIG] file_utils.networking: do not gobble up errors anymore

* Fix CI 😇

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Tiny doc tweak

* Add doc + pass kwarg everywhere

* Add more tests and explain

cc @sshleifer let me know if better

Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>

* Also implement revision in pipelines

In the case where we're passing a task name or a string model identifier

* Fix CI 😇

* Fix CI

* [hf_api] new methods + command line implem

* make style

* Final endpoints post-migration

* Fix post-migration

* Py3.6 compat

cc @stefan-it

Thank you @stas00

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Patch token classification pipeline (huggingface#8364)

* Patch token classification pipeline

* Some added tests for TokenClassificationArgumentHandler (huggingface#8366)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update links from s3 to huggingface.co

* Fix style

* Model sharing rst (huggingface#8439)

* Update RST

* Finer details

* Re-organize

* Style

* Release: v3.5.0

* v3.5.0 documentation

* [s2s/distill] hparams.tokenizer_name = hparams.teacher (huggingface#8382)

* [examples] better PL version check (huggingface#8429)

* Question template (huggingface#8440)

* Remove SO from question template

* Styling

* [docs] improve bart/marian/mBART/pegasus docs (huggingface#8421)

* Add auto next sentence prediction (huggingface#8432)

* Add auto next sentence prediction

* Fix style

* Add mobilebert next sentence prediction

* Windows dev section in the contributing file (huggingface#8436)

* Add a Windows dev section in the contributing file.

* Forgotten link

* Trigger CI

* Rework description

* Trigger CI

* [testing utils] get_auto_remove_tmp_dir more intuitive behavior (huggingface#8401)

* [testing utils] get_auto_remove_tmp_dir default change

Now that I have been using `get_auto_remove_tmp_dir` for a while, I realized that the defaults aren't optimal.

99% of the time we want the tmp dir to be empty at the beginning of the test, so this changes the default to `before=True`. This shouldn't impact any tests since this feature is used only during debugging.
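
The effect of a `before=True` default can be illustrated with a standalone sketch. `prepare_tmp_dir` is a hypothetical helper, not the `testing_utils` implementation; it only mirrors the "empty the directory at the start" behaviour described here.

```python
import shutil
from pathlib import Path


def prepare_tmp_dir(path, before: bool = True) -> Path:
    """Return `path` as an existing directory, emptied first when `before` is True."""
    tmp_dir = Path(path)
    if before and tmp_dir.exists():
        # Drop leftovers from a previous run so the test starts clean.
        shutil.rmtree(tmp_dir)
    tmp_dir.mkdir(parents=True, exist_ok=True)
    return tmp_dir
```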

* simplify things

* update docs

* fix doc layout

* style

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* better 3-state doc

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* s/tmp/temporary/ + style

* correct the statement

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add missing import (huggingface#8444)

* Add missing import

* Fix dummy objects

* fix t5 special tokens (huggingface#8435)

* using multi_gpu consistently (huggingface#8446)

* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g

* doc

* Add missing tasks to `pipeline` docstring (huggingface#8428)

* [No merge] TF integration testing (huggingface#7621)

* stash

* TF Integration testing for ELECTRA, BERT, Longformer

* Trigger slow tests

* Apply suggestions from code review

* fix t5 token type ids (huggingface#8437)

* Bug fix for modeling utilities function apply_chunking_to_forward: only the chunking dimension needs to match across inputs; previously an exception was raised if the complete shapes of the inputs differed, rather than only the chunking dimension (huggingface#8391)

Co-authored-by: pedro <pe25171@mit.edu>
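
The corrected check can be sketched on plain shape tuples. `chunk_bounds` is a hypothetical helper, and the requirement that the dimension be a multiple of the chunk size is an assumption of this sketch rather than something stated in the log.

```python
from typing import List, Sequence, Tuple


def chunk_bounds(
    shapes: Sequence[Tuple[int, ...]], chunk_size: int, chunk_dim: int
) -> List[Tuple[int, int]]:
    """Return (start, end) slices along `chunk_dim` shared by all inputs."""
    # Compare only the chunking dimension; other dimensions may differ.
    sizes = {shape[chunk_dim] for shape in shapes}
    if len(sizes) != 1:
        raise ValueError(f"inputs disagree on dimension {chunk_dim}: {sorted(sizes)}")
    size = sizes.pop()
    if size % chunk_size != 0:
        raise ValueError(f"dimension size {size} is not a multiple of {chunk_size}")
    return [(start, start + chunk_size) for start in range(0, size, chunk_size)]
```

Inputs shaped `(2, 8, 16)` and `(2, 8, 32)` are accepted for `chunk_dim=1` even though their last dimensions differ.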

* [model_cards] harmonization

* Fix TF Longformer (huggingface#8460)

* Add next sentence prediction loss computation (huggingface#8462)

* Add next sentence prediction loss computation

* Apply style

* Fix tests

* Add forgotten import

* Add forgotten import

* Use a new parameter

* Remove kwargs and use positional arguments

* Fix next sentence output (huggingface#8466)

* Example NER script predicts on tokenized dataset (huggingface#8468)

The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`

* Add TFDPR (huggingface#8203)

* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

last commit accidentally deleted these 4 lines, so I restored them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False as default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

after removing it, the test failed ... so only removing "use_tf_weights = None", on Lysandre's suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run the slow tests, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and PyTorch give approximately the same output.
However, I could not test against the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>

* Replaced some iadd operations on lists with proper list methods. (huggingface#8433)
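
The kind of change this title describes can be illustrated with a toy example; `collect_even` and `flatten` are hypothetical functions, not code from the PR.

```python
from typing import Iterable, List


def collect_even(numbers: Iterable[int]) -> List[int]:
    """Keep the even numbers, adding one element at a time."""
    result: List[int] = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n)  # instead of: result += [n]
    return result


def flatten(nested: Iterable[Iterable[int]]) -> List[int]:
    """Concatenate several iterables into one list."""
    flat: List[int] = []
    for chunk in nested:
        flat.extend(chunk)  # instead of: flat += chunk
    return flat
```

`append` and `extend` state the intent directly and avoid building a throwaway one-element list.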

* Skip test until investigation

* Flax/Jax documentation (huggingface#8331)

* First addition of Flax/Jax documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* make style

* Ensure input order match between Bert & Roberta

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install dependencies "all" when building doc

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* wraps build_doc deps with ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing @sgugger comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use list to highlight JAX features.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's not look too much into the future for now.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* [s2s] distill t5-large -> t5-small (huggingface#8376)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update deploy-docs dependencies on CI to enable Flax (huggingface#8475)

* Update deploy-docs dependencies on CI to enable Flax

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pair of ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* [model_cards] other chars than [\w\-_] not allowed anymore in model names

cc @Pierrci
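
A check against this character class could look like the sketch below. `is_valid_model_name` is a hypothetical helper; how the real validation is implemented is not shown in this log.

```python
import re

# Same character class as in the commit title; "+" requires at least one character.
_MODEL_NAME_RE = re.compile(r"[\w\-_]+")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` only contains word characters, '-' or '_'."""
    # Note: Python's \w also matches non-ASCII letters unless re.ASCII is set.
    return _MODEL_NAME_RE.fullmatch(name) is not None
```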

* Fix typo in roberta-base-squad2-v2 model card (huggingface#8489)

* quick fix on concatenating text to support more datasets (huggingface#8474)

* Fix doc bug (huggingface#8500)

* fix doc bug

Signed-off-by: mymusise <mymusise1@gmail.com>

* fix example bug

Signed-off-by: mymusise <mymusise1@gmail.com>

* Model sharing doc (huggingface#8498)

* Model sharing doc

* Style

* fix SqueezeBertForMaskedLM (huggingface#8479)

* Try to understand and apply Sylvain's comments (huggingface#8458)

* Use LF instead of os.linesep (huggingface#8491)
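
One way to get LF line endings on every platform is to disable newline translation when opening the file. A minimal sketch, with `write_lines` as a hypothetical helper:

```python
from typing import Iterable


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write `lines` to `path`, terminating each one with a bare LF."""
    # newline="\n" stops Python from translating "\n" to os.linesep,
    # so the file does not pick up CRLF endings on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
```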

* Add pretraining loss computation for TF Bert pretraining (huggingface#8470)

* Add pretraining loss computation for TF Bert pretraining

* Fix labels creation

* Fix T5 model

* restore T5 kwargs

* try a generic fix for pretraining models

* Apply style

* Override the prepare method for the BERT tests

* Remove typo

* Update deepset/roberta-base-squad2 model card (huggingface#8522)

* Update README.md

* Update README.md

* Update doc for v3.5.1

* [T5] Bug correction & Refactor (huggingface#8518)

* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions

* Model templates encoder only (huggingface#8509)

* Model templates

* TensorFlow

* Remove pooler

* CI

* Tokenizer + Refactoring

* Encoder-Decoder

* Let's go testing

* Encoder-Decoder in TF

* Let's go testing in TF

* Documentation

* README

* Fixes

* Better names

* Style

* Update docs

* Choose to skip either TF or PT

* Code quality fixes

* Add to testing suite

* Update file path

* Cookiecutter path

* Update `transformers` path

* Handle rebasing

* Remove seq2seq from model templates

* Remove s2s config

* Apply Sylvain and Patrick comments

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last fixes from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix paths in github YAML

* Model sharing doc: more tweaks (huggingface#8520)

* More doc tweaks

* Update model_sharing.rst

* make style

* missing newline

* Add email tip

Co-authored-by: Pierric Cistac <pierric@huggingface.co>

* Add bart-large-mnli model card (huggingface#8527)

* fix load weights (huggingface#8528)

* fix load weights

* delete line

* Rework some TF tests (huggingface#8492)

* Update some tests

* Small update

* Apply style

* Use max_position_embeddings

* Create a fake attribute

* Create a fake attribute

* Update wrong name

* Wrong TransfoXL model file

* Keep the common tests agnostic

* [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (huggingface#8073)

* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md for Chinese RoBERTa Miniatures (huggingface#8550)

* Create README.md

* Update model_cards/uer/chinese_roberta_L-2_H-128/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Readme for News Headline Generation (bert2bert) (huggingface#8557)

* Readme for Wiki Summary [Persian] bert2bert (huggingface#8558)

* Clearer Model Versioning Example (huggingface#8562)

* [doc] typo fix (huggingface#8535)

* [doc] typo fix

@sgugger

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding the prepare_seq2seq_batch function to ProphetNet (huggingface#8515)

* Simply insert T5Tokenizer's prepare_seq2seq_batch

* Update/Add some 'import'

* fix RunTimeError caused by '.view'

* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet

* Update test_tokenization_prophetnet.py

* Format the test code with black

* Re-format the test code

* Update test_tokenization_prophetnet.py

* Add importing require_torch in the test code

* Add importing BatchEncoding in the test code

* Re-format the test code on Colab

* Fix GPT2DoubleHeadsModel to work with model.generate() (huggingface#6601)

* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used

and for GPT2LMHeadModel too

* Update tests to check token_type_ids usage in GPT2 models

* Update version to v4.0.0-dev (huggingface#8568)

* Switch `return_dict` to `True` by default. (huggingface#8530)

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests

* Fix mixed precision issue for GPT2 (huggingface#8572)

* Fix mixed precision issue for GPT2

* Forgot one cast

* oops

* Forgotten casts

* Reorganize repo (huggingface#8580)

* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command

* model_card for indolem/indobert-base-uncased (huggingface#8579)

* T5 & mT5 (huggingface#8552)

* add mt5 and t5v1_1 model

* fix tests

* correct some imports

* add tf model

* finish tf t5

* improve examples

* fix copies

* clean doc

* [MT5] More docs (huggingface#8589)

* add docs

* make style

* Add __init__ to the models folder

* Fix init for MT5 (huggingface#8591)

* Tokenizers: ability to load from model subfolder (huggingface#8586)

* tiny typo

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

* Fix model templates (huggingface#8595)

* First fixes

* Fix imports and add init

* Fix typo

* Move init to final dest

* Fix tokenization import

* More fixes

* Styling

* these should run fine on multi-gpu (huggingface#8582)

* Fix check repo utils (huggingface#8600)

* Tokenizers should be framework agnostic (huggingface#8599)

* Tokenizers should be framework agnostic

* Run the slow tests

* Not testing

* Fix documentation

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove deprecated (huggingface#8604)

* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Add Harry Potter Model Card (huggingface#8605)

* Add Harry Potter Model

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Remove old doc

* Fixed link to the wrong paper. (huggingface#8607)

* Reset loss to zero on logging in Trainer to avoid bfloat16 issues (huggingface#8561)

* make tr_loss regular float

* Revert "make tr_loss regular float"

This reverts commit c9d7ccf.

* reset loss at each logging step

* keep track of total loss with _total_loss_scalar

* add remaining tr_loss at the end
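
The bullets above describe resetting the running loss at every logging step while a separate scalar keeps the overall total. A minimal sketch of that bookkeeping in plain Python, assuming scalar per-step losses; the function name and its arguments are hypothetical, and this is not the Trainer's actual code:

```python
def train_with_periodic_logging(step_losses, logging_steps):
    """Illustrative loss bookkeeping (hypothetical helper, not Trainer code)."""
    tr_loss = 0.0            # running loss since the last logging step
    total_loss_scalar = 0.0  # everything accumulated before the last reset
    logged = []

    for step, loss in enumerate(step_losses, start=1):
        tr_loss += loss
        if step % logging_steps == 0:
            # Log the mean loss over the window, then reset the accumulator
            # so it never grows large enough to lose precision.
            logged.append(tr_loss / logging_steps)
            total_loss_scalar += tr_loss
            tr_loss = 0.0

    # Add whatever was accumulated after the final logging step.
    total_loss_scalar += tr_loss
    return logged, total_loss_scalar / len(step_losses)
```

Keeping the window small is the point of the reset: a low-precision accumulator such as bfloat16 stops registering small increments once the running sum gets large.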

* Fix DataCollatorForLanguageModeling (huggingface#8621)

* Fix missing space in multiline warning (huggingface#8593)

The multiline string informing about missing PyTorch/TensorFlow was missing a space.

* [s2s] broken test (huggingface#8613)

* fix to adjust for huggingface#8530 changes (huggingface#8612)

* self.self.activation_dropout -> self.activation_dropout (huggingface#8611)

(one line typo)

* New TF loading weights (huggingface#8490)

* New TF loading weights

* apply style

* Better naming

* Largely comment the loading method

* Apply style

* Address Patrick's comments

* Remove useless line of code

* Update Docstring

* Address Sylvain's and Lysandre's comments

* Simplify the names computation

* Typos

* Adding PrefixConstrainedLogitsProcessor (huggingface#8529)

* Adding PrefixConstrainedLogitsProcessor

* fixing RAG and style_doc

* fixing black (v20 instead of v19)

* Improving doc in generation_logits_process.py

* Improving docs and typing in generation_utils.py

* docs improvement

* adding test and fixing doc typo

* fixing doc_len

* isort on test

* fixed test

* improve docstring a bit

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Tokenizer Doc] Improve tokenizer summary (huggingface#8622)

* improve summary

* small fixes

* cleaned line length

* correct "" formatting

* apply sylvains suggestions

* Fixes the training resuming with gradient accumulation (huggingface#8624)

* Fix training from scratch in new scripts (huggingface#8623)

* model_cards for Chinese Couplet and Poem GPT2 models (huggingface#8620)

* replace performance table with markdown (huggingface#8565)

* replace performance table with markdown

* Update model_cards/smanjil/German-MedBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8544)

Modified the Model in Action section. The class `AutoModelWithLMHead` is deprecated, so it is replaced with `AutoModelForSeq2SeqLM` for encoder-decoder models. Removed a duplicate eos token.

* Model Card for abhilash1910/financial_roberta (huggingface#8625)

* Model Card for abhilash1910/financial_roberta

* Update model_cards/abhilash1910/financial_roberta/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8405)

* Update README.md

* Update README.md

* Add model card for ai4bharat/indic-bert (huggingface#8464)

* Create README.md (huggingface#8363)

* Model card: T5-base fine-tuned on QuaRTz (huggingface#8369)

* Model card: T5-base fine-tuned on QuaRTz

* Update model_cards/mrm8488/t5-base-finetuned-quartz/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8362)

* Created ModelCard for Hel-ach-en MT model (huggingface#8496)

* Updated ModelCard

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* [s2s] distillation apex breaks return_dict obj (huggingface#8631)

* apex breaks return_dict obj

* style

* grammar (huggingface#8639)

* Update README.md (huggingface#8635)

* Updated the Extractive Question Answering code snippets (huggingface#8636)

* Updated the Extractive Question Answering code snippets

The Extractive Question Answering code snippets no longer work because the models now return task-specific output objects. This commit fixes the PyTorch and TensorFlow examples by adding `.values()` to the model call.

* Update task_summary.rst

* Add cards for all Geotrend models (huggingface#8617)

* docs(bert-base-15lang-cased): add model card

* add cards for all Geotrend models

* [model cards] fix language tag for all Geotrend models

* [model card] : fix bert-base-15lang-cased (huggingface#8655)

the table was badly formatted because of a single line break

* fix missing return dict (huggingface#8653)

* fixed imports

* update example conversion

* removed redundant tests

* added back marker

* reverted to old QA Argument Handler

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Santiago Castro <sacastro@umich.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Zhiqi Huang <mazicwong@gmail.com>
Co-authored-by: Manuel Romero <mrm8488@gmail.com>
Co-authored-by: Ashwani Tanwar <ashwanitanwar333@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Ethan <9592150+Ethan-yt@users.noreply.github.com>
Co-authored-by: gurkan08 <33202187+gurkan08@users.noreply.github.com>
Co-authored-by: dartrevan <awesombatsy@gmail.com>
Co-authored-by: Branden Chan <33759007+brandenchan@users.noreply.github.com>
Co-authored-by: yantan <yantan@effyic.com>
Co-authored-by: Santiago Castro <santi.1410@hotmail.com>
Co-authored-by: wlhgtc <hgtcwl@foxmail.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: TFUsers <25044281+TFUsers@users.noreply.github.com>
Co-authored-by: TFUsers <TFUsers@gmail.com>
Co-authored-by: Avital Oliver <avital@thewe.net>
Co-authored-by: Abi See <abigail.e.see@gmail.com>
Co-authored-by: guillaume-be <guillaume.becquin@gmail.com>
Co-authored-by: Kushal <32245327+kushalj001@users.noreply.github.com>
Co-authored-by: Martin Monperrus <martin.monperrus@gnieh.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Philip May <eniak.info@gmail.com>
Co-authored-by: Ceyda Cinarel <15624271+cceyda@users.noreply.github.com>
Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Pengzhi Gao <pengzhi.gao@petuum.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Yifan Peng <pengyifan.mail@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Guillem García Subies <37592763+GuillemGSubies@users.noreply.github.com>
Co-authored-by: Bobby Donchev <contact@donchev.is>
Co-authored-by: Guillaume Filion <guillaume.filion@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Jiaxin Pei <pedropei@vip.qq.com>
Co-authored-by: smanjil <shresthamanjil21@gmail.com>
Co-authored-by: Karthik Uppuluri <karthik.uppuluri@gmail.com>
Co-authored-by: hasantanvir79 <hasantanvir79@gmail.com>
Co-authored-by: ktrapeznikov <ktrapeznikov@gmail.com>
Co-authored-by: hassoudi <hassoudi@gmail.com>
Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com>
Co-authored-by: Yossi Synett <github@yossisynett.com>
Co-authored-by: Chengxi Guo <mymusise1@gmail.com>
Co-authored-by: Manav Rathod <manav.rathod@berkeley.edu>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Shashank Gupta <shaz4194@gmail.com>
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Shichao Sun <sunshichao1995@gmail.com>
Co-authored-by: Pedro <pedro.colon4@upr.edu>
Co-authored-by: pedro <pe25171@mit.edu>
Co-authored-by: sarnoult <31313050+sarnoult@users.noreply.github.com>
Co-authored-by: Ratthachat (Jung) <56621342+ratthachat@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Beomsoo Kim <bluewhale8202@gmail.com>
Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-authored-by: Sumithra Bhakthavatsalam <sumithra.b@gmail.com>
Co-authored-by: Antonio Lanza <antoniolanza1996@gmail.com>
Co-authored-by: zeyuyun1 <43428393+zeyuyun1@users.noreply.github.com>
Co-authored-by: Forrest Iandola <fiandola@gmail.com>
Co-authored-by: Pierric Cistac <pierric@huggingface.co>
Co-authored-by: Joe Davison <josephddavison@gmail.com>
Co-authored-by: zhezhaoa <1152543959@qq.com>
Co-authored-by: Mehrdad Farahani <m3hrdadfi@gmail.com>
Co-authored-by: Yusuke Mori <mori@mi.t.u-tokyo.ac.jp>
Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: fajri91 <fajri91@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
Co-authored-by: Caitlin Ostroff <caitlin.ostroff@gmail.com>
Co-authored-by: cronoik <johannes.schaffrath@mail.de>
Co-authored-by: Benjamin Minixhofer <bminixhofer@gmail.com>
Co-authored-by: Michał Pogoda <237372@student.pwr.edu.pl>
Co-authored-by: Nicola De Cao <nicola.decao@gmail.com>
Co-authored-by: hhou435 <59219579+hhou435@users.noreply.github.com>
Co-authored-by: Vishal Singh <vishalsingh7x@gmail.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Divyanshu Kakwani <divkakwani@gmail.com>
Co-authored-by: Perez Ogayo <pogayo17@alustudent.com>
Co-authored-by: Tim Isbister <timisbister@gmail.com>
Co-authored-by: Amine Abdaoui <abdaoui@lirmm.fr>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
bdalal added a commit to lucidworks/transformers that referenced this pull request Nov 19, 2020
* Rename add_start_docstrings_to_callable (huggingface#8120)

* Update CI cache (huggingface#8126)

* Upgrade PyTorch Lightning to 1.0.2 (huggingface#7852)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Fix typo in `AutoModelForMaskedLM` docs (huggingface#8129)

* [s2s test] cleanup (huggingface#8131)

* add tags (huggingface#8147)

* Add model_cards for DynaBERT (huggingface#8012)

* Update README.md

* Add dynabert_overview.png

* Update README.md

* Create README.md

* Add dynabert_overview.png

* Update README.md

* Update README.md

* Delete dynabert_overview.png

* Update README.md

* Delete dynabert_overview.png

* Update README.md

* Create README.md (huggingface#8015)

* Create README.md (huggingface#8017)

* Model Card for Gujarati-XLM-R-Base (huggingface#8038)

* Add model card for Gujarati-XLM-R-Base

* Update README.md

Add the model card for the Gujarati-XLM-R-Base.

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Add two model_cards: ethanyt/guwenbert-base and ethanyt/guwenbert-large (huggingface#8041)

* Create README.md (huggingface#8075)

* Create README.md

* Update model_cards/gurkan08/bert-turkish-text-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8088)

* Create README.md

* metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8089)

* Add model_cards (huggingface#7969)

* add readme

* add readmes

* Add metadata

* Update README.md (huggingface#8090)

* Update widget examples. (huggingface#8149)

Co-authored-by: yantan <yantan@effyic.com>

* Fix doc errors and typos across the board (huggingface#8139)

* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes

* Document tokenizer_class in configurations (huggingface#8152)

* Smarter prediction loop and no- -> no_ in console args (huggingface#8151)

* Smarter prediction loop and no- -> no_ in console args

* Fix test

* [s2s] distillBART docs for paper replication (huggingface#8150)

* Add a template for examples and apply it for mlm and plm examples (huggingface#8153)

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling

* improve error checking (huggingface#8157)

* Fix typo: indinces -> indices (huggingface#8159)

* Fix typo: indinces -> indices

* Fix some more

* Fix some more

* Fix some more

* Fix CI

* Fix eval ref miss in Chinese WWM. (huggingface#8115)

* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

* fix eval ref file miss bug

* format file

* MOD: move ref code to contrib

* MOD: add delimiter check

* reformat code

* reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* [CI] Better reports #2 (huggingface#8163)

* Fixing some warnings in DeBerta (huggingface#8176)

* Fixing some warnings in DeBerta

* Fixing docs with their rewritten version.

* Ci test tf super slow (huggingface#8007)

* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model

* Fix typo: s/languaged/language/ (huggingface#8165)

* TFMarian, TFMbart, TFPegasus, TFBlenderbot (huggingface#7987)

* Start plumbing

* Marian close

* Small stubs for all children

* Fixed bart

* marian working

* pegasus test is good, but failing

* Checkin tests

* More model files

* Subtle marian, pegasus integration test failures

* Works well

* rm print

* boom boom

* Still failing model2doc

* merge master

* Equivalence test failing, all others fixed

* cleanup

* Fix embed_scale

* Cleanup marian pipeline test

* Undo extra changes

* Smaller delta

* Cleanup model testers

* undo delta

* fix tests import structure

* cross test decorator

* Cleaner set_weights

* Respect authorized_unexpected_keys

* No warnings

* No warnings

* style

* Nest tf import

* black

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* functional dropout

* fixup

* Fixup

* style_doc

* embs

* shape list

* delete slow force_token_id_to_be_generated func

* fixup

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc fixes and filter warning in wandb (huggingface#8189)

* Finalize lm examples (huggingface#8188)

* Finish the cleanup of the language-modeling examples

* Update main README

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Propagate changes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Replace swish with silu (huggingface#8166)

* Replace swish with silu

* revert nn.silu to nn.swish due to older version

* simplify optimized silu conditional and fix format

* Update activations.py

* Update activations_tf.py

* Update modeling_flax_utils.py

* Update modeling_openai.py

* add swish testcase

* add pytorch swish testcase

* Add more robust python version check

* more formatting fixes

Co-authored-by: TFUsers <TFUsers@gmail.com>
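
For context on the rename above: SiLU and swish (with its scaling parameter fixed to 1) are the same function, `x * sigmoid(x)`. A minimal sketch in plain Python, assuming scalar inputs; the `ACTIVATIONS` table is a hypothetical stand-in and not the library's own mapping:

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x), written to avoid overflow in exp()."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for negative x: the result lies in (0, 1)
    return x * e / (1.0 + e)


# Hypothetical lookup table: the old name stays as an alias so configurations
# that still say "swish" resolve to the same function.
ACTIVATIONS = {"silu": silu, "swish": silu}
```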

* Remove deprecated arguments from new run_clm (huggingface#8197)

* Minor style improvements for the Flax BERT and RoBERTa examples (huggingface#8178)

* Minor style improvements:

1. Use `@nn.compact` rather than `@compact` (so as not to make it seem
   like `compact` is a standard Python decorator).
2. Move attribute docstrings from two `__call__` methods to comments
   on the attributes themselves. (This was probably a remnant from
   the pre-Linen version where the attributes were arguments to
   `call`.)

* Use black on the Flax modeling code

* Fix two bugs with --logging_first_step (huggingface#8193)

* make sure that logging_first_step evaluates

* fix bug with incorrect loss on logging_first_step

* fix style

* logging_first_step only logs, not evals

* [Bug fix] Fixed value for BlenderBot pad token (huggingface#8205)

* [Seq2SeqTrainer] Move import to init to make file self-contained (huggingface#8194)

* boom boom

* reverse order

* Added 12 model cards for Indian Language Models (huggingface#8198)

* Create README.md

* added model cards

* DynaBERT model cards update (huggingface#8192)

* Update README.md

* Update README.md

* Fix the behaviour of DefaultArgumentHandler (removing it). (huggingface#8180)

* Some work to fix the behaviour of DefaultArgumentHandler by removing it.

* Fixing specific pipelines argument checking.

* Fix ignore list behavior in doctests (huggingface#8213)

* doc: fix typo (huggingface#8235)

* Patch reports (huggingface#8238)

* Fix bad import with PyTorch <= 1.4.1 (huggingface#8237)

* Fix TensorBoardCallback for older versions of PyTorch (huggingface#8239)

* Create README.md

* Add line by line option to mlm/plm scripts (huggingface#8240)

* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md

* Add XLMProphetNetTokenizer to tokenization auto (huggingface#8245)

* fix encoder decoder bug (huggingface#8243)

* add new notebooks (huggingface#8246)

* 2 SinusoidalPositionalEmbedding fixes (huggingface#8226)

* [Seq2Seq] Correct import in Seq2Seq Trainer (huggingface#8254)

* Skip tatoeba tests if Tatoeba-Challenge not cloned (huggingface#8260)

* Refactoring the generate() function (huggingface#6949)

* first draft

* show design proposition for new generate method

* up

* make more readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save intermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests to a separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer

* [FIX] TextGenerationPipeline is currently broken. (huggingface#8256)

* [FIX] TextGenerationPipeline is currently broken.

This is most likely due to huggingface#8180.
What's missing is a handler for multiple vs. single strings at the
beginning of the pipeline.
There was also no testing of this pipeline.

* Fixing Conversational tests too.

* Updated ConversationalPipeline to work with encoder-decoder models (huggingface#8207)

* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)

* Addition of integration test for EncoderDecoder conversation model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix Tatoeba skip

* forward the worker stderr to the parent process (huggingface#8262)

* [examples] minimal version requirement run-time check in PL (huggingface#8133)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* make files independent (huggingface#8267)

* Clean Trainer tests and datasets dep (huggingface#8268)

* improve documentation of training_args.py (huggingface#8270)

* improve documentation of training_args.py

- do_train
- do_eval
- do_predict

* fix line too long

* fix style with black on training_args.py

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix line length with utils/style_doc

* black reformatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Data collator for token classification (huggingface#8274)

* Add DataCollatorForTokenClassification and clean tests

* Make quality

* [CIs] Better reports everywhere (huggingface#8275)

* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix

* [WIP] Ner pipeline grouped_entities fixes (huggingface#5970)

* Bug fix: NER pipeline shouldn't group separate entities of same type

* style fix

* [Bug Fix] Shouldn't group entities that are both 'B', even if they are the same type:
	(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] Add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions. Some models are trained on only the first token of a word and not on the subsequent wordpieces (the BERT NER default), so it makes sense to do the same thing at inference time.
	The simplest fix is to group the subwords with the first wordpiece.
	[TODO] How to handle ignored scores? Set them to 0 and calculate a zero-invariant mean?
	[TODO] Handle wordpiece prefixes other than ##? Possible approaches:
		get it from the tokenizer (but currently most tokenizers don't have a wordpiece_prefix property), or
		have an _is_subword(token)
[Feature add] Added an option `skip_special_tokens`, because it was harder to remove special tokens after grouping.
[Additional Changes] Remove the B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO] Can't use a fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')

* use offset_mapping to fix [UNK] token problem

* ignore score for subwords

* modify ner_pipeline test

* modify ner_pipeline test

* modify ner_pipeline test

* ner_pipeline change ignore_subwords default to true

* add ner_pipeline ignore_subword=False test case

* fix offset_mapping index

* fix style again duh

* change is_subword and convert_tokens_to_string logic

* merge tests with new test structure

* change test names

* remove old tests

* ner tests for fast tokenizer

* fast tokenizers have convert_tokens_to_string

* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
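
The grouping rules described in the NER commit above (fold `##` wordpieces into the first wordpiece, never merge two `B-` tags, drop the B/I prefix in the result) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's implementation: the function names and the token dictionary layout are hypothetical, and subword scores are simply left out of the mean as one possible answer to the TODO about ignored scores.

```python
def merge_subwords(tokens, prefix="##"):
    """Fold each subword into the preceding word, ignoring its label and score."""
    words = []
    for tok in tokens:
        if tok["word"].startswith(prefix) and words:
            words[-1]["word"] += tok["word"][len(prefix):]
        else:
            words.append(dict(tok))  # copy so the input is not mutated
    return words


def group_entities(words):
    """Group adjacent words into entities following BIO tags."""
    groups = []
    current = None
    for w in words:
        tag, _, label = w["entity"].partition("-")
        if tag == "O" or not label:
            current = None  # an outside token always closes the open group
            continue
        if current is not None and tag == "I" and current["entity_group"] == label:
            current["word"] += " " + w["word"]
            current["scores"].append(w["score"])
        else:
            # A B- tag (or a change of type) starts a new entity, so
            # (B-type1 B-type1) stays two entities.
            current = {"entity_group": label, "word": w["word"], "scores": [w["score"]]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "word": g["word"],
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
    ]
```

Usage: `group_entities(merge_subwords(tokens))`, where each token is a dict with `word`, `entity` (for example `B-ORG`) and `score` keys.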

* [blenderbot] regex fix (huggingface#8282)

Fixing:

```
src/transformers/tokenization_blenderbot.py:163: DeprecationWarning: invalid escape sequence \s
    token = re.sub("\s{2,}", " ", token)
```
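
A common way to remove this warning is to write the pattern as a raw string literal. A minimal sketch, assuming only the pattern literal changes; the helper name is hypothetical and the actual patch may differ:

```python
import re


def collapse_whitespace(token: str) -> str:
    # Raw string: the backslash reaches the regex engine untouched, so Python
    # never tries to interpret "\s" as a string escape sequence.
    return re.sub(r"\s{2,}", " ", token)
```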

* Fix typo in language-modeling README.md (huggingface#8287)

* [Generate Test] fix greedy generate test (huggingface#8293)

* fix greedy generate test

* delete ipdb

* Fix validation file loading in scripts (huggingface#8298)

* Upgrade resource for doc building

* Revert size change as it doesn't change anything

* Model card: T5-base fine-tuned on QASC (huggingface#8299)

* Update model cards of deepset/roberta-base-squad2 v1 and v2 (huggingface#8241)

* update deepset/roberta-base-squad2 to v2

* Update model_cards/deepset/roberta-base-squad2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Improve QA pipeline error handling (huggingface#8286)

- The issue is that with the previous code we would get the following:

```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

The goal here is to improve this so that a ValueError is raised
wherever possible.

While at it, I tried to simplify QuestionArgumentHandler's code to
make it smaller and more compact while keeping backward compatibility.
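
The idea of failing early with a ValueError can be sketched as an input check that runs before any tensor work. This is a hypothetical helper written for illustration, not the code in QuestionArgumentHandler:

```python
def validate_qa_inputs(question, context):
    """Reject unusable inputs up front instead of failing deep inside the model."""
    for name, value in (("question", question), ("context", context)):
        if not isinstance(value, str):
            raise ValueError(f"`{name}` must be a string, got {type(value).__name__}")
        if not value.strip():
            raise ValueError(f"`{name}` cannot be empty")
    return {"question": question, "context": context}
```

With a check like this, the empty-context call shown earlier would report which argument is wrong rather than surfacing an IndexError from a tensor operation.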

* adding model cards for distilled models (huggingface#8300)

* adding model cards for distil models

* forgot the languages

* Speedup doc build (huggingface#8301)

* Try -j option

* Try other thing

* Bigger machine

* Test lower sphinx version

* Remove trailing space

* Fix path to old run_language_modeling.py script (huggingface#8302)

* Clean up data collators and datasets (huggingface#8308)

* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md (huggingface#8223)

* Create README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update bug-report.md

* Update PULL_REQUEST_TEMPLATE.md

* Corrected typo in readme (huggingface#8320)

* change TokenClassificationTask class methods to static methods (huggingface#7902)

* change TokenClassificationTask class methods to static methods

Since the methods of TokenClassificationTask do not use self, we should probably switch to static methods. Also, because TokenClassificationTask has no constructor, it is currently unusable as is. Switching to static methods removes the need to document the intent of the broken class.

In addition, the get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static method definitions are inherited unchanged, which means they can be overridden like other class methods.

* Trigger Build

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
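
The static-method argument in the commit above can be illustrated with a small sketch. The class and method names follow the commit message, but the signature and the label set are simplified assumptions and not the example script's code:

```python
from typing import List


class TokenClassificationTask:
    @staticmethod
    def get_labels(path: str) -> List[str]:
        # No `self`: subclasses are expected to provide the implementation.
        raise NotImplementedError


class NER(TokenClassificationTask):
    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Hypothetical label set; a real task would read it from `path`.
        return ["O", "B-PER", "I-PER"]
```

The override is called on the class itself, for example `NER.get_labels("labels.txt")`, so no instance (and no constructor) is needed.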

* no warn (huggingface#8329)

* Output global_attentions in Longformer models (huggingface#7562)

* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Make Trainer evaluation handle dynamic seq_length (huggingface#8336)

* Make Trainer evaluation handle dynamic seq_length

* Document behavior.

* Fix test

* Better fix

* Fixes for realsies this time

* Address review comments

* Without forgetting to save...

* [s2s] test_distributed_eval (huggingface#8315)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Docs bart training ref (huggingface#8330)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* [s2s] test_bash_script.py - actually learn something (huggingface#8318)

* use decorator

* remove hardcoded paths

* make the test use more data and do real quality tests

* shave off 10 secs

* add --eval_beams 2, reformat

* reduce train size, use smaller custom dataset

* Model card: T5-base fine-tuned on QuaRel (huggingface#8334)

* Model card: CodeBERT fine-tuned for Insecure Code Detection (huggingface#8247)

* Model card: CodeBERT fine-tuned for Insecure Code Detection

* Update model_cards/mrm8488/codebert-base-finetuned-detect-insecure-code/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Model card: GPT-2 fine-tuned on CommonGen (huggingface#8248)

* [model_cards] Update Italian BERT models and introduce new Italian XXL ELECTRA model 🎉 (huggingface#8343)

* Create README.md (huggingface#8258)

* german medbert model details (huggingface#8266)

* model details

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8327)

* Create README.md (huggingface#8167)

* Create README.md

Telugu BERTU Readme file

* Update model_cards/kuppuluri/telugu_bertu/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8168)

* Create README.md

* Update README.md

* Create README.md (huggingface#8170)

* Create README.md (huggingface#8169)

* Create README.md (huggingface#8255)

* Create README.md

Initial commit

* Updated Read me

Updated

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8312)

* Create README.md

* Update model_cards/ktrapeznikov/gpt2-medium-topic-news/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8338)

fixes

* Fix typo (huggingface#8351)

* Update README.md (huggingface#8360)

Fix website address

* [All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (huggingface#8071)

* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix encoder outputs (huggingface#8368)

* [make] rewrite modified_py_files in python to be cross-platform (huggingface#8371)

* rewrite modified_py_files in python to be cross-platform

* try a different way to test for variable not being ""

* improve comment

* Fix DataCollatorForWholeWordMask (huggingface#8379)

* Fix DataCollatorForWholeWordMask

* Replace all tensorize_batch in data_collator.py

* fix md table (huggingface#8395)

* Add gpt2-medium-chinese model card (huggingface#8402)

* Create README.md

* Update model_cards/mymusise/gpt2-medium-chinese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* fixed default labels for QA model (huggingface#8399)

* Fix DataCollatorForWholeWordMask again (huggingface#8397)

* [s2s examples test] fix data path (huggingface#8398)

* [s2s test_finetune_trainer] failing multigpu test (huggingface#8400)

* [s2s/distill] remove run_distiller.sh, fix xsum script (huggingface#8412)

* comet_ml temporary fix (huggingface#8410)

* updating tag for exbert viz (huggingface#8408)

* Update README.md (huggingface#8406)

* Fix some tooling for windows (huggingface#8359)

* Fix some tooling for windows

* Fix conflict

* Trigger CI

* examples/docs: caveat that PL examples don't work on TPU (huggingface#8309)

* add evaluate doc - trainer.evaluate returns 'epoch' from training (huggingface#8273)

* add evaluate doc

* fix style with utils/style.doc

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Bug fix for permutation language modelling (huggingface#8409)

* [fsmt tokenizer] support lowercase tokenizer (huggingface#8389)

* support lowercase tokenizer

* fix arg pos

* Bump tokenizers (huggingface#8419)

* Add new token classification example (huggingface#8340)

* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Fix typo

* [fsmt convert script] fairseq broke chkpt data - fixing that (huggingface#8377)

* fairseq broke chkpt data - fixing that

* style

* support older bpecodes filenames - specifically "code" in iwslt14

* Deprecate old data/metrics functions (huggingface#8420)

* [Tests] Add Common Test for Training + Fix a couple of bugs (huggingface#8415)

* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert

* [docs] remove sshleifer from issue-template :( (huggingface#8418)

* Fix bart shape comment (huggingface#8423)

* [docs] [testing] gpu decorators table (huggingface#8422)

* gpu decorators table

* whitespace

* Update docs/source/testing.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* whitespace

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Check all models are in an auto class (huggingface#8425)

* [github CI] add a multi-gpu job for all example tests (huggingface#8341)

* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Changing XLNet default from not using memories to 512 context size following paper (huggingface#8417)

* Move XLNet memory length FutureWarning

* isort

* style

* Changed default XLNet memory length

* Model versioning (huggingface#8324)

* fix typo

* rm use_cdn & references, and implement new hf_bucket_url

* I'm pretty sure we don't need to `read` this file

* same here

* [BIG] file_utils.networking: do not gobble up errors anymore

* Fix CI 😇

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Tiny doc tweak

* Add doc + pass kwarg everywhere

* Add more tests and explain

cc @sshleifer let me know if better

Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>

* Also implement revision in pipelines

In the case where we're passing a task name or a string model identifier

* Fix CI 😇

* Fix CI

* [hf_api] new methods + command line implem

* make style

* Final endpoints post-migration

* Fix post-migration

* Py3.6 compat

cc @stefan-it

Thank you @stas00

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Patch token classification pipeline (huggingface#8364)

* Patch token classification pipeline

* Some added tests for TokenClassificationArgumentHandler (huggingface#8366)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* Update links from s3 to huggingface.co

* Fix style

* Model sharing rst (huggingface#8439)

* Update RST

* Finer details

* Re-organize

* Style

* Release: v3.5.0

* v3.5.0 documentation

* [s2s/distill] hparams.tokenizer_name = hparams.teacher (huggingface#8382)

* [examples] better PL version check (huggingface#8429)

* Question template (huggingface#8440)

* Remove SO from question template

* Styling

* [docs] improve bart/marian/mBART/pegasus docs (huggingface#8421)

* Add auto next sentence prediction (huggingface#8432)

* Add auto next sentence prediction

* Fix style

* Add mobilebert next sentence prediction

* Windows dev section in the contributing file (huggingface#8436)

* Add a Windows dev section in the contributing file.

* Forgotten link

* Trigger CI

* Rework description

* Trigger CI

* [testing utils] get_auto_remove_tmp_dir more intuitive behavior (huggingface#8401)

* [testing utils] get_auto_remove_tmp_dir default change

Now that I have been using `get_auto_remove_tmp_dir` for a while, I realized that the defaults aren't optimal.

99% of the time we want the tmp dir to be empty at the beginning of the test, so this changes the default to `before=True`. This shouldn't impact any tests since this feature is used only during debug.

* simplify things

* update docs

* fix doc layout

* style

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* better 3-state doc

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* s/tmp/temporary/ + style

* correct the statement

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add missing import (huggingface#8444)

* Add missing import

* Fix dummy objects

* fix t5 special tokens (huggingface#8435)

* using multi_gpu consistently (huggingface#8446)

* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc

* Add missing tasks to `pipeline` docstring (huggingface#8428)

* [No merge] TF integration testing (huggingface#7621)

* stash

* TF Integration testing for ELECTRA, BERT, Longformer

* Trigger slow tests

* Apply suggestions from code review

* fix t5 token type ids (huggingface#8437)

* Bug fix for the modeling utilities function apply_chunking_to_forward: inputs only need to match in the chunking dimension, but an exception was raised whenever their complete shapes differed (huggingface#8391)

Co-authored-by: pedro <pe25171@mit.edu>
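
The idea behind this fix can be sketched in plain Python. This is a simplified stand-in that uses nested lists, not the library's `apply_chunking_to_forward`; the helper names `apply_chunked` and `row_sums` are hypothetical. It only checks that the inputs agree on the chunking dimension (here, the number of rows), so inputs with different row widths are accepted.

```python
from typing import Callable, List

Matrix = List[List[float]]


def apply_chunked(fn: Callable[..., Matrix], chunk_size: int, *inputs: Matrix) -> Matrix:
    """Apply `fn` to row-wise chunks of `inputs` and concatenate the results.

    Hypothetical helper: only the chunking dimension (row count) must match
    across inputs; the remaining dimensions are allowed to differ.
    """
    if not inputs:
        raise ValueError("at least one input is required")
    n_rows = len(inputs[0])
    if any(len(matrix) != n_rows for matrix in inputs):
        raise ValueError("all inputs must have the same size in the chunking dimension")
    if chunk_size <= 0:
        # No chunking requested: run the function once on the full inputs.
        return fn(*inputs)
    output: Matrix = []
    for start in range(0, n_rows, chunk_size):
        chunk = [matrix[start:start + chunk_size] for matrix in inputs]
        output.extend(fn(*chunk))
    return output


def row_sums(a: Matrix, b: Matrix) -> Matrix:
    """Toy forward function: one output value per row."""
    return [[sum(row_a) + sum(row_b)] for row_a, row_b in zip(a, b)]


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 rows, width 2
b = [[10.0], [20.0], [30.0]]              # 3 rows, width 1
chunked = apply_chunked(row_sums, 2, a, b)
full = row_sums(a, b)
```

Because `row_sums` treats each row independently, the chunked result equals the unchunked one, which is the property that makes chunking safe in the first place.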

* [model_cards] harmonization

* Fix TF Longformer (huggingface#8460)

* Add next sentence prediction loss computation (huggingface#8462)

* Add next sentence prediction loss computation

* Apply style

* Fix tests

* Add forgotten import

* Add forgotten import

* Use a new parameter

* Remove kwargs and use positional arguments

* Fix next sentence output (huggingface#8466)

* Example NER script predicts on tokenized dataset (huggingface#8468)

The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`

* Add TFDPR (huggingface#8203)

* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

the last commit accidentally deleted these 4 lines, so I restored them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False as default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

after removing it, a test failed, so only "use_tf_weights = None" is removed, on Lysandre's suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run the slow tests, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and PyTorch give approximately the same output.
However, I could not test against the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>

* Replaced some iadd operations on lists with proper list methods. (huggingface#8433)
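
As a generic illustration of this kind of change (not taken from the commit itself, and with hypothetical function names), the two styles below produce the same list, but `append` and `extend` state the intent directly and avoid building a temporary one-element list.

```python
from typing import Iterable, List


def collect_with_iadd(items: Iterable[str]) -> List[str]:
    out: List[str] = []
    for item in items:
        out += [item]  # in-place add: wraps each element in a throwaway list
    return out


def collect_with_methods(items: Iterable[str]) -> List[str]:
    out: List[str] = []
    for item in items:
        out.append(item)  # adds the element directly
    return out


def merge(base: List[int], extra: Iterable[int]) -> List[int]:
    merged = list(base)   # copy so the caller's list is left untouched
    merged.extend(extra)  # accepts any iterable, like `+=` on a list
    return merged


tokens = ["a", "b", "c"]
same = collect_with_iadd(tokens) == collect_with_methods(tokens)
merged = merge([1, 2], (3, 4))
```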

* Skip test until investigation

* Flax/Jax documentation (huggingface#8331)

* First addition of Flax/Jax documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* make style

* Ensure input order match between Bert & Roberta

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install dependencies "all" when building doc

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* wraps build_doc deps with ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing @sgugger comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use list to highlight JAX features.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's not look too much into the future for now.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* [s2s] distill t5-large -> t5-small (huggingface#8376)

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update deploy-docs dependencies on CI to enable Flax (huggingface#8475)

* Update deploy-docs dependencies on CI to enable Flax

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pair of ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* [model_cards] other chars than [\w\-_] not allowed anymore in model names

cc @Pierrci

* Fix typo in roberta-base-squad2-v2 model card (huggingface#8489)

* quick fix on concatenating text to support more datasets (huggingface#8474)

* Fix doc bug (huggingface#8500)

* fix doc bug

Signed-off-by: mymusise <mymusise1@gmail.com>

* fix example bug

Signed-off-by: mymusise <mymusise1@gmail.com>

* Model sharing doc (huggingface#8498)

* Model sharing doc

* Style

* fix SqueezeBertForMaskedLM (huggingface#8479)

* Try to understand and apply Sylvain's comments (huggingface#8458)

* Use LF instead of os.linesep (huggingface#8491)
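
The commit title gives no detail, so the following is only a general sketch of the technique it names, with a hypothetical helper `write_lines_lf`. In Python text mode, `"\n"` is translated to the platform line ending on write, so writing `os.linesep` explicitly can yield `\r\r\n` on Windows. Passing `newline="\n"` to `open` and writing plain `"\n"` keeps the output identical on every platform.

```python
import os
import tempfile
from typing import Iterable


def write_lines_lf(path: str, lines: Iterable[str]) -> None:
    """Write `lines` with LF endings regardless of the host platform."""
    # newline="\n" disables the platform-specific translation of "\n".
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "example.txt")
    write_lines_lf(example_path, ["first", "second"])
    # Read the bytes back to inspect the exact line endings on disk.
    with open(example_path, "rb") as handle:
        raw = handle.read()
```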

* Add pretraining loss computation for TF Bert pretraining (huggingface#8470)

* Add pretraining loss computation for TF Bert pretraining

* Fix labels creation

* Fix T5 model

* restore T5 kwargs

* try a generic fix for pretraining models

* Apply style

* Override the prepare method for the BERT tests

* Remove typo

* Update deepset/roberta-base-squad2 model card (huggingface#8522)

* Update README.md

* Update README.md

* Update doc for v3.5.1

* [T5] Bug correction & Refactor (huggingface#8518)

* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions

* Model templates encoder only (huggingface#8509)

* Model templates

* TensorFlow

* Remove pooler

* CI

* Tokenizer + Refactoring

* Encoder-Decoder

* Let's go testing

* Encoder-Decoder in TF

* Let's go testing in TF

* Documentation

* README

* Fixes

* Better names

* Style

* Update docs

* Choose to skip either TF or PT

* Code quality fixes

* Add to testing suite

* Update file path

* Cookiecutter path

* Update `transformers` path

* Handle rebasing

* Remove seq2seq from model templates

* Remove s2s config

* Apply Sylvain and Patrick comments

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last fixes from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix paths in github YAML

* Model sharing doc: more tweaks (huggingface#8520)

* More doc tweaks

* Update model_sharing.rst

* make style

* missing newline

* Add email tip

Co-authored-by: Pierric Cistac <pierric@huggingface.co>

* Add bart-large-mnli model card (huggingface#8527)

* fix load weights (huggingface#8528)

* fix load weights

* delete line

* Rework some TF tests (huggingface#8492)

* Update some tests

* Small update

* Apply style

* Use max_position_embeddings

* Create a fake attribute

* Create a fake attribute

* Update wrong name

* Wrong TransfoXL model file

* Keep the common tests agnostic

* [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (huggingface#8073)

* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Create README.md for Chinese RoBERTa Miniatures (huggingface#8550)

* Create README.md

* Update model_cards/uer/chinese_roberta_L-2_H-128/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Readme for News Headline Generation (bert2bert) (huggingface#8557)

* Readme for Wiki Summary [Persian] bert2bert (huggingface#8558)

* Clearer Model Versioning Example (huggingface#8562)

* [doc] typo fix (huggingface#8535)

* [doc] typo fix

@sgugger

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding the prepare_seq2seq_batch function to ProphetNet (huggingface#8515)

* Simply insert T5Tokenizer's prepare_seq2seq_batch

* Update/Add some 'import'

* fix RunTimeError caused by '.view'

* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet

* Update test_tokenization_prophetnet.py

* Format the test code with black

* Re-format the test code

* Update test_tokenization_prophetnet.py

* Add importing require_torch in the test code

* Add importing BatchEncoding in the test code

* Re-format the test code on Colab

* Fix GPT2DoubleHeadsModel to work with model.generate() (huggingface#6601)

* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used

and for GPT2LMHeadModel too

* Update tests to check token_type_ids usage in GPT2 models

* Update version to v4.0.0-dev (huggingface#8568)

* Switch `return_dict` to `True` by default. (huggingface#8530)

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests

* Fix mixed precision issue for GPT2 (huggingface#8572)

* Fix mixed precision issue for GPT2

* Forgot one cast

* oops

* Forgotten casts

* Reorganize repo (huggingface#8580)

* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command

* model_card for indolem/indobert-base-uncased (huggingface#8579)

* T5 & mT5 (huggingface#8552)

* add mt5 and t5v1_1 model

* fix tests

* correct some imports

* add tf model

* finish tf t5

* improve examples

* fix copies

* clean doc

* [MT5] More docs (huggingface#8589)

* add docs

* make style

* Add __init__ to the models folder

* Fix init for MT5 (huggingface#8591)

* Tokenizers: ability to load from model subfolder (huggingface#8586)

* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

* Fix model templates (huggingface#8595)

* First fixes

* Fix imports and add init

* Fix typo

* Move init to final dest

* Fix tokenization import

* More fixes

* Styling

* these should run fine on multi-gpu (huggingface#8582)

* Fix check repo utils (huggingface#8600)

* Tokenizers should be framework agnostic (huggingface#8599)

* Tokenizers should be framework agnostic

* Run the slow tests

* Not testing

* Fix documentation

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove deprecated (huggingface#8604)

* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Add Harry Potter Model Card (huggingface#8605)

* Add Harry Potter Model

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Remove old doc

* Fixed link to the wrong paper. (huggingface#8607)

* Reset loss to zero on logging in Trainer to avoid bfloat16 issues (huggingface#8561)

* make tr_loss regular float

* Revert "make tr_loss regular float"

This reverts commit c9d7ccf.

* reset loss at each logging step

* keep track of total loss with _total_loss_scalar

* add remaining tr_loss at the end
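
The bookkeeping described in these commits can be sketched as follows. This is a minimal plain-Python illustration, not the Trainer's actual code: the function name and the use of Python floats are assumptions, while `tr_loss` and `_total_loss_scalar` echo the names in the commit messages. The running loss is reset at every logging step, so the accumulator stays small; a low-precision accumulator such as bfloat16 would otherwise stop registering small increments as it grows.

```python
from typing import List, Sequence, Tuple


def train_with_logging(step_losses: Sequence[float], logging_steps: int) -> Tuple[List[float], float]:
    """Return the per-window logged losses and the mean loss over all steps."""
    tr_loss = 0.0            # short-lived accumulator, reset at each logging step
    total_loss_scalar = 0.0  # long-lived running total kept as a regular float
    logged: List[float] = []
    for step, loss in enumerate(step_losses, start=1):
        tr_loss += loss
        if step % logging_steps == 0:
            logged.append(tr_loss / logging_steps)
            total_loss_scalar += tr_loss
            tr_loss = 0.0    # reset so the accumulator never grows large
    # Add whatever accumulated after the last logging step.
    total_loss_scalar += tr_loss
    mean_loss = total_loss_scalar / max(len(step_losses), 1)
    return logged, mean_loss


logged, mean_loss = train_with_logging([1.0, 3.0, 2.0, 4.0, 6.0], logging_steps=2)
```

With five steps and a logging interval of two, the last step is never logged on its own, which is why the remainder has to be added to the total after the loop.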

* Fix DataCollatorForLanguageModeling (huggingface#8621)

* Fix missing space in multiline warning (huggingface#8593)

Multiline string informing about missing PyTorch/TensorFlow had missing space.

* [s2s] broken test (huggingface#8613)

* fix to adjust for huggingface#8530 changes (huggingface#8612)

* self.self.activation_dropout -> self.activation_dropout (huggingface#8611)

(one line typo)

* New TF loading weights (huggingface#8490)

* New TF loading weights

* apply style

* Better naming

* Largely comment the loading method

* Apply style

* Address Patrick's comments

* Remove useless line of code

* Update Docstring

* Address Sylvain's and Lysandre's comments

* Simplify the names computation

* Typos

* Adding PrefixConstrainedLogitsProcessor (huggingface#8529)

* Adding PrefixConstrainedLogitsProcessor

* fixing RAG and style_doc

* fixing black (v20 instead of v19)

* Improving doc in generation_logits_process.py

* Improving docs and typing in generation_utils.py

* docs improvement

* adding test and fixing doc typo

* fixing doc_len

* isort on test

* fixed test

* improve docstring a bit

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [Tokenizer Doc] Improve tokenizer summary (huggingface#8622)

* improve summary

* small fixes

* cleaned line length

* correct "" formatting

* apply sylvains suggestions

* Fixes the training resuming with gradient accumulation (huggingface#8624)

* Fix training from scratch in new scripts (huggingface#8623)

* model_cards for Chinese Couplet and Poem GPT2 models (huggingface#8620)

* replace performance table with markdown (huggingface#8565)

* replace performance table with markdown

* Update model_cards/smanjil/German-MedBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8544)

Modified Model in Action section. The class `AutoModelWithLMHead` is deprecated so changed it to `AutoModelForSeq2SeqLM` for encoder-decoder models. Removed duplicate eos token.

* Model Card for abhilash1910/financial_roberta (huggingface#8625)

* Model Card for abhilash1910/financial_roberta

* Update model_cards/abhilash1910/financial_roberta/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update README.md (huggingface#8405)

* Update README.md

* Update README.md

* Add model card for ai4bharat/indic-bert (huggingface#8464)

* Create README.md (huggingface#8363)

* Model card: T5-base fine-tuned on QuaRTz (huggingface#8369)

* Model card: T5-base fine-tuned on QuaRTz

* Update model_cards/mrm8488/t5-base-finetuned-quartz/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md (huggingface#8362)

* Created ModelCard for Hel-ach-en MT model (huggingface#8496)

* Updated ModelCard

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* [s2s] distillation apex breaks return_dict obj (huggingface#8631)

* apex breaks return_dict obj

* style

* grammar (huggingface#8639)

* Update README.md (huggingface#8635)

* Updated the Extractive Question Answering code snippets (huggingface#8636)

* Updated the Extractive Question Answering code snippets

The Extractive Question Answering code snippets no longer work because the models now return task-specific output objects. This commit fixes the PyTorch and TensorFlow examples by adding `.values()` to the model call.

* Update task_summary.rst

* Add cards for all Geotrend models (huggingface#8617)

* docs(bert-base-15lang-cased): add model card

* add cards for all Geotrend models

* [model cards] fix language tag for all Geotrend models

* [model card] : fix bert-base-15lang-cased (huggingface#8655)

the table was badly formatted because of a single line break

* fix missing return dict (huggingface#8653)

* fixed imports

* update example conversion

* removed redundant tests

* added back marker

* reverted to old QA Argument Handler

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Santiago Castro <sacastro@umich.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Zhiqi Huang <mazicwong@gmail.com>
Co-authored-by: Manuel Romero <mrm8488@gmail.com>
Co-authored-by: Ashwani Tanwar <ashwanitanwar333@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Ethan <9592150+Ethan-yt@users.noreply.github.com>
Co-authored-by: gurkan08 <33202187+gurkan08@users.noreply.github.com>
Co-authored-by: dartrevan <awesombatsy@gmail.com>
Co-authored-by: Branden Chan <33759007+brandenchan@users.noreply.github.com>
Co-authored-by: yantan <yantan@effyic.com>
Co-authored-by: Santiago Castro <santi.1410@hotmail.com>
Co-authored-by: wlhgtc <hgtcwl@foxmail.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: TFUsers <25044281+TFUsers@users.noreply.github.com>
Co-authored-by: TFUsers <TFUsers@gmail.com>
Co-authored-by: Avital Oliver <avital@thewe.net>
Co-authored-by: Abi See <abigail.e.see@gmail.com>
Co-authored-by: guillaume-be <guillaume.becquin@gmail.com>
Co-authored-by: Kushal <32245327+kushalj001@users.noreply.github.com>
Co-authored-by: Martin Monperrus <martin.monperrus@gnieh.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Philip May <eniak.info@gmail.com>
Co-authored-by: Ceyda Cinarel <15624271+cceyda@users.noreply.github.com>
Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Pengzhi Gao <pengzhi.gao@petuum.com>
Co-authored-by: Victor SANH <victorsanh@gmail.com>
Co-authored-by: Yifan Peng <pengyifan.mail@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Guillem García Subies <37592763+GuillemGSubies@users.noreply.github.com>
Co-authored-by: Bobby Donchev <contact@donchev.is>
Co-authored-by: Guillaume Filion <guillaume.filion@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Jiaxin Pei <pedropei@vip.qq.com>
Co-authored-by: smanjil <shresthamanjil21@gmail.com>
Co-authored-by: Karthik Uppuluri <karthik.uppuluri@gmail.com>
Co-authored-by: hasantanvir79 <hasantanvir79@gmail.com>
Co-authored-by: ktrapeznikov <ktrapeznikov@gmail.com>
Co-authored-by: hassoudi <hassoudi@gmail.com>
Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com>
Co-authored-by: Yossi Synett <github@yossisynett.com>
Co-authored-by: Chengxi Guo <mymusise1@gmail.com>
Co-authored-by: Manav Rathod <manav.rathod@berkeley.edu>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Shashank Gupta <shaz4194@gmail.com>
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Shichao Sun <sunshichao1995@gmail.com>
Co-authored-by: Pedro <pedro.colon4@upr.edu>
Co-authored-by: pedro <pe25171@mit.edu>
Co-authored-by: sarnoult <31313050+sarnoult@users.noreply.github.com>
Co-authored-by: Ratthachat (Jung) <56621342+ratthachat@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Beomsoo Kim <bluewhale8202@gmail.com>
Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-authored-by: Sumithra Bhakthavatsalam <sumithra.b@gmail.com>
Co-authored-by: Antonio Lanza <antoniolanza1996@gmail.com>
Co-authored-by: zeyuyun1 <43428393+zeyuyun1@users.noreply.github.com>
Co-authored-by: Forrest Iandola <fiandola@gmail.com>
Co-authored-by: Pierric Cistac <pierric@huggingface.co>
Co-authored-by: Joe Davison <josephddavison@gmail.com>
Co-authored-by: zhezhaoa <1152543959@qq.com>
Co-authored-by: Mehrdad Farahani <m3hrdadfi@gmail.com>
Co-authored-by: Yusuke Mori <mori@mi.t.u-tokyo.ac.jp>
Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: fajri91 <fajri91@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
Co-authored-by: Caitlin Ostroff <caitlin.ostroff@gmail.com>
Co-authored-by: cronoik <johannes.schaffrath@mail.de>
Co-authored-by: Benjamin Minixhofer <bminixhofer@gmail.com>
Co-authored-by: Michał Pogoda <237372@student.pwr.edu.pl>
Co-authored-by: Nicola De Cao <nicola.decao@gmail.com>
Co-authored-by: hhou435 <59219579+hhou435@users.noreply.github.com>
Co-authored-by: Vishal Singh <vishalsingh7x@gmail.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Divyanshu Kakwani <divkakwani@gmail.com>
Co-authored-by: Perez Ogayo <pogayo17@alustudent.com>
Co-authored-by: Tim Isbister <timisbister@gmail.com>
Co-authored-by: Amine Abdaoui <abdaoui@lirmm.fr>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
@cceyda cceyda mentioned this pull request Apr 6, 2021
5 tasks
7 participants