sync to transformers master #1

fabiocapsouza · 2020-11-15T15:27:18Z

No description provided.

* Create README.md * Update model_cards/sachaarbonel/bert-italian-cased-finetuned-pos/README.md * Apply suggestions from code review Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Make Seq2Seq Trainer more similar to Trainer * fix typo * fix seq2seq trainer * remove from tests * remove lock * remove train files * delete test files * correct typo * check at init * make sure trainer is not slowed down on TPU * correct isort * remove use cache * fix use cache * add last use cache = false

* Create README.md * Update model_cards/ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com> * Add Meta information for dataset identifier. Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Create README.md * Update README.md

Close #8030

#8030

…8006) * fixing #8001 * make T5 tokenizer serialization more robust - style

Minor typo fixes to the tokenizer summary

* Add mixed precision evaluation * use original flag

* distributed training * fix * fix formatting * wording

* Add MLflow integration class: add integration code for MLflow in integrations.py, along with the code that checks that MLflow is installed.
* Add MLflowCallback import: add the import of MLflowCallback in trainer.py.
* Handle model argument: allow the callback to handle the model argument and store model config items as hyperparameters.
* Log parameters to MLflow in batches: MLflow cannot log more than a hundred parameters at once, so the parameters are split into batches of 100 items and the batches are logged one by one.
* Fix style
* Add docs on MLflow callback
* Fix issue with unfinished runs: the "fluent" API used in the MLflow integration allows only one run to be active at any given moment. If the Trainer is disposed of and a new one is created before the training has finished, it will refuse to log the results when the next trainer is created.
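The batching step described in this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the helper names `iter_param_batches` and `log_params_in_batches` are hypothetical, and the `log_batch` callable stands in for whatever MLflow call the integration actually uses.

```python
from typing import Any, Callable, Dict, Iterator

# Limit taken from the commit message ("batches of 100 items").
MAX_PARAMS_PER_BATCH = 100


def iter_param_batches(
    params: Dict[str, Any], batch_size: int = MAX_PARAMS_PER_BATCH
) -> Iterator[Dict[str, Any]]:
    """Yield successive sub-dicts holding at most `batch_size` items each."""
    items = list(params.items())
    for start in range(0, len(items), batch_size):
        yield dict(items[start:start + batch_size])


def log_params_in_batches(
    params: Dict[str, Any],
    log_batch: Callable[[Dict[str, Any]], None],
    batch_size: int = MAX_PARAMS_PER_BATCH,
) -> int:
    """Send `params` to `log_batch` one batch at a time; return the batch count.

    `log_batch` is a hypothetical stand-in for the real logging call.
    """
    count = 0
    for batch in iter_param_batches(params, batch_size):
        log_batch(batch)
        count += 1
    return count


if __name__ == "__main__":
    # A fake logger that just records what it receives.
    received = []
    hyperparams = {f"param_{i}": i for i in range(250)}
    n_batches = log_params_in_batches(hyperparams, received.append)
    print(n_batches, [len(batch) for batch in received])
```

Passing the logging function in as a parameter keeps the sketch runnable without MLflow installed; in a real callback it would be replaced by the actual MLflow logging call.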

* Fix minor typos Fix minor typos in the docs. * Update docs/source/preprocessing.rst Clearer data structure description. Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

…doc (#8053)

--wwm can't be used as an argument to run_language_modeling.py and should be changed to --whole_word_mask

… pad_token (#8043) * make sure padding is implemented for non-padding tokens models as well * add better error message * add better warning * remove results files * Update examples/seq2seq/seq2seq_trainer.py * remove unnecessary copy line * correct usage of labels * delete test files

@sgugger

* First addition of Flax/Jax documentation Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * make style * Ensure input order matches between Bert & Roberta Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Install dependencies "all" when building doc Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * wraps build_doc deps with "" Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Addressing @sgugger comments. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Use list to highlight JAX features. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Make style. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Let's not look too much into the future for now. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Style Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update deploy-docs dependencies on CI to enable Flax Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added pair of "" Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

@Pierrci

…ames cc @Pierrci

* fix doc bug Signed-off-by: mymusise <mymusise1@gmail.com> * fix example bug Signed-off-by: mymusise <mymusise1@gmail.com>

* Model sharing doc * Style

* Add pretraining loss computation for TF Bert pretraining * Fix labels creation * Fix T5 model * restore T5 kwargs * try a generic fix for pretraining models * Apply style * Override the prepare method for the BERT tests

* Update README.md * Update README.md

* fix bug * T5 refactor * refactor tf * apply Sylvain's suggestions

* Model templates * TensorFlow * Remove pooler * CI * Tokenizer + Refactoring * Encoder-Decoder * Let's go testing * Encoder-Decoder in TF * Let's go testing in TF * Documentation * README * Fixes * Better names * Style * Update docs * Choose to skip either TF or PT * Code quality fixes * Add to testing suite * Update file path * Cookiecutter path * Update `transformers` path * Handle rebasing * Remove seq2seq from model templates * Remove s2s config * Apply Sylvain and Patrick comments * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Last fixes from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* More doc tweaks * Update model_sharing.rst * make style * missing newline * Add email tip Co-authored-by: Pierric Cistac <pierric@huggingface.co>

* fix load weights * delete line

* Update some tests * Small update * Apply style * Use max_position_embeddings * Create a fake attribute * Create a fake attribute * Update wrong name * Wrong TransfoXL model file * Keep the common tests agnostic

* neFLOs calculation, logging, and reloading (#1) * testing distributed consecutive batches * fixed AttributeError from DataParallel * removed verbosity * rotate with use_mtime=True * removed print * fixed interaction with gradient accumulation * indent formatting * distributed neflo counting * fixed typo * fixed typo * mean distributed losses * exporting log history * moved a few functions * floating_point_ops clarification for transformers with parameter-reuse * code quality * double import * made flo estimation more task-agnostic * only logging flos if computed * code quality * unused import * Update src/transformers/trainer.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Sylvain review * Update src/transformers/modeling_utils.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * black Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* ready for PR * cleanup * correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST * fix * perfectionism * revert change from another PR * odd, already committed this one * non-interactive upload workaround * backup the failed experiment * store langs in config * workaround for localizing model path * doc clean up as in huggingface#6956 * style * back out debug mode * document: run_eval.py --num_beams 10 * remove unneeded constant * typo * re-use bart's Attention * re-use EncoderLayer, DecoderLayer from bart * refactor * send to cuda and fp16 * cleanup * revert (moved to another PR) * better error message * document run_eval --num_beams * solve the problem of tokenizer finding the right files when model is local * polish, remove hardcoded config * add a note that the file is autogenerated to avoid losing changes * prep for org change, remove unneeded code * switch to model4.pt, update scores * s/python/bash/ * missing init (but doesn't impact the finetuned model) * cleanup * major refactor (reuse-bart) * new model, new expected weights * cleanup * cleanup * full link * fix model type * merge porting notes * style * cleanup * have to create a DecoderConfig object to handle vocab_size properly * doc fix * add note (not a public class) * parametrize * - add bleu scores integration tests * skip test if sacrebleu is not installed * cache heavy models/tokenizers * some tweaks * remove tokens that aren't used * more purging * simplify code * switch to using decoder_start_token_id * add doc * Revert "major refactor (reuse-bart)" This reverts commit 226dad1. 
* decouple from bart * remove unused code #1 * remove unused code huggingface#2 * remove unused code huggingface#3 * update instructions * clean up * move bleu eval to examples * check import only once * move data+gen script into files * reuse via import * take less space * add prepare_seq2seq_batch (auto-tested) * cleanup * recode test to use json instead of yaml * ignore keys not needed * use the new -y in transformers-cli upload -y * [xlm tok] config dict: fix str into int to match definition (huggingface#7034) * [s2s] --eval_max_generate_length (huggingface#7018) * Fix CI with change of name of nlp (huggingface#7054) * nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last * extending to support allen_nlp wmt models - allow a specific checkpoint file to be passed - more arg settings - scripts for allen_nlp models * sync with changes * s/fsmt-wmt/wmt/ in model names * s/fsmt-wmt/wmt/ in model names (p2) * s/fsmt-wmt/wmt/ in model names (p3) * switch to a better checkpoint * typo * make non-optional args such - adjust tests where possible or skip when there is no other choice * consistency * style * adjust header * cards moved (model rename) * use best custom hparams * update info * remove old cards * cleanup * s/stas/facebook/ * update scores * s/allen_nlp/allenai/ * url maps aren't needed * typo * move all the doc / build /eval generators to their own scripts * cleanup * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * fix indent * duplicated line * style * use the correct add_start_docstrings * oops * resizing can't be done with the core approach, due to 2 dicts * check that the arg is a list * style * style Co-authored-by: Sam Shleifer <sshleifer@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

mazicwong and others added 30 commits October 23, 2020 10:53

Create README.md (#7997)

43fdafe

Add model cards for DynaBERT (#7999)

6e07c1f

Create model card for bert-italian-cased-finetuned-pos (#8003)

59b5953

* Create README.md * Update model_cards/sachaarbonel/bert-italian-cased-finetuned-pos/README.md * Apply suggestions from code review Co-authored-by: Julien Chaumond <chaumond@gmail.com>

[doc prepare_seq2seq_batch] fix docs (#8013)

38f6739

[Model Card] DJSammy/bert-base-danish-uncased_BotXO,ai (#8025)

5148f43

* Create README.md * Update README.md

Fixup #8025

efc4a21

Close #8030

[model_cards] bert-base-danish Fixup

7087d9b

#8030

[tokenizers] Fixing #8001 - Adding tests on tokenizers serialization (#…

79eb391

…8006) * fixing #8001 * make T5 tokenizer serialization more robust - style

Remove codecov.yml

829b9f8

Minor typo fixes to the tokenizer summary (#8045)

9aa2826

Minor typo fixes to the tokenizer summary

Add mixed precision evaluation (#8036)

c153bcc

* Add mixed precision evaluation * use original flag

[docs] [testing] distributed training (#7993)

101186b

* distributed training * fix * fix formatting * wording

fsmt slow test uses lists (#8031)

f20aec1

update version for scipy (#7998)

20a0894

Cleanup pytorch tests (#8033)

8bbe824

Fix label name in DataCollatorForNextSentencePrediction test (#8048)

0774786

Tiny TF Bart fixes (#8023)

8be9cb0

minor model card description updates (#8051)

b0a9076

Minor error fix of 'bart-large-cnn' details in the pretrained_models …

a9ac1db

…doc (#8053)

add multiclass field to default zero shot example

fbcddb8

Update README.md (#8050)

098ddc2

--wwm can't be used as an argument to run_language_modeling.py and should be changed to --whole_word_mask

Fix + Test (#8049)

cbad90d

fixing crash (#8057)

7ff7c49

[TF] from_pt should respect authorized_unexpected_keys (#8056)

bc9332b

Fix TF training arguments instantiation (#8063)

3a10764

bombs-kim and others added 25 commits November 11, 2020 12:29

Replaced some iadd operations on lists with proper list methods. (#8433)

aa2a2c6
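
The diff for this commit is not shown here, so the following is only a guess at the kind of change the title describes: replacing in-place addition (`+=`) on lists with `append` and `extend`. Both function names are hypothetical and exist purely for the comparison.

```python
from typing import Iterable, List


def flatten_with_iadd(groups: Iterable[List[int]]) -> List[int]:
    """Style being replaced: in-place addition on lists."""
    result: List[int] = []
    for group in groups:
        result += [len(group)]  # wraps a single value in a temporary list
        result += group
    return result


def flatten_with_methods(groups: Iterable[List[int]]) -> List[int]:
    """Same behaviour, written with explicit list methods."""
    result: List[int] = []
    for group in groups:
        result.append(len(group))  # one element: append
        result.extend(group)       # many elements: extend
    return result


if __name__ == "__main__":
    data = [[1, 2], [3]]
    print(flatten_with_iadd(data) == flatten_with_methods(data))
```

The two versions produce the same list; the method calls state the intent (one element versus many) and avoid building a throwaway one-element list.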

Skip test until investigation

c7b6bbe

[s2s] distill t5-large -> t5-small (#8376)

81ebd70

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Update deploy-docs dependencies on CI to enable Flax (#8475)

121c24e

* Update deploy-docs dependencies on CI to enable Flax Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added pair of "" Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

[model_cards] other chars than [\w\-_] not allowed anymore in model n…

c6c08eb

…ames cc @Pierrci

Fix typo in roberta-base-squad2-v2 model card (#8489)

17b1fd8

quick fix on concatenating text to support more datasets (#8474)

924c624

Fix doc bug (#8500)

d65e0bf

* fix doc bug Signed-off-by: mymusise <mymusise1@gmail.com> * fix example bug Signed-off-by: mymusise <mymusise1@gmail.com>

Model sharing doc (#8498)

7933054

* Model sharing doc * Style

fix SqueezeBertForMaskedLM (#8479)

0fa0349

Try to understand and apply Sylvain's comments (#8458)

27b3ff3

Use LF instead of os.linesep (#8491)

91a67b7
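
As a rough illustration of what the title refers to (the commit's own diff is not shown here, so this is an assumption about the technique, not a copy of it): opening a text file with `newline="\n"` makes Python write LF line endings on every platform, whereas writing `os.linesep` in text mode depends on the operating system. The helper name `write_lines_lf` is hypothetical.

```python
import os
import tempfile
from typing import Iterable


def write_lines_lf(path: str, lines: Iterable[str]) -> None:
    """Write `lines` with LF endings regardless of the host platform."""
    # newline="\n" disables translation of "\n" to the platform separator.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")


def read_raw(path: str) -> bytes:
    """Return the file's bytes so the actual line endings can be inspected."""
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = os.path.join(tmp_dir, "example.txt")
        write_lines_lf(target, ["first", "second"])
        print(read_raw(target))
```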

Add pretraining loss computation for TF Bert pretraining (#8470)

5d80539

* Add pretraining loss computation for TF Bert pretraining * Fix labels creation * Fix T5 model * restore T5 kwargs * try a generic fix for pretraining models * Apply style * Override the prepare method for the BERT tests

Remove typo

0c9bae0

Update deepset/roberta-base-squad2 model card (#8522)

4df6b59

* Update README.md * Update README.md

Update doc for v3.5.1

bb03a14

Merge remote-tracking branch 'origin/master'

42f63e3

[T5] Bug correction & Refactor (#8518)

42e2d02

* fix bug * T5 refactor * refactor tf * apply Sylvain's suggestions

Fix paths in github YAML

9d519da

Model sharing doc: more tweaks (#8520)

7252697

* More doc tweaks * Update model_sharing.rst * make style * missing newline * Add email tip Co-authored-by: Pierric Cistac <pierric@huggingface.co>

Add bart-large-mnli model card (#8527)

f6f4da8

fix load weights (#8528)

f6cdafd

* fix load weights * delete line

Rework some TF tests (#8492)

24184e7

* Update some tests * Small update * Apply style * Use max_position_embeddings * Create a fake attribute * Create a fake attribute * Update wrong name * Wrong TransfoXL model file * Keep the common tests agnostic

fabiocapsouza merged commit 3d2671a into fabiocapsouza:master Nov 15, 2020
