Sync/4.6.1 #183

hSterz · 2021-06-03T15:14:27Z

Updating to v4.6.1

…145) * clarify why we get the warning here * Update examples/language-modeling/run_clm.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * wording * style Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* synced gpus * fix * fix * need to use t5-small for quality tests * notes * complete merge * fix a disappearing std stream problem * start zero3 tests * wip * tune params * sorting out the pre-trained model loading * reworking generate loop wip * wip * style * fix tests * split the tests * refactor tests * wip * parameterized * fix * workout the resume from non-ds checkpoint pass + test * cleanup * remove no longer needed code * split getter/setter functions * complete the docs * suggestions * gpus and their compute capabilities link * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * style * remove invalid paramgd * automatically configure zero3 params that rely on hidden size * make _get_resized_embeddings zero3-aware * add test exercising resize_token_embeddings() * add docstring Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add support for NVIDIA Megatron models * Add support for NVIDIA Megatron GPT2 and BERT Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This commit includes a script to convert a Megatron-GPT2 checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. Add the megatron_bert model. That model is implemented as a modification of the existing BERT model in Transformers. This commit includes a script to convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details. * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Remove model.half in tests + add "# Copied ..." Remove the model.half() instruction which makes tests fail on the CPU. Add a comment "# Copied ..." before many classes in the model to enable automatic tracking in CI between the new Megatron classes and the original Bert ones. 
* Fix issues * Fix Flax/TF tests * Fix copyright * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/configuration_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/megatron_bert.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update docs/source/model_doc/megatron_gpt2.rst Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/__init__.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger 
<35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/models/megatron_bert/modeling_megatron_bert.py Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Resolve 
most of 'sgugger' comments * Fix conversion issue + Run make fix-copies/quality/docs * Apply suggestions from code review * Causal LM & merge * Fix init * Add CausalLM to last auto class Co-authored-by: Julien Demouth <jdemouth@nvidia.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* make fairscale and deepspeed setup extras * fix default * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * no reason not to ask for the good version * update the CIs Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Change duplicated LogitsProcessor to LogitsWarper in LogitsProcessorList document * Write more detailed information about LogitsProcessor's scores argument * apply suggestion from review * style Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Add a special tokenizer for CPM model * make style * fix * Add docs * styles * cpm doc * fix ci * fix the overview * add test * make style * typo * Custom tokenizer flag * Add REAMDE.md Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Autogenerate model cards from the Trainer * ModelCard deprecated * Fix test * Style * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Address review comments * Quality * With all metadata * Metadata * Post-merge conflict mess * Data args and all examples * Default license and languages when possible Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adds Flax BERT finetuning example * fix traced jax tensor type * Use Optax losses and learning schedulers * Add 1GPU training results * merge into master & make style * fix input * del file * Fix bug in loss and add torch runs * finish bert flax fine-tune * Update examples/flax/text-classification/README.md * Update examples/flax/text-classification/run_flax_glue.py * add requirements * finalize * finalize Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Patrick von Platen <patrick@huggingface.co>

* begin second draft * fix import, style * add loss * fix embeds, logits_scale, and projection * fix imports * add conversion script * add feature_extractor and processor * style * add tests for tokenizer, extractor and processor * add vision model tests * add weight init * add more tests * fix save_load test * model output, dosstrings, causal mask * config doc * add clip model tests * return dict * bigin integration test * add integration tests * fix-copies * fix init * Clip => CLIP * fix module name * docs * fix doc * output_dim => projection_dim * fix checkpoint names * remoe fast tokenizer file * fix conversion script * fix tests, quality * put causal mask on device * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * fix attribute test * style * address sylvains comments * style * fix docstrings * add qucik_gelu in activations, docstrings * clean-up attention test * fix act fun * fix config * fix torchscript tests * even batch_size * remove comment * fix ouput tu_tuple * fix save load tests * fix add tokens test * add fast tokenizer * update copyright * new processor API * fix docs * docstrings * docs * fix doc * fix doc * fix tokenizer * fix import in doc example * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * check types of config * valhalla => openai * load image using url * fix test * typo Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix encoder-decoder & RAG * finalize * Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/models/rag/modeling_rag.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Patrick von Platen <patrick@huggingface.co> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

calpt

HF changed the structure of their examples folder such that each supported framework has its own subfolder. I think we could move all our examples to the pytorch folder and remove the other (unsupported) framework folders.

(Also, one file in the examples/legacy folder sneaked in again :)
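The reorganization suggested in this comment could be sketched roughly as follows. This is only an illustration under assumptions: the function name `consolidate_examples`, the default folder names in `drop`, and the idea of moving every remaining folder into `pytorch` are hypothetical and not taken from the PR.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def consolidate_examples(
    examples_root: Path,
    keep: str = "pytorch",                         # folder that should hold all examples
    drop: Iterable[str] = ("flax", "tensorflow"),  # assumed unsupported framework folders
) -> List[str]:
    """Delete the folders named in `drop` and move every other folder into `keep`."""
    drop = set(drop)
    target = examples_root / keep
    target.mkdir(parents=True, exist_ok=True)

    moved = []
    # sorted() materializes the listing before anything is moved or deleted
    for entry in sorted(examples_root.iterdir()):
        if not entry.is_dir() or entry.name == keep:
            continue  # leave plain files and the target folder alone
        if entry.name in drop:
            shutil.rmtree(entry)
        else:
            shutil.move(str(entry), str(target / entry.name))
            moved.append(entry.name)
    return moved
```

Running it on a scratch copy of the `examples` directory first would make it easy to check the resulting layout before touching the repository.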

philschmid and others added 30 commits April 7, 2021 20:32

Adds use_auth_token with pipelines (#11123)

3fd7eee

* added model_kwargs to infer_framework_from_model * added model_kwargs to tokenizer * added use_auth_token as named parameter * added dynamic get for use_auth_token

Fix and refactor check_repo (#11127)

ffe0761

Fix typing error in Trainer class (prediction_step) (#11138)

f8e90d6

* fix: docstrings in prediction_step * ci: Satisfy line length requirements * ci: character length requirements

Typo fix of the name of BertLMHeadModel in BERT doc (#11133)

5bf5d50

[trainer] solve "scheduler before optimizer step" warning (#11144)

1ed24af

* solve "scheduler before optimizer step" warning * style * correct the state evaluation test

Add fairscale and deepspeed back to the CI (#11147)

ba2cf5f

* Add fairscale and deepspeed back to the CI * Add deepspeed to single GPU tests

Updates SageMaker docs for updating DLCs (#11140)

9c9b8e7

Don't duplicate logs in TensorBoard and handle --use_env (#11141)

dfed4ec

Run mlm pad to multiple for fp16 (#11128)

6c40e49

* Add mlm collator pad to multiple option (#10627) * Use padding to 8x in run mlm (#10627)
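As a rough illustration of what "pad to multiple" means in the commit above: each sequence length is rounded up to the next multiple of 8, a shape that fp16 kernels on recent NVIDIA GPUs tend to handle more efficiently. The helper name `pad_to_multiple` and the pad id `0` are hypothetical; this is a plain-Python sketch, not the collator code from the commit.

```python
from typing import List


def pad_to_multiple(token_ids: List[int], multiple: int = 8, pad_id: int = 0) -> List[int]:
    """Right-pad token_ids with pad_id until the length is a multiple of `multiple`."""
    remainder = len(token_ids) % multiple
    if remainder == 0:
        return list(token_ids)  # already aligned, return a copy unchanged
    return list(token_ids) + [pad_id] * (multiple - remainder)


padded = pad_to_multiple([101, 7592, 2088, 102, 2003])
print(len(padded))  # 8
```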

[tests] relocate core integration tests (#11146)

6644690

* relocate core integration tests * add sys.path context manager * cleanup * try * try2 * fix path * doc * style * add dep * add 2 more deps

[setup] extras[docs] must include 'all' (#11148)

97ccf67

* extras[doc] must include 'all' * fix * better * regroup

Add support for multiple models for one config in auto classes (#11150)

ba8b1f4

* Add support for multiple models for one config in auto classes * Use get_values everywhere * Prettier doc

Skip Megatron tests for now

d31c7b1

Merge branch 'master' of github.com:huggingface/transformers

269c963

typo (#11152)

0311ba2

* typo * style

[Community notebooks] Add Wav2Vec notebook for creating captions for …

8b78a32

…YT Clips (#11142) * Add Wav2Vec Inference notebook * Update docs/source/community.md Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update README.md (#11161)

6060746

Corrected a typo ('Downlowd' to 'Download')

Make get_special_tokens_mask consider all tokens (#11163)

45fc8c7

[examples/translation] support mBART-50 and M2M100 fine-tuning (#11170)

c161dd5

* keep a list of multilingual tokenizers * add forced_bos_token argument

[examples run_clm] fix _LazyModule hasher error (#11168)

07f0bb6

* fix _LazyModule hasher error * reword

added json dump and extraction of train run time (#11167)

6f90c29

* added json dump and extraction of train run time * make style happy

Fix Typo

716120c

Reactivate Megatron tests an use less workers

26212c1

Minor typos fixed (#11182)

a99f7f5

sgugger and others added 18 commits May 11, 2021 11:30

Fix TF Roberta for mixed precision training (#11675)

d9b2862

Test checkpointing (#11682)

f13f1f8

* Add test and see where CI is unhappy * Load with strict=False

Fix clip docs (#11694)

f063c56

* fix doc url * fix example

Updates README and fixes bug (#11701)

6797cdc

remove defaults to None if optional (#11703)

77f4c46

fix example in config doc (#11696)

5c1cda9

Release: v4.6.0

64e7856

Fix doc deployment

25dee4a

Fix pattern in conf.py (#11784)

265c26e

Fix regression in regression (#11785)

c81584a

* Fix regression in regression * Add test

Use new evaluation loop in TrainerQA (#11746)

8924a5f

Fix checkpoint deletion (#11748)

8c8a5d3

Release: v4.6.1

fb27b27

Sync with v4.6.1

0ea5783

calpt added the sync label Jun 3, 2021

hSterz marked this pull request as ready for review June 6, 2021 16:48

hSterz requested a review from calpt June 6, 2021 16:48

calpt reviewed Jun 9, 2021

calpt changed the base branch from master to develop June 11, 2021 13:07

calpt force-pushed the sync/4.6.1 branch from 34a7215 to f67aea5 Compare June 14, 2021 07:50

hSterz and others added 3 commits June 14, 2021 10:12

Cleanup after sync

a71d1c8

* Delete docs folder * Add missing imports * Add changes to BartModelWithHeads * Apply changes to MBartModelWithHeads * Fix * Fixed examples folder

Move examples one level up

6b60ce0

Merge branch 'develop' into sync/4.6.1

52e7725

calpt force-pushed the sync/4.6.1 branch from f67aea5 to 52e7725 Compare June 14, 2021 08:15

calpt merged commit a36295c into adapter-hub:develop Jun 14, 2021

hSterz deleted the sync/4.6.1 branch September 29, 2021 14:05
