Sync/v3.0.2 #55

calpt · 2020-09-07T08:40:52Z

No description provided.

…kenized pipeline - fast tokenizers - tests (#4510) * Use tokenizers pre-tokenized pipeline * failing pretrokenized test * Fix is_pretokenized in python * add pretokenized tests * style and quality * better tests for batched pretokenized inputs * tokenizers clean up - new padding_strategy - split the files * [HUGE] refactoring tokenizers - padding - truncation - tests * style and quality * bump up requied tokenizers version to 0.8.0-rc1 * switched padding/truncation API - simpler better backward compat * updating tests for custom tokenizers * style and quality - tests on pad * fix QA pipeline * fix backward compatibility for max_length only * style and quality * Various cleans up - add verbose * fix tests * update docstrings * Fix tests * Docs reformatted * __call__ method documented Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer. * Added __get_state__() & __set_state__() to be pickable. * Correct tokens() return type from List[int] to List[str] * Added unittest for BatchEncoding pickle/unpickle * Added unittest for BatchEncoding is_fast * More careful checking on BatchEncoding unpickle tests. * Formatting. * is_fast should assertTrue on Rust tokenizers. * Ensure tensorflow has correct way of checking array_equal * More formatting.

* add eli5 examples * add dense query script * query_di * merging * merging * add_utils * adds nearest neighbor wikipedia * batch queries * training_retriever * new notebooks * moved retriever traiing script * finished wiki40b * max_len_fix * train_s2s * retriever_batch_checkpointing * cleanup * merge * dim_fix * fix_indexer * fix_wiki40b_snippets * fix_embed_for_r * fp32 index * fix_sparse_q * joint_training * remove obsolete datasets * add_passage_nn_results * add_passage_nn_results * add_batch_nn * add_batch_nn * add_data_scripts * notebook * notebook * notebook * fix_multi_gpu * add_app * full_caching * full_caching * notebook * sparse_done * images * notebook * add_image_gif * with_Gif * add_contr_image * notebook * notebook * notebook * train_functions * notebook * min_retrieval_length * pandas_option * notebook * min_retrieval_length * notebook * notebook * eval_Retriever * notebook * images * notebook * add_example * add_example * notebook * fireworks * notebook * notebook * joe's notebook comments * app_update * notebook * notebook_link * captions * notebook * assing RetriBert model * add RetriBert to Auto * change AutoLMHead to AutoSeq2Seq * notebook downloads from hf models * style_black * style_black * app_update * app_update * fix_app_update * style * style * isort * Delete WikiELI5training.ipynb * Delete evaluate_eli5.py * Delete WikiELI5explore.ipynb * Delete ExploreWikiELI5Support.html * Delete explainlikeimfive.py * Delete wiki_snippets.py * children before parent * children before parent * style_black * style_black_only * isort * isort_new * Update src/transformers/modeling_retribert.py Co-authored-by: Julien Chaumond <chaumond@gmail.com> * typo fixes * app_without_asset * cleanup * Delete ELI5animation.gif * Delete ELI5contrastive.svg * Delete ELI5wiki_index.svg * Delete choco_bis.svg * Delete fireworks.gif * Delete huggingface_logo.jpg * Delete huggingface_logo.svg * Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb * Delete 
eli5_app.py * Delete eli5_utils.py * readme * Update README.md * unused imports * moved_info * default_beam * ftuned model * disclaimer * Update src/transformers/modeling_retribert.py Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * black * add_doc * names * isort_Examples * isort_Examples * Add doc to index Co-authored-by: Julien Chaumond <chaumond@gmail.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

stefan-it and others added 30 commits June 15, 2020 08:30

NER: fix construction of input examples for RoBERTa (#4943)

d812e6d

* utils_ner: do not add extra sep token for RoBERTa model * run_pl_ner: do not add extra sep token for RoBERTa model

Create README.md (#4975)

66bcfbb

* Create README.md * Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Possible fix to make AMP work with DDP in the trainer (#4728)

f7c93b3

* manually set device in trainer args * check if current device is cuda before set_device * Explicitly set GPU ID when using single GPU This addresses huggingface/transformers#4657 (comment)

Make DataCollator a callable (#5015)

1affde2

* Make DataCollator a callable * Update src/transformers/data/data_collator.py Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Increase pipeline support for ONNX export. (#5005)

7b685f5

* Increase pipeline support for ONNX export. * Style.

Add bart-base (#5014)

a9f1fc6

Fix importing transformers on Windows (#4997)

7b5a1e7

feat(TFTrainer): improve logging (#4946)

1bf4098

* feat(tftrainer): improve logging * fix(trainer): consider case with evaluation only * refactor(tftrainer): address comments * refactor(tftrainer): move self.epoch_logging to __init__

Add position_ids (#5021)

bbad4c6

[Bart] Question Answering Model is added to tests (#5024)

ebba39e

* fix test * Update tests/test_modeling_common.py * Update tests/test_modeling_common.py

Add DistilBertForMultipleChoice (#5032)

f9f8a53

* Add `DistilBertForMultipleChoice`

refactor(wandb): consolidate import (#5044)

edcb3ac

Add reference to NLP (package) dataset (#5029)

0946d12

* Add reference to NLP (package) dataset * Update README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Add reference to NLP dataset (#5028)

0c55a38

* Add reference to NLP dataset * Update README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>

[cleanup] Hoist ModelTester objects to top level (#4939)

c852036

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Convert hans to Trainer (#5025)

d5477ba

* Convert hans to Trainer * Tick box

Fix marian tokenizer save pretrained (#5043)

3d495c6

Remove old section + caching in install (#5027)

439aa1d

[cleanup] examples test_run_squad uses tiny model (#5059)

c3e6074

Typo (#5069)

af497b5

Fix all sphynx warnings (#5068)

011cc0b

Update pipeline examples to doctest syntax (#5030)

e4aaa45

Reorganize documentation (#5064)

7291ea0

* Reorganize topics and add all models

[TextClassificationPipeline] Hotfix: make json serializable

70bc3ea

Add header and fix command (#5082)

cd40f65

[examples] SummarizationModule improvements (#4951)

043f9f5

Update installation page and add contributing to the doc (#5084)

204ebc2

* Update installation page and add contributing to the doc * Remove mention of symlinks

LysandreJik and others added 9 commits July 6, 2020 10:27

Imports organization

1bbc28b

Fix the tokenization warning noted in #5505 (#5550)

c473484

* fix warning * style and quality

Fix #5544 (#5551)

7833b21

GPT2 tokenizer should not output token type IDs (#5546)

d6b0b9d

* GPT2 tokenizer should not output token type IDs * Same for OpenAIGPT

The add_space_before_punct_symbol is only for TransfoXL (#5549)

9d9b872

Fix #5507 (#5559)

21f28c3

* Fix #5507 * Fix formatting

Various tokenizers fixes (#5558)

5787e4c

* BertTokenizerFast - Do not specify strip_accents by default * Bump tokenizers to new version * Add test for AddedToken serialization

Fix fast tokenizers too (#5562)

f1e2e42

Release: v3.0.2

b0892fa

calpt added sync do-not-merge labels Sep 7, 2020

Merge branch 'master' into sync/v3.0.2

2071797

calpt force-pushed the sync/v3.0.2 branch from 2566166 to 2071797 Compare September 7, 2020 09:18

calpt removed the do-not-merge label Sep 7, 2020

Adapt to model parameter changes & test changes

8312e8a

calpt force-pushed the sync/v3.0.2 branch from c48e427 to 8312e8a Compare September 8, 2020 13:52

calpt requested review from arueckle and JoPfeiff September 14, 2020 08:58

Merge branch 'master' into sync/v3.0.2

7a5ae91

calpt force-pushed the sync/v3.0.2 branch from c1de0e9 to 7a5ae91 Compare September 14, 2020 09:02

Merge branch 'master' into sync/v3.0.2

6268523

calpt force-pushed the sync/v3.0.2 branch from a2f4b77 to 6268523 Compare October 8, 2020 15:49

calpt assigned arueckle and JoPfeiff Oct 9, 2020

calpt mentioned this pull request Oct 12, 2020

Upgrade transformers from 1.0.1 to 3.x #69

Closed

calpt unassigned arueckle and JoPfeiff Oct 14, 2020

Merge branch 'master' into sync/v3.0.2

f6b734e

calpt merged commit 64d6cda into adapter-hub:master Oct 14, 2020

calpt deleted the sync/v3.0.2 branch October 14, 2020 16:27
