Accept Upstream Changes #1

Merged
merged 3,179 commits into from
May 6, 2021
d0b3797
Add more metadata to the user agent (#10972)
sgugger Mar 31, 2021
acc3bd9
Enforce string-formatting with f-strings (#10980)
sgugger Mar 31, 2021
b6dddda
add notebook (#10995)
patrickvonplaten Mar 31, 2021
cd56f3f
Merge trainers (#10975)
sgugger Mar 31, 2021
01068ab
add blog to docs (#10997)
patrickvonplaten Mar 31, 2021
455f817
Update training_args.py (#11000)
JohnnyC08 Mar 31, 2021
838f83d
Add `examples/language_modeling/run_mlm_no_trainer.py` (#11001)
hemildesai Mar 31, 2021
c301c26
Fix Adafactor documentation (recommend correct settings) (#10526)
jsrozner Apr 1, 2021
af67322
Improve the speed of adding tokens from added_tokens.json (#10780)
cchen-dialpad Apr 1, 2021
30677dc
Add Vision Transformer and ViTFeatureExtractor (#10950)
NielsRogge Apr 1, 2021
57c1749
DebertaTokenizer Rework closes #10258 (#10703)
cronoik Apr 1, 2021
f4ad3d8
minor typo fix
joeddav Apr 1, 2021
e8da77d
[doc] no more bucket
julien-c Apr 1, 2021
34e1bec
added new notebook and merge of trainer (#11015)
philschmid Apr 1, 2021
335c0ca
fixed typo: logging instead of logger (#11025)
versis Apr 2, 2021
b0d49fd
Add a script to check inits are consistent (#11024)
sgugger Apr 5, 2021
3d39226
s|Pretrained|PreTrained| (#11048)
stas00 Apr 5, 2021
6e31014
[doc] update code-block rendering (#11053)
erensahin Apr 5, 2021
ef62f03
Pin docutils (#11062)
LysandreJik Apr 5, 2021
773e4c7
Remove unnecessary space (#11060)
LysandreJik Apr 5, 2021
eb3479e
Some models have no tokenizers (#11064)
LysandreJik Apr 5, 2021
6c25f52
Refactor AutoModel classes and add Flax Auto classes (#11027)
sgugger Apr 5, 2021
9f4e0c2
Documentation about loading a fast tokenizer within Transformers (#11…
LysandreJik Apr 5, 2021
e1c02e0
Add example for registering callbacks with trainers (#10928)
amalad Apr 5, 2021
b51b87c
Add `examples/language_modeling/run_clm_no_trainer.py` (#11026)
hemildesai Apr 5, 2021
abb7430
Replace pkg_resources with importlib_metadata (#11061)
konstin Apr 5, 2021
090e3e6
Add center_crop to ImageFeatureExtractoMixin (#11066)
sgugger Apr 5, 2021
f05a8a0
Document common config attributes (#11070)
sgugger Apr 5, 2021
04ceee7
Fix distributed gather for tuples of tensors of varying sizes (#11071)
sgugger Apr 5, 2021
2199608
Make a base init in FeatureExtractionMixin (#11074)
sgugger Apr 5, 2021
6ab7d1a
Add Readme for language modeling scripts with accelerate (#11073)
hemildesai Apr 6, 2021
f7328de
HF emoji unicode doesn't work in console (#11081)
stas00 Apr 6, 2021
6c1bee7
Link to new blog
sgugger Apr 6, 2021
b219d6b
added social thumbnail for docs (#11083)
philschmid Apr 6, 2021
76800fb
added new merged Trainer test (#11090)
philschmid Apr 6, 2021
2a8115f
[WIP] GPT Neo cleanup (#10985)
patil-suraj Apr 6, 2021
4906a29
Release v4.5.0
LysandreJik Apr 6, 2021
9853c5d
Development on v4.6.0dev0
LysandreJik Apr 6, 2021
520198f
[doc] gpt-neo (#11098)
stas00 Apr 6, 2021
403d530
Auto feature extractor (#11097)
sgugger Apr 6, 2021
aef4cf8
accelerate question answering examples with no trainer (#11091)
theainerd Apr 6, 2021
fd338ab
Style
sgugger Apr 6, 2021
083ad7d
dead link fixed (#11103)
cronoik Apr 7, 2021
247bed3
GPTNeo: handle padded wte (#11079)
leogao2 Apr 7, 2021
c9035e4
fix: The 'warn' method is deprecated (#11105)
stas00 Apr 7, 2021
424419f
[examples] fix white space (#11099)
stas00 Apr 7, 2021
11505fa
Dummies multi backend (#11100)
sgugger Apr 7, 2021
02f7c2f
Some styling of the training table in Notebooks (#11118)
sgugger Apr 7, 2021
c0d97ce
Adds a note to resize the token embedding matrix when adding special …
LysandreJik Apr 7, 2021
7442801
fix tests (#11109)
thevasudevgupta Apr 7, 2021
1c15128
[versions] handle version requirement ranges (#11110)
stas00 Apr 7, 2021
3fd7eee
Adds use_auth_token with pipelines (#11123)
philschmid Apr 7, 2021
ffe0761
Fix and refactor check_repo (#11127)
sgugger Apr 7, 2021
f8e90d6
Fix typing error in Trainer class (prediction_step) (#11138)
jannisborn Apr 8, 2021
5bf5d50
Typo fix of the name of BertLMHeadModel in BERT doc (#11133)
forest1988 Apr 8, 2021
acc851e
[run_clm] clarify why we get the tokenizer warning on long input (#11…
stas00 Apr 8, 2021
c6d6648
[DeepSpeed] ZeRO Stage 3 (#10753)
stas00 Apr 8, 2021
02ec02d
Add nvidia megatron models (#10911)
jdemouth Apr 8, 2021
1ed24af
[trainer] solve "scheduler before optimizer step" warning (#11144)
stas00 Apr 8, 2021
ba2cf5f
Add fairscale and deepspeed back to the CI (#11147)
LysandreJik Apr 8, 2021
9c9b8e7
Updates SageMaker docs for updating DLCs (#11140)
philschmid Apr 8, 2021
dfed4ec
Don't duplicate logs in TensorBoard and handle --use_env (#11141)
sgugger Apr 8, 2021
6c40e49
Run mlm pad to multiple for fp16 (#11128)
ak314 Apr 8, 2021
6644690
[tests] relocate core integration tests (#11146)
stas00 Apr 8, 2021
97ccf67
[setup] extras[docs] must include 'all' (#11148)
stas00 Apr 8, 2021
ba8b1f4
Add support for multiple models for one config in auto classes (#11150)
sgugger Apr 8, 2021
c2e0fd5
[setup] make fairscale and deepspeed setup extras (#11151)
stas00 Apr 8, 2021
d31c7b1
Skip Megatron tests for now
sgugger Apr 9, 2021
269c963
Merge branch 'master' of github.com:huggingface/transformers
sgugger Apr 9, 2021
0311ba2
typo (#11152)
stas00 Apr 9, 2021
8b78a32
[Community notebooks] Add Wav2Vec notebook for creating captions for …
Muennighoff Apr 9, 2021
b9b60c1
Fix LogitsProcessor documentation (#11130)
k-tahiro Apr 9, 2021
6060746
Update README.md (#11161)
Seyviour Apr 9, 2021
45fc8c7
Make `get_special_tokens_mask` consider all tokens (#11163)
sgugger Apr 9, 2021
fb41f9f
Add a special tokenizer for CPM model (#11068)
JetRunner Apr 9, 2021
c161dd5
[examples/translation] support mBART-50 and M2M100 fine-tuning (#11170)
patil-suraj Apr 9, 2021
07f0bb6
[examples run_clm] fix _LazyModule hasher error (#11168)
stas00 Apr 9, 2021
6f90c29
added json dump and extraction of train run time (#11167)
philschmid Apr 9, 2021
716120c
Fix Typo
LysandreJik Apr 9, 2021
26212c1
Reactivate Megatron tests an use less workers
sgugger Apr 9, 2021
a99f7f5
Minor typos fixed (#11182)
cronoik Apr 12, 2021
623cd6a
Fix style
sgugger Apr 12, 2021
ef102c4
model_path should be ignored as the checkpoint path (#11157)
tsuchm Apr 12, 2021
0c6fcd3
Added documentation for data collator. (#10941)
fghuman Apr 12, 2021
cb251ba
Fix typo (#11188)
tma15 Apr 12, 2021
9f12609
Add DeiT (PyTorch) (#11056)
NielsRogge Apr 12, 2021
38a10c6
Replaced `which` with `who` (#11183)
cronoik Apr 12, 2021
74d7c24
Import torch.utils.checkpoint in ProphetNet (#11214)
LysandreJik Apr 12, 2021
f243a5e
Sagemaker test docs update for framework upgrade (#11206)
philschmid Apr 12, 2021
d49d3cf
Use MSELoss in (M)BartForSequenceClassification (#11178)
calpt Apr 13, 2021
7c205bf
wav2vec2 converter: create the proper vocab.json while converting fai…
cceyda Apr 13, 2021
0cd89d8
Add Matt as the TensorFlow reference (#11212)
LysandreJik Apr 13, 2021
823df93
Fix GPT-2 warnings (#11213)
LysandreJik Apr 13, 2021
896d7be
fix docstrings (#11221)
patil-suraj Apr 13, 2021
22fa0a6
Add documentation for BertJapanese (#11219)
forest1988 Apr 13, 2021
81009b7
Replace error by warning when loading an architecture in another (#11…
sgugger Apr 13, 2021
893e51a
Document v4.5.1
sgugger Apr 13, 2021
edca520
Refactor GPT2 (#11225)
patil-suraj Apr 13, 2021
3312e96
Doc check: a bit of clean up (#11224)
sgugger Apr 13, 2021
9fa2995
added cache_dir=model_args.cache_dir to all example with cache_dir ar…
philschmid Apr 13, 2021
9d8e8a8
Avoid using no_sync on SageMaker DP (#11229)
sgugger Apr 13, 2021
f38cd43
Indent code block in the documentation (#11233)
sgugger Apr 13, 2021
1ad7b03
Run CI on deepspeed and fairscale (#11172)
LysandreJik Apr 13, 2021
3d339ee
[Deepspeed] zero3 tests band aid (#11235)
stas00 Apr 13, 2021
653076c
Save the Wav2Vec2 processor before training starts (#10910)
Nithin-Holla Apr 14, 2021
9337c6c
make embeddings plural in warning message (#11228)
jstremme Apr 14, 2021
7fe5aaa
Stale bot updated (#10562)
LysandreJik Apr 14, 2021
f25444c
Close open files to suppress ResourceWarning (#11240)
parakalan Apr 14, 2021
4670b57
Fix dimention misspellings. (#11238)
odellus Apr 14, 2021
075e821
Add prefix to examples in model_doc rst (#11226)
forest1988 Apr 14, 2021
63ca402
[troubleshooting] add 2 points of reference to the offline mode (#11236)
stas00 Apr 14, 2021
25e1af3
Fix #10128 (#11248)
sgugger Apr 14, 2021
83206ca
[deepspeed] test on one node 2 gpus max (#11237)
stas00 Apr 14, 2021
aaaed56
Trainer iterable dataset (#11254)
sgugger Apr 14, 2021
c3fcba3
Adding pipeline task aliases. (#11247)
Narsil Apr 15, 2021
6e1ee47
Support for set_epoch (#11258)
sgugger Apr 15, 2021
2550b41
Tokenizer fast save (#11234)
sgugger Apr 15, 2021
dfc6dd8
update dependency_versions_table (#11273)
stas00 Apr 16, 2021
5254220
Workflow fixes (#11270)
LysandreJik Apr 16, 2021
92970c0
Enabling multilingual models for translation pipelines. (#10536)
Narsil Apr 16, 2021
e783ea7
Fix failing workflows
LysandreJik Apr 16, 2021
d9c6204
Trainer support for IterableDataset for evaluation and predict (#11286)
sgugger Apr 16, 2021
5a34d8d
move device statements outside if statements (#11292)
e-yi Apr 19, 2021
3981ce3
modify double considering special tokens in `language_modeling.py` (#…
taepd Apr 19, 2021
95ffbe1
[Trainer] fix the placement on device with fp16_full_eval (#11322)
stas00 Apr 19, 2021
95037a1
[Trainer] Add a progress bar for batches skipped (#11324)
sgugger Apr 19, 2021
c0328a6
Load checkpoint without re-creating the model (#11318)
sgugger Apr 20, 2021
bfd83c1
Added translation example script (#11196)
rajvi-k Apr 20, 2021
f464f10
[Generate] Remove outdated code (#11331)
patrickvonplaten Apr 20, 2021
cfd2eaa
[GPTNeo] create local attention mask ones (#11335)
patil-suraj Apr 20, 2021
f1b938f
Update to use datasets remove_cloumns method (#11343)
sgugger Apr 20, 2021
95dab34
Add an error message that fires when Reformer is not in training mode…
forest1988 Apr 20, 2021
aad95c7
Removed `max_length` from being mandatory within `generate`. (#11314)
Narsil Apr 21, 2021
74712e2
Honor contributors to models (#11329)
sgugger Apr 21, 2021
ca7ff64
[deepspeed] fix resume from checkpoint (#11352)
stas00 Apr 21, 2021
dabeb15
Examples reorg (#11350)
sgugger Apr 21, 2021
41f3133
Extract metric_key_prefix during NotebookProgressCallback.on_evaluate…
lewtun Apr 21, 2021
9f72e8f
[testing doc] bring doc up to date (#11359)
stas00 Apr 21, 2021
ac58859
Merge new TF example script (#11360)
Rocketknight1 Apr 21, 2021
50595a3
Remove boiler plate code (#11340)
patrickvonplaten Apr 21, 2021
6fe79e5
Move old TF text classification script to legacy (#11361)
Rocketknight1 Apr 21, 2021
5aaf5aa
[contributing doc] explain/link to good first issue (#11346)
stas00 Apr 21, 2021
5e04d70
Fix token_type_ids error for big_bird model. (#11355)
wlhgtc Apr 21, 2021
ff26f8e
Add huggingface_hub dep for #11328
sgugger Apr 21, 2021
6f14eab
Add in torchhub
sgugger Apr 21, 2021
880154d
[Wav2Vec2] Fix special tokens for Wav2Vec2 tokenizer (#11349)
patrickvonplaten Apr 22, 2021
58d8795
[Flax] Correct typo (#11374)
patrickvonplaten Apr 22, 2021
5b5e4ca
[run_translation.py] fix typo (#11372)
johnson7788 Apr 22, 2021
881945c
Add space (#11373)
tma15 Apr 22, 2021
2617396
Correctly cast num_train_epochs to int (#11379)
Rocketknight1 Apr 22, 2021
0f3ad15
Fix typo (#11369)
penut85420 Apr 22, 2021
3ed5e97
Fix Trainer with remove_unused_columns=False (#11382)
sgugger Apr 22, 2021
8c9b5fc
[Flax] Big FlaxBert Refactor (#11364)
patrickvonplaten Apr 23, 2021
b48cf71
correct typo (#11393)
patrickvonplaten Apr 23, 2021
2dc2d79
correct conversion (#11394)
patrickvonplaten Apr 23, 2021
a90d3f1
Fix typo in text (#11396)
MaksymDel Apr 23, 2021
c3d6f33
fixed typos (#11391)
yoshitomo-matsubara Apr 23, 2021
74e84f1
make blenderbot test slow (#11395)
patrickvonplaten Apr 23, 2021
7bc86be
Fixed trainer total_flos relaoding in distributed mode (#11383)
TevenLeScao Apr 23, 2021
bf2e0cf
Trainer push to hub (#11328)
sgugger Apr 23, 2021
50f4539
push (#11400)
patrickvonplaten Apr 23, 2021
5c00918
added support for exporting of t5 to onnx with past_key_values (#10651)
Ki6an Apr 23, 2021
1811883
Fixing bug in generation (#11297)
nicola-decao Apr 23, 2021
bd41a0f
Style
sgugger Apr 23, 2021
3951fc5
Try to trigger failure more
sgugger Apr 23, 2021
ca6b80c
Wrong branch Sylvain...
sgugger Apr 23, 2021
e3ff165
Fix cross-attention head mask for Torch encoder-decoder models (#10605)
stancld Apr 23, 2021
1ef152e
Default to accuracy metric (#11405)
sgugger Apr 23, 2021
195bfd1
Enable option for subword regularization in `XLMRobertaTokenizer` (#1…
PhilipMay Apr 23, 2021
81a6c7c
Use 3 workers for torch tests
sgugger Apr 23, 2021
b7fc043
Merge branch 'master' of github.com:huggingface/transformers
sgugger Apr 23, 2021
9cac4fa
documentation linked to the parent class PreTrainedTokenizerFast but …
cronoik Apr 24, 2021
52166f6
Style
sgugger Apr 24, 2021
f45cb66
Add head_mask, decoder_head_mask, cross_head_mask to ProphetNet (#9964)
stancld Apr 25, 2021
35cd8ee
EncoderDecoderConfigs should not create new objects (#11300)
cronoik Apr 25, 2021
30f0658
updating the checkpoint for GPT2ForSequence Classification to one wit…
abiolaTresor Apr 26, 2021
04ab2ca
add pooling layer support (#11439)
thevasudevgupta Apr 26, 2021
32dbb2d
make style (#11442)
patrickvonplaten Apr 26, 2021
4b72cfd
Pin black to 20.8.b1
sgugger Apr 26, 2021
c1625b3
With style
sgugger Apr 26, 2021
4bd6b54
Pin black to 21.4b0
sgugger Apr 26, 2021
38a716c
TF BART models - Add `cross_attentions` to model output and fix cross…
stancld Apr 26, 2021
d7633a4
Add basic support for FP16 in SageMaker model parallelism (#11407)
sgugger Apr 26, 2021
e3e70f9
docs(examples): fix link to TPU launcher script (#11427)
Apr 26, 2021
b24ead8
fix some typos in docs, comments, logging/errors (#11432)
LSinev Apr 26, 2021
ab2cabb
Pass along seed to DistributedSampler (#11406)
sgugger Apr 26, 2021
6715e3b
Clarify description of the is_split_into_words argument (#11449)
kstathou Apr 26, 2021
a753caf
[docs] fix invalid class name (#11438)
stas00 Apr 26, 2021
ce11318
make sure to test against the local checkout (#11437)
stas00 Apr 26, 2021
b03b2a6
Style
sgugger Apr 26, 2021
7959d83
Give each test a different repo name (#11453)
sgugger Apr 26, 2021
1d30ec9
[Examples] Fixes inconsistency around eval vs val and predict vs test…
bhadreshpsavani Apr 26, 2021
0661abc
Variable Correction for Consistency in Distillation Example (#11444)
jaimeenahn Apr 26, 2021
bc2571e
[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418)
stas00 Apr 26, 2021
741d48f
Remove max length beam scorer (#11378)
GeetDsa Apr 26, 2021
88ac60f
update QuickTour docs to reflect model output object (#11462)
Apr 27, 2021
7ceff67
Finish Making Quick Tour respect the model object (#11467)
Apr 27, 2021
8d43c71
fix docs for decoder_input_ids (#11466)
patil-suraj Apr 27, 2021
2d27900
Update min versions in README and add Flax (#11472)
sgugger Apr 28, 2021
c0eb218
Update `PreTrainedTokenizerBase` to check/handle batch length for `te…
hamelsmu Apr 28, 2021
3f6add8
fix #1149 (#11493)
hamelsmu Apr 28, 2021
f748bd4
[Flax] Add docstrings & model outputs (#11498)
patrickvonplaten Apr 29, 2021
ad1f7be
Reformat to make code clearer in tokenizer call (#11497)
sgugger Apr 29, 2021
d6ec54b
solved coefficient issue for the TF version of gelu_fast (#11514)
michaelbenayoun Apr 29, 2021
b29eb24
Split checkpoint from model_name_or_path in examples (#11492)
sgugger Apr 29, 2021
60d5bda
Patch notification service
LysandreJik Apr 30, 2021
f37f2ad
Pin HuggingFace Hub dependency (#11502)
LysandreJik Apr 30, 2021
b43e3f9
correct the dimension comment of matrix multiplication (#11494)
fredo838 Apr 30, 2021
e0db827
add sp_model_kwargs to unpickle of xlm roberta tok (#11430)
PhilipMay Apr 30, 2021
022a1e9
make style (#11520)
patrickvonplaten Apr 30, 2021
58c789e
Update README.md (#11489)
mrm8488 Apr 30, 2021
76116f4
T5 Gradient Checkpointing (#11353)
ceshine Apr 30, 2021
db9dd09
Adding `AutomaticSpeechRecognitionPipeline`. (#11337)
Narsil Apr 30, 2021
30ede89
Implement Fast Tokenization for Deberta (#11387)
ShubhamSanghvi Apr 30, 2021
c2cd02a
Accepts BatchEncoding in LengthSampler (#11431)
tma15 Apr 30, 2021
8b945ef
Fix do_eval default value in training_args.py (#11511)
bonniehyeon Apr 30, 2021
20d6931
Update TF text classification example (#11496)
Rocketknight1 Apr 30, 2021
57c8e82
reszie token embeds (#11524)
patil-suraj Apr 30, 2021
af0692a
Run model templates on master (#11527)
LysandreJik Apr 30, 2021
84326a2
[Examples] Added support for test-file in QA examples with no trainer…
bhadreshpsavani Apr 30, 2021
bc80f8b
Add Stas and Suraj as authors (#11526)
sgugger Apr 30, 2021
804c297
Improve task summary docs (#11513)
Apr 30, 2021
282f3ac
[debug utils] activation/weights underflow/overflow detector (#11274)
stas00 Apr 30, 2021
4e7bf94
[DeepSpeed] fp32 support (#11499)
stas00 Apr 30, 2021
9802086
Fixed docs for the shape of `scores` in `generate()` (#10057)
kylie-box May 2, 2021
a5d2967
Fix examples in M2M100 docstrings (#11540)
lewtun May 3, 2021
623281a
[Flax BERT/Roberta] few small fixes (#11558)
patil-suraj May 3, 2021
c448c01
[Wav2Vec2] Fix convert (#11562)
patrickvonplaten May 3, 2021
1c86157
Remove `datasets` submodule. (#11563)
LysandreJik May 3, 2021
6a11e4c
fix the mlm longformer example by changing [MASK] to <mask> (#11559)
fredo838 May 3, 2021
f3cf8ae
Add LUKE (#11223)
NielsRogge May 3, 2021
a721a5e
[Wav2vec2] Fixed tokenization mistakes while adding single-char token…
Muktan May 3, 2021
87dd1a0
Fix metric computation in `run_glue_no_trainer` (#11569)
sgugger May 3, 2021
1e8e068
Fixes a useless warning. (#11566)
Narsil May 3, 2021
f4c9a7e
Accumulate opt state dict on do_rank 0 (#11481)
sgugger May 3, 2021
fe82b1b
Update training tutorial (#11533)
sgugger May 3, 2021
7c62248
fix resize_token_embeddings (#11572)
stas00 May 3, 2021
c40c7e2
Add multi-class, multi-label and regression to transformers (#11012)
abhishekkrthakur May 4, 2021
09b0bcf
Enable added tokens (#11325)
LysandreJik May 4, 2021
2ce0fb8
Make quality scripts work when one backend is missing. (#11573)
sgugger May 4, 2021
084a187
[FlaxRoberta] Add FlaxRobertaModels & adapt run_mlm_flax.py (#11470)
patrickvonplaten May 4, 2021
226e74b
Removes SageMakerTrainer code but keeps class as wrapper (#11587)
philschmid May 4, 2021
0afe4a9
[Flax] Add Electra models (#11426)
CoderPat May 4, 2021
6b241e0
Reproducible checkpoint (#11582)
sgugger May 4, 2021
c065025
[trainer] document resume randomness (#11588)
stas00 May 4, 2021
bf0dfa9
copies need to be fixed too (#11585)
stas00 May 5, 2021
83e59d8
add importlib_metadata and huggingface_hub as dependency in the conda…
cdeepali May 5, 2021
8fa8e19
Skip Funnel test
LysandreJik May 5, 2021
3e3e41a
Pytorch - Lazy initialization of models (#11471)
patrickvonplaten May 5, 2021
864c1df
Accept tensorflow-rocm package when checking TF availability (#11595)
mvsjober May 5, 2021
Improve task summary docs (huggingface#11513)
* fix task summary docs

* refactor to use model.config.id2label instead of list

* fix nit

* Update docs/source/task_summary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Hamel Husain and sgugger authored Apr 30, 2021
commit 804c2974d5e1c95e71afe57f8f97b3a8bcd921eb
94 changes: 63 additions & 31 deletions docs/source/task_summary.rst
@@ -85,9 +85,8 @@ each other. The process is the following:

1. Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and loads it
with the weights stored in the checkpoint.
-2. Build a sequence from the two sentences, with the correct model-specific separators token type ids and attention
-masks (:func:`~transformers.PreTrainedTokenizer.encode` and :func:`~transformers.PreTrainedTokenizer.__call__` take
-care of this).
+2. Build a sequence from the two sentences, with the correct model-specific separators, token type ids and attention
+masks (which will be created automatically by the tokenizer).
3. Pass this sequence through the model so that it is classified in one of the two available classes: 0 (not a
paraphrase) and 1 (is a paraphrase).
4. Compute the softmax of the result to get probabilities over the classes.
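Step 4 can be illustrated without loading a model. The sketch below is a minimal, standard-library softmax. The logits are hypothetical values chosen for illustration, not output from the BERT checkpoint, and the class names simply follow the two classes listed in step 3.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtract the largest logit first so exp() cannot overflow.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence pair (assumed values, not model output).
classes = ["not paraphrase", "is paraphrase"]
logits = [-0.5, 2.0]

probabilities = softmax(logits)
for name, probability in zip(classes, probabilities):
    print(f"{name}: {probability:.1%}")
```

The class with the larger logit always receives the larger probability; softmax only rescales the scores so they can be read as a distribution.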
@@ -108,6 +107,7 @@ each other. The process is the following:
>>> sequence_1 = "Apples are especially bad for your health"
>>> sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

+>>> # The tokenizer will automatically add any model-specific separators (i.e. <CLS> and <SEP>) and tokens to the sequence, as well as compute the attention masks.
>>> paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
>>> not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")

@@ -141,6 +141,7 @@ each other. The process is the following:
>>> sequence_1 = "Apples are especially bad for your health"
>>> sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

+>>> # The tokenizer will automatically add any model-specific separators (i.e. <CLS> and <SEP>) and tokens to the sequence, as well as compute the attention masks.
>>> paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="tf")
>>> not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="tf")

@@ -504,8 +505,8 @@ This outputs a (hopefully) coherent next token following the original sequence,
>>> print(resulting_string)
Hugging Face is based in DUMBO, New York City, and has

-In the next section, we show how this functionality is leveraged in :func:`~transformers.PreTrainedModel.generate` to
-generate multiple tokens up to a user-defined length.
+In the next section, we show how :func:`~transformers.PreTrainedModel.generate` can be used to generate multiple tokens
+up to a specified length instead of one token at a time.
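To make the contrast concrete, here is a toy sketch of what generating multiple tokens amounts to: repeat the single next-token step until a length limit is reached. `NEXT_TOKEN`, `predict_next` and `generate_greedy` are hypothetical names introduced for illustration, a lookup table stands in for the language model, and this is not how `PreTrainedModel.generate` is implemented.

```python
# Hypothetical stand-in for a language model: the last token decides the next one.
NEXT_TOKEN = {
    "Hugging": "Face",
    "Face": "is",
    "is": "based",
    "based": "in",
    "in": "DUMBO",
}


def predict_next(tokens):
    """One step: return the most likely next token, or None if there is none."""
    return NEXT_TOKEN.get(tokens[-1])


def generate_greedy(prompt_tokens, max_length):
    """Repeat the one-token step until max_length tokens exist or the model stops."""
    tokens = list(prompt_tokens)  # copy so the caller's list is not modified
    while len(tokens) < max_length:
        next_token = predict_next(tokens)
        if next_token is None:
            break
        tokens.append(next_token)
    return tokens


generated = generate_greedy(["Hugging"], max_length=4)
print(" ".join(generated))
```

Here `max_length` counts the prompt tokens as well, which matches the "total maximal length" wording used for the pipeline example later in this file.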

Text Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -526,10 +527,11 @@ As a default all models apply *Top-K* sampling when used in pipelines, as config


Here, the model generates a random text with a total maximal length of *50* tokens from context *"As far as I am
-concerned, I will"*. The default arguments of ``PreTrainedModel.generate()`` can be directly overridden in the
-pipeline, as is shown above for the argument ``max_length``.
+concerned, I will"*. Behind the scenes, the pipeline object calls the method
+:func:`~transformers.PreTrainedModel.generate` to generate text. The default arguments for this method can be
+overridden in the pipeline, as is shown above for the arguments ``max_length`` and ``do_sample``.

-Here is an example of text generation using ``XLNet`` and its tokenizer.
+Below is an example of text generation using ``XLNet`` and its tokenizer, which includes calling ``generate`` directly:

.. code-block::

@@ -627,8 +629,8 @@ It leverages a fine-tuned model on CoNLL-2003, fine-tuned by `@stefan-it <https:

>>> nlp = pipeline("ner")

->>> sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very"
-... "close to the Manhattan Bridge which is visible from the window."
+>>> sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
+... therefore very close to the Manhattan Bridge which is visible from the window."""


This outputs a list of all words that have been identified as one of the entities from the 9 classes defined above.
@@ -659,15 +661,14 @@ Here is an example of doing named entity recognition, using a model and a tokeni

1. Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and loads it
with the weights stored in the checkpoint.
-2. Define the label list with which the model was trained on.
-3. Define a sequence with known entities, such as "Hugging Face" as an organisation and "New York City" as a location.
-4. Split words into tokens so that they can be mapped to predictions. We use a small hack by, first, completely
+2. Define a sequence with known entities, such as "Hugging Face" as an organisation and "New York City" as a location.
+3. Split words into tokens so that they can be mapped to predictions. We use a small hack by, first, completely
encoding and decoding the sequence, so that we're left with a string that contains the special tokens.
-5. Encode that sequence into IDs (special tokens are added automatically).
-6. Retrieve the predictions by passing the input to the model and getting the first output. This results in a
+4. Encode that sequence into IDs (special tokens are added automatically).
+5. Retrieve the predictions by passing the input to the model and getting the first output. This results in a
distribution over the 9 possible classes for each token. We take the argmax to retrieve the most likely class for
each token.
-7. Zip together each token with its prediction and print it.
+6. Zip together each token with its prediction and print it.

.. code-block::

@@ -706,18 +707,6 @@ Here is an example of doing named entity recognition, using a model and a tokeni
>>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

->>> label_list = [
-... "O", # Outside of a named entity
-... "B-MISC", # Beginning of a miscellaneous entity right after another miscellaneous entity
-... "I-MISC", # Miscellaneous entity
-... "B-PER", # Beginning of a person's name right after another person's name
-... "I-PER", # Person's name
-... "B-ORG", # Beginning of an organisation right after another organisation
-... "I-ORG", # Organisation
-... "B-LOC", # Beginning of a location right after another location
-... "I-LOC" # Location
-... ]

>>> sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
... "close to the Manhattan Bridge."

@@ -731,12 +720,49 @@ Here is an example of doing named entity recognition, using a model and a tokeni

This outputs a list of each token mapped to its corresponding prediction. Differently from the pipeline, here every
token has a prediction as we didn't remove the "0"th class, which means that no particular entity was found on that
-token. The following array should be the output:
+token.
+
+In the above example, ``predictions`` is an integer that corresponds to the predicted class. We can use the
+``model.config.id2label`` property in order to recover the class name corresponding to the class number, which is
+illustrated below:

.. code-block::

->>> print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy())])
-[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
+>>> for token, prediction in zip(tokens, predictions[0].numpy()):
+... print((token, model.config.id2label[prediction]))
+('[CLS]', 'O')
+('Hu', 'I-ORG')
+('##gging', 'I-ORG')
+('Face', 'I-ORG')
+('Inc', 'I-ORG')
+('.', 'O')
+('is', 'O')
+('a', 'O')
+('company', 'O')
+('based', 'O')
+('in', 'O')
+('New', 'I-LOC')
+('York', 'I-LOC')
+('City', 'I-LOC')
+('.', 'O')
+('Its', 'O')
+('headquarters', 'O')
+('are', 'O')
+('in', 'O')
+('D', 'I-LOC')
+('##UM', 'I-LOC')
+('##BO', 'I-LOC')
+(',', 'O')
+('therefore', 'O')
+('very', 'O')
+('##c', 'O')
+('##lose', 'O')
+('to', 'O')
+('the', 'O')
+('Manhattan', 'I-LOC')
+('Bridge', 'I-LOC')
+('.', 'O')
+('[SEP]', 'O')
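The lookup itself is just an argmax followed by a dictionary access, which can be sketched without downloading a checkpoint. In the sketch below, `id2label` is a hand-written dictionary whose order mirrors the ``label_list`` shown earlier in this diff (an assumption; the real mapping lives in the checkpoint's config), and the per-token scores are made-up numbers.

```python
# Assumed mapping, copied from the order of the old hard-coded label list.
id2label = {
    0: "O",
    1: "B-MISC",
    2: "I-MISC",
    3: "B-PER",
    4: "I-PER",
    5: "B-ORG",
    6: "I-ORG",
    7: "B-LOC",
    8: "I-LOC",
}


def argmax(scores):
    """Index of the highest score, i.e. the predicted class id."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical scores over the 9 classes for three tokens (not model output).
tokens = ["Face", "in", "York"]
token_scores = [
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1],
    [0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8],
]

predictions = [argmax(scores) for scores in token_scores]
labeled = [(token, id2label[class_id]) for token, class_id in zip(tokens, predictions)]
for pair in labeled:
    print(pair)
```

Reading the mapping from the model config instead of hard-coding a list keeps the labels in sync with whichever checkpoint is loaded.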

Summarization
-----------------------------------------------------------------------------------------------------------------------
@@ -819,6 +845,12 @@ CNN / Daily Mail), it yields very good results.
>>> inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
>>> outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)

+.. code-block::
+
+>>> print(tokenizer.decode(outputs[0]))
+<pad> prosecutors say the marriages were part of an immigration scam. if convicted, barrientos faces two criminal counts of "offering a false instrument for filing in the first degree" she has been married 10 times, nine of them between 1999 and 2002.</s>


Translation
-----------------------------------------------------------------------------------------------------------------------