Sync/v4.9.2 #232

Merged: 242 commits, Sep 20, 2021
Commits
9252a51
Release: v4.8.0
sgugger Jun 23, 2021
2150dfe
v4.9.0.dev0
sgugger Jun 23, 2021
3694484
Update training_args.py (#12328)
sam-writer Jun 23, 2021
07ae610
[Deepspeed] new docs (#12077)
stas00 Jun 23, 2021
cf3c919
Fix default to logging_dir lost in merge conflict
sgugger Jun 23, 2021
7875b63
try-this (#12338)
richardliaw Jun 24, 2021
aef3823
[examples/Flax] move the examples table up (#12341)
patil-suraj Jun 24, 2021
8ef62ec
Fix torchscript tests (#12336)
LysandreJik Jun 24, 2021
5b1b563
Document patch release v4.8.1
sgugger Jun 24, 2021
f2c4ce7
Add flax/jax quickstart (#12342)
marcvanzee Jun 24, 2021
aa550c4
Update README.md
patrickvonplaten Jun 25, 2021
d4ce31e
fixed typo (#12356)
MichalPitr Jun 25, 2021
55bb4c0
Fix exception in prediction loop occurring for certain batch sizes (#…
jglaser Jun 25, 2021
332a245
Add FlaxBigBird QuestionAnswering script (#12233)
thevasudevgupta Jun 25, 2021
238521b
Replace NotebookProgressReporter by ProgressReporter in Ray Tune run …
krfricke Jun 25, 2021
a3daabf
Style
sgugger Jun 25, 2021
4a872ca
remove extra white space from log format (#12360)
stas00 Jun 25, 2021
f866425
fixed multiplechoice tokenization (#12362)
cronoik Jun 25, 2021
64e6098
[trainer] add main_process_first context manager (#12351)
stas00 Jun 25, 2021
539ee45
[Examples] Replicates the new --log_level feature to all trainer-base…
bhadreshpsavani Jun 25, 2021
9a75459
updated example template (#12365)
bhadreshpsavani Jun 26, 2021
ff5cdc0
replace print with logger (#12368)
bhadreshpsavani Jun 26, 2021
c7faf2c
[Documentation] Warn that DataCollatorForWholeWordMask is limited to …
ionicsolutions Jun 28, 2021
9490d66
Update run_mlm.py (#12344)
TahaAslani Jun 28, 2021
57461ac
Add possibility to maintain full copies of files (#12312)
sgugger Jun 28, 2021
d25ad34
[CI] add dependency table sync verification (#12364)
stas00 Jun 28, 2021
04dbea3
[Examples] Added context manager to datasets map (#12367)
bhadreshpsavani Jun 28, 2021
89b57a6
[Flax community event] Add more description to readme (#12398)
patrickvonplaten Jun 28, 2021
27b6ac4
Update README.md
patrickvonplaten Jun 28, 2021
276bc14
Fix copies
sgugger Jun 28, 2021
a7d0b28
Remove the need for `einsum` in Albert's attention computation (#12394)
mfuntowicz Jun 28, 2021
2d70c91
[Flax] Adapt flax examples to include `push_to_hub` (#12391)
patrickvonplaten Jun 28, 2021
7e22609
Tensorflow LM examples (#12358)
Rocketknight1 Jun 28, 2021
e277074
pass the matching trainer log level to deepspeed (#12401)
stas00 Jun 28, 2021
31c3e7e
[Flax] Add T5 pretraining script (#12355)
patrickvonplaten Jun 28, 2021
7682e97
[models] respect dtype of the model when instantiating it (#12316)
stas00 Jun 29, 2021
1fc6817
Rename detr targets to labels (#12280)
NielsRogge Jun 29, 2021
bc08493
Add out of vocabulary error to ASR models (#12288)
will-rice Jun 29, 2021
3886104
Fix TFWav2Vec2 SpecAugment (#12289)
will-rice Jun 29, 2021
aecae53
[example/flax] add summarization readme (#12393)
patil-suraj Jun 29, 2021
8133286
[Flax] Example scripts - correct weight decay (#12409)
patrickvonplaten Jun 29, 2021
e3f39a2
fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)
hjptriplebee Jun 29, 2021
5257818
minor fixes in original RAG training (#12395)
Jun 29, 2021
b440b8d
Added talks (#12415)
suzana-ilic Jun 29, 2021
dc42e77
Easily train a new fast tokenizer from a given one (#12361)
sgugger Jun 29, 2021
990540b
[modelcard] fix (#12422)
stas00 Jun 29, 2021
31a8110
Add option to save on each training node (#12421)
sgugger Jun 30, 2021
90d6945
Added to talks section (#12433)
suzana-ilic Jun 30, 2021
c9486fd
Fix default bool in argparser (#12424)
sgugger Jun 30, 2021
69f5701
Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429)
hjptriplebee Jun 30, 2021
6e68597
Add CANINE (#12024)
NielsRogge Jun 30, 2021
89073a9
Document patch release v4.8.2
LysandreJik Jun 30, 2021
42477d6
fix typo in mt5 configuration docstring (#12432)
fcakyon Jun 30, 2021
1ad1c4a
Add to talks section (#12442)
suzana-ilic Jun 30, 2021
3f36a2c
[JAX/Flax readme] add philosophy doc (#12419)
patil-suraj Jun 30, 2021
0d1f67e
[Flax] Add wav2vec2 (#12271)
patrickvonplaten Jun 30, 2021
3aa37b9
Add test for a WordLevel tokenizer model (#12437)
SaulLu Jul 1, 2021
b655f16
[Flax community event] How to use hub during training (#12447)
patrickvonplaten Jul 1, 2021
27d348f
[Wav2Vec2, Hubert] Fix ctc loss test (#12458)
patrickvonplaten Jul 1, 2021
2a501ac
Comment fast GPU TF tests (#12452)
LysandreJik Jul 1, 2021
6c5b20a
Fix training_args.py barrier for torch_xla (#12464)
jysohn23 Jul 1, 2021
c18af5d
Added talk details (#12465)
suzana-ilic Jul 1, 2021
1457839
Update README.md
patrickvonplaten Jul 1, 2021
7f87bfc
Add TPU README (#12463)
patrickvonplaten Jul 1, 2021
f929462
Import check_inits handling of duplicate definitions. (#12467)
Iwontbecreative Jul 1, 2021
d5b8fe3
Validation split added: custom data files @sgugger, @patil-suraj (#12…
Souvic Jul 1, 2021
7f0027d
Fixing bug with param count without embeddings (#12461)
TevenLeScao Jul 1, 2021
2d1d921
[roberta] fix lm_head.decoder.weight ignore_key handling (#12446)
stas00 Jul 1, 2021
e52288a
Rework notebooks and move them to the Notebooks repo (#12471)
sgugger Jul 2, 2021
b4ecc6b
fixed typo in flax-projects readme (#12466)
mplemay Jul 2, 2021
b889d3f
Fix TAPAS test uncovered by #12446 (#12480)
LysandreJik Jul 2, 2021
e3fce2f
Update README.md
patrickvonplaten Jul 2, 2021
d24a523
Update README.md
patrickvonplaten Jul 2, 2021
b21905e
Update README.md
patrickvonplaten Jul 2, 2021
a76eebf
Add guide on how to build demos for the Flax sprint (#12468)
osanseviero Jul 2, 2021
2df6328
Update README.md
patrickvonplaten Jul 4, 2021
89a8739
Add `Repository` import to the FLAX example script (#12501)
LysandreJik Jul 5, 2021
23ab0b6
[examples/flax] clip style image-text training example (#12491)
patil-suraj Jul 5, 2021
0e1718a
create LxmertModelIntegrationTest Pytorch (#9989)
sadakmed Jul 5, 2021
e799e0f
[Flax] Fix wav2vec2 pretrain arguments (#12498)
Wikidepia Jul 5, 2021
f1c81d6
[Flax] ViT training example (#12300)
patil-suraj Jul 5, 2021
eceb104
flax.linen.apply takes state as the first param, followed by the inpu…
navjotts Jul 5, 2021
9b90810
[Flax] Dataset streaming example (#12470)
patrickvonplaten Jul 5, 2021
ea55675
NER example for Tensorflow (#12469)
Rocketknight1 Jul 5, 2021
bb4ac2b
[Flax] Correct flax training scripts (#12514)
patrickvonplaten Jul 5, 2021
d0f7508
[Flax] Correct logging steps flax (#12515)
patrickvonplaten Jul 5, 2021
4605b2b
[Flax] Fix another bug in logging steps (#12516)
patrickvonplaten Jul 5, 2021
7d6285a
[Wav2Vec2] Flax - Adapt wav2vec2 script (#12520)
patrickvonplaten Jul 5, 2021
f5b0c1e
[Flax] Fix hybrid clip (#12519)
patil-suraj Jul 6, 2021
626a0a0
[RoFormer] Fix some issues (#12397)
JunnYu Jul 6, 2021
7a259c1
FlaxGPTNeo (#12493)
patil-suraj Jul 6, 2021
029b9d3
Update README (#12540)
suzana-ilic Jul 6, 2021
f42a0ab
Update README.md
patrickvonplaten Jul 6, 2021
09af5bd
Replace `nn.Moudle` by `nn.Module` (#12541)
SaulLu Jul 6, 2021
3fd8577
implementing tflxmertmodel integration test (#12497)
sadakmed Jul 6, 2021
2870fd1
Bump CircleCI machine sizes
LysandreJik Jul 6, 2021
208df20
[Flax] Adapt examples to be able to use eval_steps and save_steps (#1…
patrickvonplaten Jul 6, 2021
2d42915
[examples/flax] add adafactor optimizer (#12544)
patil-suraj Jul 7, 2021
61400e1
[Flax] Add FlaxMBart (#12236)
stancld Jul 7, 2021
45dcfde
Add a warning for broken ProphetNet fine-tuning (#12511)
JetRunner Jul 7, 2021
3488ef5
[trainer] add option to ignore keys for the train function too (#1171…
shabie Jul 7, 2021
1d6623c
MLM training fails with no validation file(same as #12406 for pytorch…
Souvic Jul 7, 2021
7d321b7
[Flax] Allow retraining from save checkpoint (#12559)
patrickvonplaten Jul 7, 2021
ebc69af
Adding support for `pipeline("automatic-speech-recognition")`. (#11525)
Narsil Jul 7, 2021
95425d5
Adding prepare_decoder_input_ids_from_labels methods to all Condition…
Rocketknight1 Jul 7, 2021
0d2bffa
Remove tf.roll wherever not needed (#12512)
szutenberg Jul 7, 2021
b868260
Double check for attribute num_examples (#12562)
sgugger Jul 7, 2021
d7e156b
fix loading clip vision model (#12566)
patil-suraj Jul 7, 2021
122d7dc
Remove logging of GPU count etc logging. (#12569)
ibraheem-moosa Jul 7, 2021
b29c394
raise exception when arguments to pipeline are incomplete (#12548)
hwijeen Jul 8, 2021
0a6b904
Init pickle (#12567)
sgugger Jul 8, 2021
6f1adc4
Fix group_lengths for short datasets (#12558)
sgugger Jul 8, 2021
0085e71
Don't stop at num_epochs when using IterableDataset (#12561)
sgugger Jul 8, 2021
2aa3cd9
[RFC] Laying down building stone for more flexible ONNX export capabi…
mfuntowicz Jul 8, 2021
4da568c
Fixing the pipeline optimization by reindexing targets (V2) (#12330)
Narsil Jul 8, 2021
75e63db
Fix MT5 init (#12591)
sgugger Jul 8, 2021
f0dde60
[model.from_pretrained] raise exception early on failed load (#12574)
stas00 Jul 8, 2021
ce111fe
[doc] fix broken ref (#12597)
stas00 Jul 8, 2021
8fe836a
Add Flax sprint project evaluation section (#12592)
osanseviero Jul 9, 2021
cc12e1d
This will reduce "Already borrowed error": (#12550)
Narsil Jul 9, 2021
65e2721
[Flax] Add flax marian (#12595)
patrickvonplaten Jul 9, 2021
deecdd4
[Flax] Fix cur step flax examples (#12608)
patrickvonplaten Jul 9, 2021
0cc2dc2
Simplify unk token (#12582)
sgugger Jul 9, 2021
18ca59e
Fix arg count for partial functions (#12609)
sgugger Jul 9, 2021
e7f33e8
Pass `model_kwargs` when loading a model in `pipeline()` (#12449)
aphedges Jul 9, 2021
51eb6d3
[Flax] Fix mt5 auto (#12612)
patrickvonplaten Jul 9, 2021
165606e
[Flax Marian] Add marian flax example (#12614)
patrickvonplaten Jul 9, 2021
934222e
[FLax] Fix marian docs 2 (#12615)
patrickvonplaten Jul 9, 2021
fb65f65
Add TFHubertModel (#12206)
will-rice Jul 9, 2021
4cdbf63
[debugging utils] minor doc improvements (#12525)
stas00 Jul 10, 2021
0dcc3c8
[doc] DP/PP/TP/etc parallelism (#12524)
stas00 Jul 10, 2021
9ee66ad
fix anchor (#12620)
stas00 Jul 10, 2021
de23ece
added test file (#12630)
bhadreshpsavani Jul 12, 2021
2dd9440
Point to the right file for hybrid CLIP (#12599)
edugp Jul 12, 2021
f8f9a67
fix type check (#12638)
patil-suraj Jul 12, 2021
a882b9f
Add tokenizer_file parameter to PreTrainedTokenizerFast docstring (#1…
lewisbails Jul 12, 2021
0af8579
Skip TestMarian_MT_EN (#12649)
LysandreJik Jul 12, 2021
fb5665b
The extended trainer tests should require torch (#12650)
LysandreJik Jul 12, 2021
ad42054
Minimum requirement for pyyaml
sgugger Jul 12, 2021
9adff7a
Fix syntax in conda file
sgugger Jul 12, 2021
0f43e74
Fix typo
sgugger Jul 12, 2021
379f649
TF summarization example (#12617)
Rocketknight1 Jul 12, 2021
9b3aab2
Pickle auto models (#12654)
sgugger Jul 12, 2021
fd41e2d
Pipeline should be agnostic (#12656)
LysandreJik Jul 12, 2021
b189226
Fix transfo xl integration test (#12652)
LysandreJik Jul 12, 2021
da0e9ee
remove documentation (#12657)
philschmid Jul 12, 2021
b90d499
fixed docs (#12646)
KickItLikeShika Jul 12, 2021
21a81c1
fix typo in modeling_t5.py docstring (#12640)
PhilipMay Jul 12, 2021
9d771c5
Translate README.md to Simplified Chinese (#12596)
JetRunner Jul 12, 2021
dc06e43
Fix typo in README_zh-hans.md (#12663)
JetRunner Jul 12, 2021
c523b24
Update timeline for Flax event evaluation
osanseviero Jul 12, 2021
a6938c4
Patch BigBird tokenization test (#12653)
LysandreJik Jul 13, 2021
9da1aca
**encode_plus() shouldn't run for W2V2CTC (#12655)
LysandreJik Jul 13, 2021
5803a2a
Add ByT5 option to example run_t5_mlm_flax.py (#12634)
mapmeld Jul 13, 2021
9519f0c
Wrong model is used in example, should be character instead of subwor…
jsteggink Jul 13, 2021
7f6d375
[Blenderbot] Fix docs (#12227)
patrickvonplaten Jul 13, 2021
90178b0
Add option to load a pretrained model with mismatched shapes (#12664)
sgugger Jul 13, 2021
711d901
Fix minor docstring typos. (#12682)
qqaatw Jul 13, 2021
7a22a02
[tokenizer.prepare_seq2seq_batch] change deprecation to be easily act…
stas00 Jul 13, 2021
cee2d21
[Flax Generation] Correct inconsistencies PyTorch/Flax (#12662)
patrickvonplaten Jul 13, 2021
65bf05c
Adding TF translation example (#12667)
Rocketknight1 Jul 13, 2021
78f5fe1
[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477)
stas00 Jul 13, 2021
83f0251
Add timeout to CI. (#12684)
LysandreJik Jul 13, 2021
4cdb7ee
fix #11724 (#11897)
JunnYu Jul 13, 2021
5dd0c95
non-native optimizers are mostly ok with zero-offload (#12690)
stas00 Jul 14, 2021
144cea2
Fix multiple choice doc examples (#12679)
sgugger Jul 14, 2021
d94773e
Provide mask_time_indices to `_mask_hidden_states` to avoid double ma…
mfuntowicz Jul 14, 2021
f4399ec
Update README.md
patrickvonplaten Jul 14, 2021
f9ac677
Update TF examples README (#12703)
Rocketknight1 Jul 14, 2021
11edecd
Fix uninitialized variables when `config.mask_feature_prob > 0` (#12705)
mfuntowicz Jul 14, 2021
084873b
Only test the files impacted by changes in the diff (#12644)
sgugger Jul 14, 2021
79c57e1
Deprecate TFTrainer (#12706)
Rocketknight1 Jul 14, 2021
44f5b26
flax model parallel training (#12590)
patil-suraj Jul 14, 2021
a18a17d
[test] split test into 4 sub-tests to avoid timeout (#12710)
stas00 Jul 14, 2021
1a3deae
[trainer] release tmp memory in checkpoint load (#12718)
stas00 Jul 14, 2021
8244c5a
[Flax] Correct shift labels for seq2seq models in Flax (#12720)
patrickvonplaten Jul 15, 2021
6fb58d3
Fix typo in example (#12716)
will-rice Jul 15, 2021
199b4c5
Init adds its own files as impacted (#12709)
sgugger Jul 15, 2021
01cb2f2
LXMERT integration test typo (#12736)
LysandreJik Jul 15, 2021
3290315
Fix AutoModel tests (#12733)
LysandreJik Jul 15, 2021
8c7bd1b
Skip test while the model is not available (#12739)
LysandreJik Jul 15, 2021
eb2e006
Skip test while the model is not available (#12740)
LysandreJik Jul 15, 2021
2349ac5
Translate README.md to Traditional Chinese (#12701)
qqaatw Jul 15, 2021
370be9c
Fix MBart failing test (#12737)
LysandreJik Jul 15, 2021
f42d9dc
Patch T5 device test (#12742)
LysandreJik Jul 15, 2021
f03580f
Fix DETR integration test (#12734)
LysandreJik Jul 15, 2021
959d448
Fix led torchscript (#12735)
LysandreJik Jul 15, 2021
eb4d7ef
Remove framework mention (#12731)
LysandreJik Jul 15, 2021
68605e9
[doc] parallelism: Which Strategy To Use When (#12712)
stas00 Jul 15, 2021
31cfcbd
[doc] performance: batch sizes (#12725)
stas00 Jul 15, 2021
5f2791c
Replace specific tokenizer in log message by AutoTokenizer (#12745)
SaulLu Jul 15, 2021
2e9fb13
[Wav2Vec2] Correctly pad mask indices for PreTraining (#12748)
patrickvonplaten Jul 15, 2021
a76dd7e
Update README.md
patrickvonplaten Jul 15, 2021
6989264
[doc] testing: how to trigger a self-push workflow (#12724)
stas00 Jul 15, 2021
c07334c
add intel-tensorflow-avx512 to the candidates (#12751)
zzhou612 Jul 16, 2021
8ef3f36
fix typos (#12757)
patil-suraj Jul 16, 2021
fbf1397
Turn on eval mode when exporting to ONNX (#12758)
mfuntowicz Jul 16, 2021
6e87010
Preserve `list` type of `additional_special_tokens` in `special_token…
SaulLu Jul 16, 2021
b4b562d
[Wav2Vec2] Padded vectors should not allowed to be sampled (#12764)
patrickvonplaten Jul 16, 2021
08d609b
Add tokenizers class mismatch detection between `cls` and checkpoint …
europeanplaice Jul 17, 2021
da72ac6
Fix push_to_hub docstring and make it appear in doc (#12770)
sgugger Jul 17, 2021
c6b9095
Update README.md
patrickvonplaten Jul 17, 2021
534f6eb
Create README.md
patrickvonplaten Jul 17, 2021
cab3b86
[ray] Fix `datasets_modules` ImportError with Ray Tune (#12749)
Yard1 Jul 19, 2021
546dc24
Longer timeout for slow tests (#12779)
LysandreJik Jul 19, 2021
0118ef8
Enforce eval and save strategies are compatible when --load_best_mode…
sgugger Jul 19, 2021
7fae535
add troubleshooting docs (#12791)
stas00 Jul 20, 2021
6f8e367
Fix Padded Batch Error 12282 (#12487)
will-rice Jul 20, 2021
66197ad
Flax MLM: Allow validation split when loading dataset from local file…
fgaim Jul 20, 2021
13fefdf
Update README.md
patrickvonplaten Jul 20, 2021
2955d50
[Longformer] Correct longformer docs (#12809)
patrickvonplaten Jul 20, 2021
31d0672
Update README.md
patrickvonplaten Jul 20, 2021
b5b4e54
add and fix examples (#12810)
patil-suraj Jul 20, 2021
acdd78d
Update README.md
patrickvonplaten Jul 20, 2021
cabcc75
[trainer] sanity checks for `save_steps=0|None` and `logging_steps=0`…
stas00 Jul 20, 2021
c3d9ac7
Expose get_config() on ModelTesters (#12812)
LysandreJik Jul 21, 2021
15d19ec
fix convert_tokens_to_string calls (#11716)
PhilipMay Jul 21, 2021
037bdf8
Refer warmup_ratio when setting warmup_num_steps. (#12818)
tsuchm Jul 21, 2021
786ced3
Add versioning system to fast tokenizer files (#12713)
sgugger Jul 21, 2021
ac3cb66
Add _CHECKPOINT_FOR_DOC to all models (#12811)
LysandreJik Jul 21, 2021
cf0755a
[debug] DebugUnderflowOverflow doesn't work with DP (#12816)
stas00 Jul 21, 2021
8c2384d
Raise warning in HP search when hp is not in args (#12831)
sgugger Jul 21, 2021
807b6bd
[Deepspeed] warmup_ratio docs (#12830)
stas00 Jul 21, 2021
27a8c9e
[parallelism doc] document Deepspeed-Inference and parallelformers (#…
stas00 Jul 21, 2021
fcf8301
Fix type of max_seq_length arg in run_swag.py (#12832)
mbforbes Jul 22, 2021
72aee83
Release: v4.9.0
LysandreJik Jul 22, 2021
6cab8b3
Add doc for v4.9.0
sgugger Jul 26, 2021
8ee16d8
Fix barrier for SM distributed (#12853)
sgugger Jul 26, 2021
bff1c71
Release: v4.9.1
sgugger Jul 26, 2021
ca272fc
ONNX v2 raises an Exception when using PyTorch < 1.8.0 (#12933)
mfuntowicz Jul 29, 2021
2c255a2
Fix push_to_hub for TPUs (#12895)
sgugger Jul 26, 2021
94b7db9
GPT-Neo ONNX export (#12911)
michaelbenayoun Aug 5, 2021
a12fa50
T5 with past ONNX export (#13014)
michaelbenayoun Aug 6, 2021
f595ea3
Put smaller ALBERT model (#13028)
LysandreJik Aug 6, 2021
226763a
Add MBART to models exportable with ONNX (#13049)
LysandreJik Aug 9, 2021
bfd5354
Add to ONNX docs (#13048)
LysandreJik Aug 9, 2021
ec78422
Tpu tie weights (#13030)
sgugger Aug 6, 2021
41981a2
Patch release: v4.9.2
LysandreJik Aug 9, 2021
68730a4
remove files from 'v4.9.2' before merge
calpt Sep 17, 2021
f8e84af
Merge stripped branch 'v4.9.2'
calpt Sep 17, 2021
226e87b
Fix config keys in GPT2ModelTester
calpt Sep 17, 2021
4ac892d
Post-merge fixes for tests & example scripts.
calpt Sep 20, 2021
7 changes: 7 additions & 0 deletions Makefile

@@ -21,6 +21,12 @@ modified_only_fixup:
 deps_table_update:
 	@python setup.py deps_table_update
 
+deps_table_check_updated:
+	@md5sum src/transformers/dependency_versions_table.py > md5sum.saved
+	@python setup.py deps_table_update
+	@md5sum -c --quiet md5sum.saved || (printf "\nError: the version dependency table is outdated.\nPlease run 'make fixup' or 'make style' and commit the changes.\n\n" && exit 1)
+	@rm md5sum.saved
+
 # autogenerating code
 
 autogenerate_code: deps_table_update
@@ -32,6 +38,7 @@ autogenerate_code: deps_table_update
 	# python utils/check_copies.py
 	# python utils/check_table.py
 	# python utils/check_dummies.py
+	# python utils/tests_fetcher.py --sanity_check
 extra_quality_checks:
 	python utils/check_repo.py
 	python utils/check_inits.py
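The new deps_table_check_updated target regenerates the dependency table and fails if the regenerated file differs from the committed one. As a rough sketch of the same idea in Python (using hashlib in place of md5sum; the path comes from the Makefile, everything else here is illustrative):

```python
import hashlib
import subprocess
import sys
from pathlib import Path

TABLE = Path("src/transformers/dependency_versions_table.py")

def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def main() -> None:
    before = file_md5(TABLE)  # snapshot of the committed table
    # Regenerate the table the same way `make deps_table_update` does.
    subprocess.run([sys.executable, "setup.py", "deps_table_update"], check=True)
    if file_md5(TABLE) != before:  # regeneration changed the file, so it was stale
        sys.exit(
            "Error: the version dependency table is outdated.\n"
            "Please run 'make fixup' or 'make style' and commit the changes."
        )

if __name__ == "__main__":
    main()
```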
84 changes: 53 additions & 31 deletions examples/language-modeling/run_clm.py

@@ -28,6 +28,7 @@
 from dataclasses import dataclass, field
 from typing import Optional
 
+import datasets
 from datasets import load_dataset
 
 import transformers
@@ -54,7 +55,7 @@
 
 
 # Will error if the minimal version of Transformers is not installed. Remove at your own risks.
-check_min_version("4.8.0")
+check_min_version("4.9.0")
 
 require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

@@ -206,22 +207,23 @@ def main():
 
     # Setup logging
     logging.basicConfig(
-        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
+        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
         datefmt="%m/%d/%Y %H:%M:%S",
         handlers=[logging.StreamHandler(sys.stdout)],
     )
-    logger.setLevel(logging.INFO if training_args.should_log else logging.WARN)
+
+    log_level = training_args.get_process_log_level()
+    logger.setLevel(log_level)
+    datasets.utils.logging.set_verbosity(log_level)
+    transformers.utils.logging.set_verbosity(log_level)
+    transformers.utils.logging.enable_default_handler()
+    transformers.utils.logging.enable_explicit_format()
 
     # Log on each process the small summary:
     logger.warning(
         f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
         + f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
     )
-    # Set the verbosity to info of the Transformers logger (on main process only):
-    if training_args.should_log:
-        transformers.utils.logging.set_verbosity_info()
-    transformers.utils.logging.enable_default_handler()
-    transformers.utils.logging.enable_explicit_format()
     logger.info(f"Training/evaluation parameters {training_args}")
 
     # Detecting last checkpoint.
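This hunk swaps the old should_log toggle for training_args.get_process_log_level(): each process picks its own verbosity (the main process uses log_level, replicas default to warnings), and the same level is pushed to the datasets and transformers loggers. A minimal runnable sketch of the pattern, with a hypothetical stand-in for the real TrainingArguments method:

```python
import logging
import sys

def get_process_log_level(is_main_process: bool,
                          log_level: int = logging.INFO,
                          log_level_replica: int = logging.WARNING) -> int:
    """Hypothetical stand-in: main process logs at log_level, replicas stay quieter."""
    return log_level if is_main_process else log_level_replica

logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)

# Pretend this process is a replica (rank != 0): INFO is suppressed, warnings pass.
logger.setLevel(get_process_log_level(is_main_process=False))
logger.info("hidden on replicas")
logger.warning("shown everywhere")
```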
@@ -253,15 +255,17 @@ def main():
     # download the dataset.
     if data_args.dataset_name is not None:
         # Downloading and loading a dataset from the hub.
-        datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
-        if "validation" not in datasets.keys():
-            datasets["validation"] = load_dataset(
+        raw_datasets = load_dataset(
+            data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir
+        )
+        if "validation" not in raw_datasets.keys():
+            raw_datasets["validation"] = load_dataset(
                 data_args.dataset_name,
                 data_args.dataset_config_name,
                 split=f"train[:{data_args.validation_split_percentage}%]",
                 cache_dir=model_args.cache_dir,
             )
-        datasets["train"] = load_dataset(
+        raw_datasets["train"] = load_dataset(
             data_args.dataset_name,
             data_args.dataset_config_name,
             split=f"train[{data_args.validation_split_percentage}%:]",
@@ -280,7 +284,22 @@ def main():
         )
         if extension == "txt":
             extension = "text"
-        datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)
+        raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)
+        # If no validation data is there, validation_split_percentage will be used to divide the dataset.
+        if "validation" not in raw_datasets.keys():
+            raw_datasets["validation"] = load_dataset(
+                extension,
+                data_files=data_files,
+                split=f"train[:{data_args.validation_split_percentage}%]",
+                cache_dir=model_args.cache_dir,
+            )
+            raw_datasets["train"] = load_dataset(
+                extension,
+                data_files=data_files,
+                split=f"train[{data_args.validation_split_percentage}%:]",
+                cache_dir=model_args.cache_dir,
+            )
+
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
     # https://huggingface.co/docs/datasets/loading_datasets.html.
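Both validation-split hunks lean on the datasets split-slicing syntax, where `train[:5%]` and `train[5%:]` carve a held-out set directly out of the train split. A quick example of that API; the dataset name is only a placeholder:

```python
from datasets import load_dataset

# Slice 5% of the train split off as validation; the remaining 95% stays for training.
# "wikitext" / "wikitext-2-raw-v1" is only a placeholder dataset for the demo.
validation = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:5%]")
train = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[5%:]")
print(len(validation), len(train))
```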

@@ -392,9 +411,9 @@ def main():
     # Preprocessing the datasets.
     # First we tokenize all the texts.
     if training_args.do_train:
-        column_names = datasets["train"].column_names
+        column_names = raw_datasets["train"].column_names
     else:
-        column_names = datasets["validation"].column_names
+        column_names = raw_datasets["validation"].column_names
     text_column_name = "text" if "text" in column_names else column_names[0]
 
     # since this will be pickled to avoid _LazyModule error in Hasher force logger loading before tokenize_function
@@ -410,14 +429,15 @@ def tokenize_function(examples):
             )
         return output
 
-    tokenized_datasets = datasets.map(
-        tokenize_function,
-        batched=True,
-        num_proc=data_args.preprocessing_num_workers,
-        remove_columns=column_names,
-        load_from_cache_file=not data_args.overwrite_cache,
-        desc="Running tokenizer on dataset",
-    )
+    with training_args.main_process_first(desc="dataset map tokenization"):
+        tokenized_datasets = raw_datasets.map(
+            tokenize_function,
+            batched=True,
+            num_proc=data_args.preprocessing_num_workers,
+            remove_columns=column_names,
+            load_from_cache_file=not data_args.overwrite_cache,
+            desc="Running tokenizer on dataset",
+        )
 
     if data_args.block_size is None:
         block_size = tokenizer.model_max_length
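The main_process_first context manager (added in #12351, listed above) lets rank 0 run the .map() call and write the cache before the other ranks enter and read it. A simplified sketch of how such a helper can be built on torch.distributed barriers; this is illustrative, not the actual Trainer implementation:

```python
from contextlib import contextmanager

import torch.distributed as dist

@contextmanager
def main_process_first(local_rank: int):
    """Run the wrapped block on the main process first; replicas wait, then follow."""
    is_main = local_rank in (-1, 0)
    in_dist = dist.is_available() and dist.is_initialized()
    try:
        if in_dist and not is_main:
            dist.barrier()  # replicas block here until the main process is done
        yield
    finally:
        if in_dist and is_main:
            dist.barrier()  # main process releases the waiting replicas
```

Wrapped around a datasets .map(...), this means the expensive preprocessing runs once, and every other rank loads the result from the cache.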
@@ -442,7 +462,8 @@ def group_texts(examples):
         total_length = len(concatenated_examples[list(examples.keys())[0]])
         # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
         # customize this part to your needs.
-        total_length = (total_length // block_size) * block_size
+        if total_length >= block_size:
+            total_length = (total_length // block_size) * block_size
         # Split by chunks of max_len.
         result = {
             k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
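The one-line guard here fixes short datasets (#12558, also in the commit list): when the concatenated corpus is shorter than block_size, the old `(total_length // block_size) * block_size` rounded down to zero and silently produced an empty dataset. A standalone sketch of group_texts showing the effect:

```python
from itertools import chain

def group_texts(examples, block_size=8):
    """Concatenate tokenized texts and split into fixed-size blocks (sketch of the example script)."""
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Without this guard, total_length < block_size rounds down to 0 and every token is dropped.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

short = {"input_ids": [[1, 2, 3], [4, 5]]}  # 5 tokens total, fewer than block_size
print(group_texts(short))  # keeps one short block instead of returning nothing
```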
@@ -458,13 +479,14 @@ def group_texts(examples):
     # To speed up this part, we use multiprocessing. See the documentation of the map method for more information:
     # https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.map
 
-    lm_datasets = tokenized_datasets.map(
-        group_texts,
-        batched=True,
-        num_proc=data_args.preprocessing_num_workers,
-        load_from_cache_file=not data_args.overwrite_cache,
-        desc=f"Grouping texts in chunks of {block_size}",
-    )
+    with training_args.main_process_first(desc="grouping texts together"):
+        lm_datasets = tokenized_datasets.map(
+            group_texts,
+            batched=True,
+            num_proc=data_args.preprocessing_num_workers,
+            load_from_cache_file=not data_args.overwrite_cache,
+            desc=f"Grouping texts in chunks of {block_size}",
+        )
 
     if training_args.do_train:
         if "train" not in tokenized_datasets:
24 changes: 21 additions & 3 deletions examples/language-modeling/run_clm_no_trainer.py

@@ -35,7 +35,7 @@
 from tqdm.auto import tqdm
 
 import transformers
-from accelerate import Accelerator
+from accelerate import Accelerator, DistributedType
 from transformers import (
     CONFIG_MAPPING,
     MODEL_MAPPING,
@@ -200,7 +200,7 @@ def main():
     accelerator = Accelerator()
     # Make one log on every process with the configuration for debugging.
     logging.basicConfig(
-        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
+        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
         datefmt="%m/%d/%Y %H:%M:%S",
         level=logging.INFO,
     )
@@ -253,6 +253,19 @@ def main():
         if extension == "txt":
             extension = "text"
         raw_datasets = load_dataset(extension, data_files=data_files)
+        # If no validation data is there, validation_split_percentage will be used to divide the dataset.
+        if "validation" not in raw_datasets.keys():
+            raw_datasets["validation"] = load_dataset(
+                extension,
+                data_files=data_files,
+                split=f"train[:{args.validation_split_percentage}%]",
+            )
+            raw_datasets["train"] = load_dataset(
+                extension,
+                data_files=data_files,
+                split=f"train[{args.validation_split_percentage}%:]",
+            )
+
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
     # https://huggingface.co/docs/datasets/loading_datasets.html.

@@ -330,7 +343,8 @@ def group_texts(examples):
         total_length = len(concatenated_examples[list(examples.keys())[0]])
         # We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
         # customize this part to your needs.
-        total_length = (total_length // block_size) * block_size
+        if total_length >= block_size:
+            total_length = (total_length // block_size) * block_size
         # Split by chunks of max_len.
         result = {
             k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
@@ -389,6 +403,10 @@ def group_texts(examples):
         model, optimizer, train_dataloader, eval_dataloader
     )
 
+    # On TPU, the tie weights in our model have been disconnected, so we need to restore the ties.
+    if accelerator.distributed_type == DistributedType.TPU:
+        model.tie_weights()
+
     # Note -> the training dataloader needs to be prepared before we grab his length below (cause its length will be
     # shorter in multiprocess)