Update training #4

ylacombe · 2024-04-02T16:31:22Z

No description provided.

) * Security policy * Apply suggestions from code review Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com> * Update SECURITY.md Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com> --------- Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> Co-authored-by: Michelle Habonneau <83347449+Michellehbn@users.noreply.github.com> Co-authored-by: Diogo Teles Sant'Anna <diogoteles@google.com>

…uggingface#29702) * model_summary.md - Add link to Harvard's Annotated Transformer. * model_summary.md - slight wording change + capitalize name of the paper * model_summary.md - moves the Annotated Transformer link in a parenthesis next to the link to the original paper (great idea, stevhliu!) * model_summary.md - moves the Annotated Transformer link in a parenthesis next to the link to the original paper (commit pt. 2, accidentally removed "has" in pt. 1)

…ce#29112) * [test_all] Remove static pretrained maps from the library's internals * Deprecate archive maps instead of removing them * Revert init changes * [test_all] Deprecate instead of removing * [test_all] PVT v2 support * [test_all] Tests should all pass * [test_all] Style * Address review comments * Update src/transformers/models/deprecated/_archive_maps.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/deprecated/_archive_maps.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * [test_all] trigger tests * [test_all] LLAVA * [test_all] Bad rebase --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Populate torch_dtype from model to pipeline Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * use property Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * lint Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Remove default handling Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

…uggingface#29255) * add warnings if training args differ from checkpoint args stored in trainer_state.json * run formatting and styling * add a test * format and styling --------- Co-authored-by: Jonathan Flynn <jonl.flynn@guardian.co.uk>

…e#29747) * replace the 'decord' with 'av' in VideoClassificationPipeline * fix the check of backend in VideoClassificationPipeline * adjust the order of imports * format 'video_classification.py' * format 'video_classification.py' with ruff --------- Co-authored-by: wanqiancheng <13541261013@163.com>

* add support for qwen2 MoE models * update docs * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fix style * fix test when there are sparse and non sparse layers * fixup * Update README.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * fixup * add archive back * add support for qwen2 MoE models * update docs * update model name & test * update readme * update class names & readme & model_doc of Qwen2MoE. * update architecture name * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * update modeling_qwen2_moe.py * fix model architecture * fixup * fix qwen2_moe tests * use Qwen2Tokenizer instead of Qwen2MoeTokenizer * fix style * fix test when there are sparse and non sparse layers * fixup * add archive back * fix integration test * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* FIX: Cached slow forward in mamba - additionally added mamba cached test - added unused test (mamba causal lm forward and backward) - fixed typo: "causl" --> "causal" * formatting * fix: use real `slow_forward` call instead of torch module's * add shape assertion for mixer block test * adjust shape assertion

…huggingface#29813) * Check for requires_grad when initing weights * Add unit test * Move sinusoidal positional encoding generation after post_init() * Add modules to skip init list * Move create_sinusoidal_embeddings to _init_weights

* Start rework * Fix failing test * Include max * Update src/transformers/trainer.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix bug and add tests * nit * otherway to get the cur len instead of attention mask * more places where this might have been broken * nit * oups * inputs_embeds vs input_embeds * test generated outptus * style * nit * fix * skip failing biogpt

…LM (huggingface#29904) * Fix sinusoidal_embeddings in FlaubertModel * Fix for Informer * Fix for XLM * Move sinusoidal emb for XLM * Move sinusoidal emb for Flaubert * Small cleanup * Add comments on tests code copied from * Add with Distilbert->

* fix issue with logit processor in beam search in Flax * adding FlaxNoRepeatNGramLogitsProcessor class + unit test * style correction and code verification * add FlaxNoRepeatNGramLogitsProcessor to the test_processor_list and test_processor_list_jitted tests * fix an issue where ngrams are banned only if they appear ==1 time + update description of get_previous_ngrams * replace non-jit compatible masking of ngrams that are not yet generated with jittable version * Revert "fix issue with logit processor in beam search in Flax" This reverts commit 09b70d7. * add FlaxNoRepeatNGramLogitsProcessor to _get_logits_processor * change the method of casting to boolean of banned tokens indices * fix code style * remove some useless operations + significantly faster computation of update indices using jax.lax.fori_loop * remove useless loop iterations * set some variables that were calculated and used multiple times * fix format

…gface#29939) * add FA2 to o.g Musicgen * make style * add FA2 support to Musicgen Melody * add generation FA2 tests to o.g Musicgen * make style and fix copies * add Musicgen to FA2 docs + deprecate list * add sdpa supports to Musicgen's * make style and fix copies * refactor attention implementation arguments * add Copied from to sdpa tests * add copied form in sdpa tests melody * add copied for FA2 generation tests * add FA2 inference copied from * make style

…face#29311) * Fix skip_special_tokens process for Wav2Vec2CTCTokenizer._decode * Fix skip_special_tokens for Wav2Vec2CTCTokenizer._decode * Exclude pad_token filtering since it is used as CTC-blank token * Add small test for skip_special_tokens * Update decoding test for added new token

) * Hard error when ignoring tensors. (huggingface#27484) * [WIP] Hard error when ignoring tensors. * Better selection/error when saving a checkpoint. - Find all names we should normally drop (those are in the transformers config) - Find all disjoint tensors (for those we can safely trigger a copy to get rid of the sharing before saving) - Clone those disjoint tensors getting rid of the issue - Find all identical names (those should be declared in the config but we try to find them all anyway.) - For all identical names: - If they are in the config, just ignore them everything is fine - If they are not, warn about them. - For all remainder tensors which are shared yet neither identical NOR disjoint. raise a hard error. * Adding a failing test on `main` that passes here. * We don't need to keep the subfolder logic in this test. * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add small tests. * Dead variable. * Fixup. * Fixing tied_Weights_keys on generic models. * Fixup + T5 encoder/decoder tying (with different layers) * Code quality. * Dynamic member. * trigger * Fixing encoder name for other types of encoder/decoder combos. * Fix scoping. * Update .github/workflows/self-scheduled.yml Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Fixing the tied_weights after the call. --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* inital commit * update * update conversion checkpoint * update conversion script * nits * some fixes * nits * merge * fix permute * nits * fix * nits * nits * nits * fix rope * fix both rope * nites * style * make sure flax works * fix flax init code * fix foward * nits * print flax generation out * current code * nits * SIIIIIIIIIIIIIIIIIII * update * add new tokenizer * correct fast tokenizer * fix conversion * more comments * fix modeling and conversion * nits and nits * nits testing * add some tokenization tests * add some edge cases * add slow tests and fix them * fixup * fix copies for modeling * fix copies * add 7B slow tests * fix * fix * fix tests * make tokenizer cis go green * styling * last tokenizer nits * update jax tests * fix flax for 7b * add jit testing 🤗 * cleanups * isolated nit, inv_freq for rotary_emb.inv_freq * propagate to jax * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * adjust test * fix conversion script * change name * correct file names * update conversion script * Fix bos and eos token ids in the model configuration (#3) * update modelling * update conversion script * add static cache for gemma * fix sdpa generate * fix batched * multiple fixes * fix FA2 * final fix * Rename a few missing strings and filenames (#4) * merge with upstream main * fix copies * fix copies * fix fixup * fix fixup * fix * fix * final tests * fix fx gemma tests * fix fx bf16/fp16 tests * update slow fx tests * fx slow tests: one logits, one generation * move jit test standalone * Apply suggestions from code review * nits * tokenizer updates * more tokenization updates: custom GemmaSentencepieceExtrator * style * Update src/transformers/cache_utils.py * Update src/transformers/models/gemma/__init__.py * Update tests/models/gemma/test_modeling_flax_gemma.py * small nits * style * update tokenization test * fix the rotary embedding * with style * fix slow tests * WARNING this commit 
might be very important for precisions * Update tests/models/gemma/test_modeling_flax_gemma.py * Update src/transformers/models/gemma/configuration_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/models/gemma/modeling_flax_gemma.py Co-authored-by: Lysandre Debut <hi@lysand.re> * small nits here and there! * forgotten nit * remove on the fly computation of inv_freq * revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float * Apply suggestions from code review Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_flax_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_tokenization_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update 
tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * Update tests/models/gemma/test_modeling_gemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * nit conversion script link * fix some tests * add not doctest and pr doctest * repo consistency * fix last CIs 🚀 * update all readmes --------- Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: sanchit-gandhi <sanchit@huggingface.co> Co-authored-by: Lysandre Debut <hi@lysand.re>

ArthurZucker and others added 30 commits March 22, 2024 10:11

[quality] update quality check to make sure we check imports 😈 (hug…

e68ff30

…gingface#29771) * update quality check * make it nice * update * let's make sure it runs and we have the logs actually * update workflow * nits

Fix type hint for train_dataset param of Trainer.__init__() to allow …

3479161

…IterableDataset. Issue 29678 (huggingface#29738) * Fixed typehint for train_dataset param in Trainer.__init__(). Added IterableDataset option. * make fixup

Enable AMD docker build CI (huggingface#29803)

aa17cf9

* enable amd ci * remove unnecessary clean up

Correct llava mask & fix missing setter for vocab_size (huggingface…

13b2370

…#29389) * correct llava mask * fix vipllava as well * mask out embedding for padding tokens * add test * fix style * add setter * fix test on suggestion

rm input dtype change in CPU (huggingface#28631)

e85654f

* rm input dtype change in CPU * add warning when use CPU low-precision * rm useless logging

Generate: remove unused attributes in AssistedCandidateGenerator (h…

34e07f4

…uggingface#29787) remove unused attrs

replaced concatenation to f-strings to improve readability and unify … (

884b221

huggingface#29785) replaced concatenation to f-strings to improve readability and unify with the rest code

[cleanup] vestiges of causal mask (huggingface#29806)

2e7cb46

nit

[SuperPoint] Fix doc example (huggingface#29816)

c5f0288

[SuperPoint] Fix doc example

[DOCS] Fix typo for llava next docs (huggingface#29829)

dafe370

Fix typo for llava next docs

Fix the behavior of collecting 'num_input_tokens_seen' (huggingface#2…

afe73ae

…9099) fix the behavior of collecting 'num_input_tokens_seen' See huggingface#28791 for more details.

fix 😭

00a09ed

[revert commit] revert 00a09ed

e3e16dd

remove quotes in code example (huggingface#29812)

7eb3ba8

Co-authored-by: Johannes <johannes.kolbe@tech.better.team>

Fix header in IFE task guide (huggingface#29859)

de81a67

Update image_feature_extraction.md

[docs] Indent ordered list in add_new_model.md (huggingface#29796)

b9ceb03

Allow bos_token_id is None during the generation with `inputs_embed…

998b5bb

…s` (huggingface#29772) * update * add ut * update

Add cosine_with_min_lr scheduler in Trainer (huggingface#29341)

ef60995

* Add cosine_with_min_lr scheduler * Update error message for missing min_lr or min_lr_rate

Disable AMD memory benchmarks (huggingface#29871)

07d7952

* remove py3nvml to skip amd memory benchmarks * uninstall pynvml from docker images

Set custom_container in build docs workflows (huggingface#29855)

f01e160

Support num_attention_heads != num_key_value_heads in Flax Llama …

8e08aca

…Implementation (huggingface#29557) * fix tinyllama flax modelling * rename vars to minimize changes * move * formatting * remove unused var

ArthurZucker and others added 27 commits March 28, 2024 15:13

[BC] Fix BC for other libraries (huggingface#29934)

2bbbf1b

* fix bc? * nit

Fix doc issue huggingface#29758 in DebertaV2Config class (huggingface…

e203646

…#29842) Fix doc issue in DebertaV2Config class Co-authored-by: Vinayakk Garg <vigar@akamai.com>

[LlamaSlowConverter] Slow to Fast better support (huggingface#29797)

536ea2a

* fix * fix test * style * nit * rather rely on concert token to id * fix quality * Update src/transformers/convert_slow_tokenizer.py

Update installs in image classification doc (huggingface#29947)

ba56ed0

Trainer with PyTorch now requires accelerate to be installed. Partly resolves huggingface#29174

Mark test_eager_matches_sdpa_generate flaky for some models (huggin…

43d17c1

…gface#29479) * fix * revert for qwen2 * revert for qwen2 * update * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Super tiny fix 12 typos about "with with" (huggingface#29926)

5ad7f17

* with with * style

Fix rope theta for OpenLlama (huggingface#29893)

6fd93fe

fix: rope_theta for open llama

Add warning message for run_qa.py (huggingface#29867)

156d30d

* improve: error message for best model metric * update: raise warning instead of error

fix: get mlflow version from mlflow-skinny (huggingface#29918)

e644b60

Co-authored-by: Alexander Jipa <azzhipa@amazon.com>

Reset alarm signal when the function is ended (huggingface#29706)

f6701bc

Fixes huggingface#29690

Update model card and link of blog post. (huggingface#29928)

46d6368

* Update qwen2_moe.md * update link of blogpost. * fixup --------- Co-authored-by: bozheng-hit <dsoul0621@gmail.com>

[BC] Fix BC for AWQ quant (huggingface#29965)

6e58407

fix awq quant

Fix FA2 tests (huggingface#29909)

569f6c7

* fix FA2 tests * refactor inference test name

Fix copies main ci (huggingface#29979)

fa2c49b

* fix copies * nit * style * Update utils/check_copies.py

[tests] fix the wrong output in `ImageToTextPipelineTests.test_condit…

e4f5b57

…ional_generation_llava` (huggingface#29975) bug fix

Generate: move misplaced test (huggingface#29902)

c9f6e5e

[docs] Big model loading (huggingface#29920)

096f304

* update * feedback

[bnb] Fix bug in _replace_with_bnb_linear (huggingface#29958)

33288ff

fix bug

[Docs] Make an ordered list prettier in add_tensorflow_model.md (hugg…

cb5927c

…ingface#29949)

Generate: fix logits processors doctests (huggingface#29718)

5080ab1

* fix norm * fix logits processors doctests

ylacombe force-pushed the main branch from 3294068 to 5080ab1 Compare April 2, 2024 16:40

ylacombe merged commit 03b8bd6 into add-training-musicgen Apr 2, 2024
9 of 19 checks passed
