Automated PR: Downstream develop rebase new changes #71

Cemberk · 2024-11-14T07:55:00Z

This PR was created automatically by the Fork Maintenance System to sync changes from the downstream main into downstream develop.

* Update README.md * tests: forward ok * backward test done * done testing * removed check. scripts * Update README.md * added use_mambapy arg * fixed typo in warning * protected imports w/ mambapy package * delete pscan.py + raise rather than assert * Update import_utils.py * fix whitespaces and unused import * trailing whitespace + import block unformatted * Update modeling_mamba.py * transpose before pscan * shape comment * ran make style * use_mambapy=False by default Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * ran make fix-copies --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format

…e#32148) Revert "Incorrect Whisper long-form decoding timestamps (huggingface#32003)" This reverts commit cd48553.

…ingface#31857) * feat(cache): StaticCache uses index_copy_ to avoid useless copy Using index_copy_ allows for explicit in-place change of the tensor. Some backends (XLA) will otherwise copy the tensor, making the code slower and using more memory. Proposed implementation will end up using less memory and on XLA will result in less compilation, but the change is also quite generic, making no change whatsoever on CUDA or CPU backend. * feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy Applying the same change done in StaticCache. * fix(cache): fallback of index_copy_ when not implemented * fix(cache): in index_copy_ ensure tensors are on same device * [run slow] llama * fix(cache): add move of cache_position to same device in SlidingWindowCache * Revert "[run slow] llama" This reverts commit 02608dd.

…r search (huggingface#31924) Update integration_utils.py Added additional kwarg

…ith Position IDs (huggingface#31629) * add DataCollatorBatchFlattening * Update data_collator.py * change name * new FA2 flow if position_ids is provided * add comments * minor fix * minor fix data collator * add test cases for models * add test case for data collator * remove extra code * formatting for ruff check and check_repo.py * ruff format ruff format tests src utils * custom_init_isort.py

* Updated ruff version and fixed the required code according to the latest version. * Updated ruff version and fixed the required code according to the latest version. * Added noqa directive to ignore 1 error shown by ruff

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>

…face#32160) Fixed an if condition always evaluating to true.

fix

…eights in the layer (huggingface#32171) * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer * style fix with ruff:

…than the ones present at import time. (huggingface#32153) * fix: default value reflects the runtime environment variables rather than the ones present at import time. * Fix: Change `deterministic` to None by default; use env var if None
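The import-time versus runtime distinction in this commit message can be illustrated with a minimal sketch. The function names and the `EXAMPLE_DETERMINISTIC` variable below are hypothetical and not taken from the patch; the sketch only shows why a default of `None` that is resolved inside the function picks up environment changes made after import.

```python
import os
from typing import Optional

# Hypothetical variable name, used only for this illustration.
ENV_VAR = "EXAMPLE_DETERMINISTIC"
os.environ.pop(ENV_VAR, None)  # start from a known state


def run_import_time(deterministic: bool = os.environ.get(ENV_VAR, "0") == "1") -> bool:
    # The default above is evaluated once, when the function is defined,
    # so later changes to the environment are never seen.
    return deterministic


def run_runtime(deterministic: Optional[bool] = None) -> bool:
    # None means "not specified": read the environment at call time.
    if deterministic is None:
        deterministic = os.environ.get(ENV_VAR, "0") == "1"
    return deterministic


# The environment changes after the functions were defined.
os.environ[ENV_VAR] = "1"
stale = run_import_time()        # still reflects the definition-time value
fresh = run_runtime()            # reflects the current environment
explicit = run_runtime(False)    # an explicit argument always wins
```

An explicit argument still overrides the environment in the second variant, which is why `None` rather than `False` serves as the "unset" sentinel.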

* Update qwen2.md outdated description * Update qwen2.md amended * Update qwen2.md Update * Update qwen2.md fix wrong version code, now good to go

Remove conversation pipeline tests

* relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist

* let's not warn when someone is running a forward without cache + self.training * more models * fixup

fix resize when deepspeed

* Fix float8_e4m3fn in modeling_utils * style * fix * comment

* support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16

* No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again

…ingface#32198) Replaced deprecated unittest method with the correct one.

* [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test

….7.0 (huggingface#32210) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0

…gingface#32222) set _supports_param_buffer_assignment to False

fix E721 warnings

* fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci

add nx

Co-authored-by: Gal Cohen <galc@ai21.com>

* mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only

…nsformer (huggingface#32903) Bump nltk in /examples/research_projects/decision_transformer Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](nltk/nltk@3.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…xport (huggingface#32887) * Replace .norm() with decomposed version for executorch export * [run_slow] clip

* link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. * make fixup

* Update README.md * Update README.md * Add README_ar.md to i18n/README_de.md * Add README_ar.md to i18n/README_es.md * Add README_ar.md to i18n/README_fr.md * Add README_ar.md to i18n/README_hd.md * Add README_ar.md to i18n/README_ja.md * Add README_ar.md to i18n/README_ko.md * Add README_ar.md to i18n/README_pt-br.md * Add README_ar.md to i18n/README_ru.md * Add README_ar.md to i18n/README_te.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_vi.md * Add README_ar.md to i18n/README_zh-hans.md * Add README_ar.md to i18n/README_zh-hant.md * Create README_ar.md

… when `return_timestamps` is not passed to `generate` function (huggingface#31296) [whisper] don't overwrite return_timestamps when not passed to generate

commit

* try test updates * a few more changes * a few more changes * a few more changes * [run slow] jamba * skip logits checks on older gpus * [run slow] jamba * oops * [run slow] jamba * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/jamba/test_modeling_jamba.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

…ngface#32891) Added missing huggingface_hub installation to workflows.

Co-authored-by: Gal Cohen <galc@ai21.com>

* add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit 25278e8. * style * version check

* separate step to download nltk files * duplicated * rm comma

…1469) * Update hub.py * Update errors * Apply suggestions from code review Co-authored-by: Lucain <lucainp@gmail.com> --------- Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Lucain <lucainp@gmail.com>

* fix * >= 0.3.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Do not call torch.repeat_interleave if expand_size is 1

…e#32908) * add chat_template to gguf tokenizer * add template through tokenizer config

…ainer` with `eval_on_start=True` in Jupyter Notebook. (huggingface#32849) fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.

…1691 (huggingface#32921) fix save_pretrained

…on.md to Korean" (huggingface#32334) * docs: ko: tasks/knowledge_distillation_for_image_classification.md * feat: nmt draft * fix: manual edits * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review --------- Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr> Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

…che=False` (huggingface#32863)

fix outdated link

alxndrTL and others added 30 commits July 23, 2024 12:32

Rename Phi-3 rope scaling type (huggingface#31436)

034b477

* renamed phi3 rope_scaling type * fixed trailing whitespaces * fixed test * added warning * fixed format

Revert "Incorrect Whisper long-form decoding timestamps " (huggingfac…

3263b34

…e#32148) Revert "Incorrect Whisper long-form decoding timestamps (huggingface#32003)" This reverts commit cd48553.

Fix typing to be compatible with later py versions (huggingface#32155)

a009fbd

Added additional kwarg for successful running of optuna hyperparamete…

7d92009

…r search (huggingface#31924) Update integration_utils.py Added additional kwarg

Updated ruff to the latest version (huggingface#31926)

d2c687b

* Updated ruff version and fixed the required code according to the latest version. * Updated ruff version and fixed the required code according to the latest version. * Added noqa directive to ignore 1 error shown by ruff

Dev version: v4.44.0.dev0

ff0d708

Llama 3.1 conversion

d5a99df

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>

fix (huggingface#32162)

23f6a43

fix: Fixed an if condition that is always evaluating to true (hugging…

bc2adb0

…face#32160) Fixed an if condition always evaluating to true.

[docs] change temperature to a positive value (huggingface#32077)

c85510f

fix

adds: extra_repr() to MambaRMSNorm to include hidden size / size of w…

01be5b4

…eights in the layer (huggingface#32171) * adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer * style fix with ruff:

Update qwen2.md (huggingface#32108)

5f4ee98

* Update qwen2.md outdated description * Update qwen2.md amended * Update qwen2.md Update * Update qwen2.md fix wrong version code, now good to go

Remove conversational pipeline tests (huggingface#32099)

165116b

Remove conversation pipeline tests

RoPE: relaxed rope validation (huggingface#32182)

e0182f3

* relaxed rope check * lets also accept rope_type=None, defaulting to the original implementation * type and rope_type can coexist
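A rough sketch of what the relaxed check described in these bullets could look like. The helper `resolve_rope_type` is hypothetical, and the string `"default"` is an assumed stand-in for "the original implementation"; neither is confirmed by the commit message.

```python
from typing import Optional


def resolve_rope_type(rope_scaling: Optional[dict]) -> str:
    """Hypothetical helper: pick the RoPE variant from a config dict."""
    if rope_scaling is None:
        return "default"
    # "type" and "rope_type" may both be present; prefer "rope_type".
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    # rope_type=None is accepted and falls back to the original implementation.
    return "default" if rope_type is None else rope_type


no_config = resolve_rope_type(None)
explicit_none = resolve_rope_type({"rope_type": None})
legacy_key = resolve_rope_type({"type": "linear"})
both_keys = resolve_rope_type({"type": "linear", "rope_type": "dynamic"})
```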

let's not warn when someone is running a forward (huggingface#32176)

8d2534c

* let's not warn when someone is running a forward without cache + self.training * more models * fixup

Fix resize embedding with Deepspeed (huggingface#32192)

1392a68

fix resize when deepspeed

Fix float8_e4m3fn in modeling_utils (huggingface#32193)

af0e4b7

* Fix float8_e4m3fn in modeling_utils * style * fix * comment

Support dequantizing GGUF FP16 format (huggingface#31783)

1c122a4

* support gguf fp16 * support gguf bf16 with pytorch * add gguf f16 test * remove bf16

🚨 No more default chat templates (huggingface#31733)

edd68f4

* No more default chat templates * Add the template to the GPT-SW3 tests since it's not available by default now * Fix GPT2 test * Fix Bloom test * Fix Bloom test * Remove default templates again

fix: Replaced deprecated unittest method with the correct one (hugg…

85a1269

…ingface#32198) Replaced deprecated unittest method with the correct one.

[whisper] fix short-form output type (huggingface#32178)

5658e74

* [whisper] fix short-form output type * add test * make style * update long-form tests * fixes * last fix * finalise test

remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1…

f53a5de

….7.0 (huggingface#32210) remove unnecessary guard code related with pytorch versions 1.4.2 ~ 1.7.0

Update question_answering.py (huggingface#32208)

1ecedf1

[BigBird Pegasus] set _supports_param_buffer_assignment to False (hug…

9b9a54e

…gingface#32222) set _supports_param_buffer_assignment to False

[warnings] fix E721 warnings (huggingface#32223)

de23188

fix E721 warnings
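For context, E721 is the pycodestyle/ruff rule that flags comparing type objects with `==`. The sketch below is generic and not taken from the patch; it shows the flagged pattern next to the two usual replacements, which are not interchangeable when subclasses are involved.

```python
def is_int_flagged(value) -> bool:
    # Pattern flagged by E721: type objects compared with ==.
    return type(value) == int  # noqa: E721


def is_int_exact(value) -> bool:
    # Exact type check: use identity comparison instead.
    return type(value) is int


def is_int_instance(value) -> bool:
    # Instance check: also accepts subclasses such as bool.
    return isinstance(value, int)


exact_on_bool = is_int_exact(True)
instance_on_bool = is_int_instance(True)
flagged_on_int = is_int_flagged(3)
```

Since `bool` is a subclass of `int`, `isinstance` accepts `True` while the exact check rejects it, so the appropriate replacement depends on which behavior the original comparison intended.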

Follow up for huggingface#31973 (huggingface#32025)

df6eee9

* fix * [test_all] trigger full CI --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

SunMarc and others added 29 commits August 20, 2024 11:42

🚨🚨🚨 Update min version of accelerate to 0.26.0 (huggingface#32627)

fd06ad5

* Update min version of accelerate to 0.26.0 * dev-ci * update min version in import * remove useless check * dev-ci * style * dev-ci * dev-ci

Fix repr for conv (huggingface#32897)

65f4bc9

add nx

fix: jamba cache fails to use torch.nn.module (huggingface#32894)

01c4fc4

Co-authored-by: Gal Cohen <galc@ai21.com>

Fix: Mamba2 norm_before_gate usage (huggingface#32686)

c63a3d0

* mamba2 uses norm_before_gate=False * small nit * remove norm_before_gate flag and follow False path only

Replace tensor.norm() with decomposed version for CLIP executorch e…

078d5a8

…xport (huggingface#32887) * Replace .norm() with decomposed version for executorch export * [run_slow] clip

link for optimizer names (huggingface#32400)

1dde50c

* link for optimizer names Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring. * make fixup

fix: [whisper] don't overwrite GenerationConfig's return_timestamps…

c6d484e

… when `return_timestamps` is not passed to `generate` function (huggingface#31296) [whisper] don't overwrite return_timestamps when not passed to generate

Update docker image building (huggingface#32918)

3bb7b05

commit

fix: Added missing huggingface_hub installation to workflows (huggi…

af638c4

…ngface#32891) Added missing huggingface_hub installation to workflows.

fix: no need to dtype A in jamba (huggingface#32924)

6baa6f2

Co-authored-by: Gal Cohen <galc@ai21.com>

FEAT / Trainer: Add adamw 4bit optimizer (huggingface#31865)

c42d264

* add 4bit optimizer * style * fix msg * style * add qgalore * Revert "add qgalore" This reverts commit 25278e8. * style * version check

CI: separate step to download nltk files (huggingface#32935)

8b94d28

* separate step to download nltk files * duplicated * rm comma

Add SynCode to llm_tutorial (huggingface#32884)

9282413

Fix benchmark script (huggingface#32635)

bf97d4a

* fix * >= 0.3.0 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

Improve greedy search memory usage (huggingface#32895)

99d67f1

Do not call torch.repeat_interleave if expand_size is 1

Add chat_template for tokenizer extracted from GGUF model (huggingfac…

ee8c01f

…e#32908) * add chat_template to gguf tokenizer * add template through tokenizer config

fix: (issue huggingface#32689) AttributeError raised when using `Tr…

f1d822b

…ainer` with `eval_on_start=True` in Jupyter Notebook. (huggingface#32849) fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.

Gemma2: eager attention by default (huggingface#32865)

975b988

[run_slow] idefics2 (huggingface#32840)

18199b3

Fix regression on Processor.save_pretrained caused by huggingface#3…

273c0af

…1691 (huggingface#32921) fix save_pretrained

Generate: Deprecate returning legacy cache by default; Handle `use_ca…

a26de15

…che=False` (huggingface#32863)

docs: fix outdated link to TF32 explanation (huggingface#32947)

d806fa3

fix outdated link

Merge downstream main into tmp-main-20241114 with conflicts

cfd2e47

conflict changes 2 11/14/24

0c944eb

Cemberk merged commit 306c591 into main Nov 14, 2024
4 checks passed
