Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated PR: Downstream develop rebase new changes #71

Merged
merged 1,841 commits into from
Nov 14, 2024
Merged

Conversation

Cemberk
Copy link
Owner

@Cemberk Cemberk commented Nov 14, 2024

This PR was created automatically by the Fork Maintenance System to sync changes from the downstream main into downstream develop.

alxndrTL and others added 30 commits July 23, 2024 12:32
* Update README.md

* tests: forward ok

* backward test done

* done testing

* removed check. scripts

* Update README.md

* added use_mambapy arg

* fixed typo in warning

* protected imports w/ mambapy package

* delete pscan.py + raise rather than assert

* Update import_utils.py

* fix whitespaces and unused import

* trailing whitespace + import block unformatted

* Update modeling_mamba.py

* transpose before pscan

* shape comment

* ran make style

* use_mambapy=False by default

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* ran make fix-copies

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* renamed phi3 rope_scaling type

* fixed trailing whitespaces

* fixed test

* added warning

* fixed format
…e#32148)

Revert "Incorrect Whisper long-form decoding timestamps  (huggingface#32003)"

This reverts commit cd48553.
…ingface#31857)

* feat(cache): StaticCache uses index_copy_ to avoid useless copy

Using index_copy_ allows for explicit in-place change of the tensor.
Some backends (XLA) will otherwise copy the tensor, making the code
slower and using more memory.

Proposed implementation will end up using less memory and on XLA will
result in less compilation, but the change is also quite generic, making
no change whatsoever on CUDA or CPU backend.

* feat(cache): SlidingWindowCache uses index_copy_ to avoid useless copy

Applying the same change done in StaticCache.

* fix(cache): fallback of index_copy_ when not implemented

* fix(cache): in index_copy_ ensure tensors are on same device

* [run slow] llama

* fix(cache): add move of cache_position to same device in SlidingWindowCache

* Revert "[run slow] llama"

This reverts commit 02608dd.
…r search (huggingface#31924)

Update integration_utils.py

Added additional kwarg
…ith Position IDs (huggingface#31629)

* add DataCollatorBatchFlattening

* Update data_collator.py

* change name

* new FA2 flow if position_ids is provided

* add comments

* minor fix

* minor fix data collator

* add test cases for models

* add test case for data collator

* remove extra code

* formating for ruff check and check_repo.py

* ruff format

ruff format tests src utils

* custom_init_isort.py
* Updated ruff version and fixed the required code accorindg to the latest version.

* Updated ruff version and fixed the required code accorindg to the latest version.

* Added noqa directive to ignore 1 error shown by ruff
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
…eights in the layer (huggingface#32171)

* adds: extra_repr() to MambaRMSNorm to include the hidden size of the layer

* style fix with ruff:
…than the ones present at import time. (huggingface#32153)

* fix: default value reflects the runtime environment variables rather than the ones present at import time.

* Fix: Change `deterministic` to None by default; use env var if None
* Update qwen2.md

outdated description

* Update qwen2.md

amended

* Update qwen2.md

Update

* Update qwen2.md

fix wrong version code, now good to go
* relaxed rope check

* lets also accept rope_type=None, defaulting to the original implementation

* type and rope_type can coexist
* let's not warn when someone is running a foward without cache + self.training

* more models

* fixup
* Fix float8_e4m3fn in modeling_utils

* style

* fix

* comment
* support gguf fp16

* support gguf bf16 with pytorch

* add gguf f16 test

* remove bf16
* No more default chat templates

* Add the template to the GPT-SW3 tests since it's not available by default now

* Fix GPT2 test

* Fix Bloom test

* Fix Bloom test

* Remove default templates again
…ingface#32198)

Replaced deprecated unittest method with the correct one.
* [whisper] fix short-form output type

* add test

* make style

* update long-form tests

* fixes

* last fix

* finalise test
….7.0 (huggingface#32210)

remove unnecessary guard code related with pytorch versions 1.4.2 ~
1.7.0
* fix

* [test_all] trigger full CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
SunMarc and others added 29 commits August 20, 2024 11:42
* Update min version of accelerate to 0.26.0

* dev-ci

* update min version in import

* remove useless check

* dev-ci

* style

* dev-ci

* dev-ci
* mamba2 uses norm_before_gate=False

* small nit

* remove norm_before_gate flag and follow False path only
…nsformer (huggingface#32903)

Bump nltk in /examples/research_projects/decision_transformer

Bumps [nltk](https://github.com/nltk/nltk) from 3.7 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](nltk/nltk@3.7...3.9)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…xport (huggingface#32887)

* Replace .norm() with decomposed version for executorch export

* [run_slow] clip
* link for optimizer names

Add a note and link to where the user can find more optimizer names easily because there are many more optimizers than are mentioned in the docstring.

* make fixup
* Update README.md

* Update README.md

* Add README_ar.md to i18n/README_de.md

* Add README_ar.md to i18n/README_es.md

* Add README_ar.md to i18n/README_fr.md

* Add README_ar.md to i18n/README_hd.md

* Add README_ar.md to i18n/README_ja.md

* Add README_ar.md to i18n/README_ko.md

* Add README_ar.md to i18n/README_pt-br.md

* Add README_ar.md to i18n/README_ru.md

* Add README_ar.md to i18n/README_te.md

* Add README_ar.md to i18n/README_vi.md

* Add README_ar.md to i18n/README_vi.md

* Add README_ar.md to i18n/README_zh-hans.md

* Add README_ar.md to i18n/README_zh-hant.md

* Create README_ar.md
… when `return_timestamps` is not passed to `generate` function (huggingface#31296)

[whisper] don't overwrite return_timestamps when not passed to generate
* try test updates

* a few more changes

* a few more changes

* a few more changes

* [run slow] jamba

* skip logits checks on older gpus

* [run slow] jamba

* oops

* [run slow] jamba

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/jamba/test_modeling_jamba.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
…ngface#32891)

Added missing huggingface_hub installation to workflows.
Co-authored-by: Gal Cohen <galc@ai21.com>
* add 4bit optimizer

* style

* fix msg

* style

* add qgalore

* Revert "add qgalore"

This reverts commit 25278e8.

* style

* version check
* separate step to download nltk files

* duplicated

* rm comma
…1469)

* Update hub.py

* Update errors

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
* fix

* >= 0.3.0

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Do not call torch.repeat_interleave if expand_size is 1
…e#32908)

* add chat_template to gguf tokenizer

* add template through tokenizer config
…ainer` with `eval_on_start=True` in Jupyter Notebook. (huggingface#32849)

fix: `AttributeError` raised when using `Trainer` with `eval_on_start=True` in Jupyter Notebook.
…on.md to Korean" (huggingface#32334)

* docs: ko: tasks/knowledge_distillation_for_image_classification.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
@Cemberk Cemberk merged commit 306c591 into main Nov 14, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment