
improve from_pretrained for zero3 multi gpus mode #24964

Merged
merged 4 commits on Jul 21, 2023

Conversation

1ytic
Contributor

@1ytic 1ytic commented Jul 20, 2023

Decrease RAM consumption during DeepSpeed ZeRO-3 model initialization with multiple GPUs.

This simple PR saves a lot of RAM in the multi-GPU DeepSpeed ZeRO-3 scenario.

The idea is simple. We do not need to load the checkpoint on every rank, because deepspeed.zero.GatheredParameters will copy the weights from rank 0.
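A minimal, dependency-free sketch of this idea. It assumes the rule from this PR's `load_state_dict` change (ZeRO-3 enabled, `torch.distributed` initialized and rank > 0 means skip the real load; using `"meta"` for the skipped ranks is an assumption based on the discussion below), and it only roughly imitates what happens when leaving `deepspeed.zero.GatheredParameters(..., modifier_rank=0)`: rank 0's values are treated as authoritative and every rank keeps one slice. All function names are hypothetical, and lists of floats stand in for tensors.

```python
from typing import Dict, List, Optional

Tensor = List[float]  # stand-in for a real tensor
StateDict = Dict[str, Tensor]


def choose_map_location(zero3_enabled: bool, dist_initialized: bool, rank: int) -> str:
    # Mirrors the PR's rule: with ZeRO-3 and an initialized process group,
    # every rank except 0 skips materializing the checkpoint.
    if zero3_enabled and dist_initialized and rank > 0:
        return "meta"
    return "cpu"


def simulate_load(checkpoint: StateDict, world_size: int) -> List[Optional[StateDict]]:
    # Each rank "loads" the checkpoint; a "meta" load holds no data (None).
    loaded: List[Optional[StateDict]] = []
    for rank in range(world_size):
        if choose_map_location(True, True, rank) == "meta":
            loaded.append(None)
        else:
            loaded.append({name: list(values) for name, values in checkpoint.items()})
    return loaded


def partition_from_rank0(loaded: List[Optional[StateDict]], world_size: int) -> List[StateDict]:
    # Rough imitation of leaving GatheredParameters(..., modifier_rank=0):
    # rank 0's values are authoritative and each rank keeps one slice of them.
    source = loaded[0]
    assert source is not None, "rank 0 must hold the real weights"
    shards: List[StateDict] = []
    for rank in range(world_size):
        shard: StateDict = {}
        for name, values in source.items():
            chunk = -(-len(values) // world_size)  # ceiling division
            shard[name] = values[rank * chunk:(rank + 1) * chunk]
        shards.append(shard)
    return shards
```

Only rank 0 ever holds a full copy of the state dict, yet every rank still ends up with its own shard.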

Issues

Related issue #12273

Who can review?

@stas00
Contributor

stas00 commented Jul 20, 2023

Thank you for the PR, @1ytic

The idea is excellent, but we need to think about multiple use-cases here.

What happens if ZeRO-3 is used, but not zero.Init? Won't this code fail if you try to access any of the meta weights before deepspeed.initialize is called, which is the point at which DeepSpeed partitions the weights from rank 0 to the other ranks? I'm flagging the question of when the weights get sharded.

Unfortunately, when I initially designed this I didn't think anybody would want to use ZeRO-3 without zero.Init, so I lumped the two together. @pacman100 improved on this in Accelerate by separating the two checks (is ZeRO-3 enabled, and is zero.Init enabled), which gives more refined control for optimizations like this one.

So let's wait for Sourab to weigh in.

Meanwhile, please test:

  1. that this code works with PyTorch 1.9 (I don't remember when the meta device started working)
  2. with RUN_SLOW=1 pytest tests/deepspeed for coverage testing, since the PR CI doesn't run the DeepSpeed tests.

@1ytic
Contributor Author

1ytic commented Jul 20, 2023

Thank you for feedback, @stas00

Just to clarify a little bit.

My changes affect only the state_dict loaded from checkpoints. deepspeed.zero.Init() doesn't care about the state_dict. The real magic happens when we exit the deepspeed.zero.GatheredParameters() context.

I know modeling_utils.py is a 4k-line monster and maybe I missed something, but it seems I affect only one scenario: loading checkpoints into an already partitioned ZeRO-3 model. I tested this scenario with a 10GB checkpoint and 4 GPUs, and RAM consumption on a single node dropped from 45GB to 17GB.
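A back-of-the-envelope model of why the saving is this large, assuming the dominant host-RAM cost is one full CPU copy of the state dict per process that loads it. Real usage also includes the Python/CUDA runtime, the model itself and allocator overhead, so this will not reproduce the measured numbers exactly. `state_dict_ram_gb` is a hypothetical helper.

```python
def state_dict_ram_gb(checkpoint_gb: float, num_gpus: int, rank0_only: bool) -> float:
    # Peak host RAM held by loaded state dicts on one node: one full copy per
    # loading process (approximation; ignores every other memory consumer).
    loading_processes = 1 if rank0_only else num_gpus
    return checkpoint_gb * loading_processes


before = state_dict_ram_gb(10.0, 4, rank0_only=False)  # 40.0
after = state_dict_ram_gb(10.0, 4, rank0_only=True)    # 10.0
print(f"saved ~{before - after:.0f} GB")               # saved ~30 GB
```

The ~30GB predicted by this crude model is in the same ballpark as the ~28GB drop (45GB to 17GB) reported above.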

@stas00
Contributor

stas00 commented Jul 20, 2023

Exactly, the model instantiation is super complex, that's why I wrote the above.

The DeepSpeed integration test suite has very high coverage, so if you run it and it succeeds, then most likely it's all good. The size of the checkpoint doesn't matter for the purpose of accepting the PR; what matters is ensuring it doesn't break things.

And by the way, you don't even need to load the checkpoint if you're resuming from a DeepSpeed ZeRO checkpoint. In another project I hacked things so the model was created without loading the weights, and then used DeepSpeed's checkpoint loading directly, which should already be efficient, since each GPU reads only its own shard of the weights.

But, alas, making it generic enough to satisfy everybody is very difficult, which is why the general case favors out-of-the-box ease of use, often at the cost of slower startup and higher memory consumption.

Ideally the protocol should be like this:

  1. create a model on meta (~0 secs)
  2. load each shard into the gpu it belongs to (a few secs)

this should be extremely fast even for a huge model like BLOOM-176B
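The two-step protocol above can be sketched under simplifying assumptions: step 1 (building the model on PyTorch's `meta` device, which allocates no storage) is not simulated, the checkpoint is treated as one flattened list, and shards are contiguous ranges with the remainder spread over the first ranks, which differs from DeepSpeed's own padded partitioning. `shard_ranges` and `read_own_shard` are hypothetical helpers.

```python
from typing import List, Tuple


def shard_ranges(total_numel: int, world_size: int) -> List[Tuple[int, int]]:
    # Split a flattened parameter space into world_size contiguous
    # [start, end) ranges; earlier ranks get one extra element if uneven.
    base, extra = divmod(total_numel, world_size)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def read_own_shard(flat_checkpoint: List[float], rank: int, world_size: int) -> List[float]:
    # Step 2: a rank reads only its own slice, never the whole checkpoint.
    start, end = shard_ranges(len(flat_checkpoint), world_size)[rank]
    return flat_checkpoint[start:end]
```

Because no rank touches data outside its range, total bytes read across all ranks equal the checkpoint size once, rather than once per GPU.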

In the case of new training, there should be a way to pre-shard the model before loading it, so that resuming and new training are identical as far as model loading goes. This will eventually be done once universal checkpointing is implemented for ZeRO (currently it's only available in Megatron-DeepSpeed): microsoft/DeepSpeed#2921

So lots and lots of things to improve there.

And more things to fix on the deepspeed side, e.g. this is very wasteful microsoft/DeepSpeed#1971

@stas00
Contributor

stas00 commented Jul 20, 2023

So, practically, please run the integration tests I described in my first reply, and if possible with pytorch-1.9 (the minimal supported PyTorch version).

@sgugger
Collaborator

sgugger commented Jul 21, 2023

Just to quickly chime in, the minimal version is actually 1.10 now ;-) The meta device is in 1.9+ so that shouldn't be an issue.

@stas00
Contributor

stas00 commented Jul 21, 2023

Thank you for this insight, Sylvain.

So then any recent pt version should be ok to test with, @1ytic

@pacman100
Contributor

pacman100 commented Jul 21, 2023

Hello,

The Trainer's behaviour isn't changed at all, because the DeepSpeed config is still created using HfTrainerDeepSpeedConfig, which sets the weakref _hf_deepspeed_config_weak_ref; is_deepspeed_zero3_enabled uses it to check whether Stage-3 is enabled. So, from the Trainer's perspective, this should work fine.

From Accelerate's perspective, when the user specifies zero3_init_flag=False, the weakref _hf_deepspeed_config_weak_ref isn't created, and as such is_deepspeed_zero3_enabled returns False even if Stage-3 is used, because the user doesn't want to use the deepspeed.zero.Init context manager. So, in this case too, this PR should work fine, as map_location = "cpu" due to the absence of the weakref.
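A simplified sketch of the weakref-based check described here, assuming a module-level weak reference that only the Trainer path registers and that is left unset when `zero3_init_flag=False`. `FakeDeepSpeedConfig`, `register_config` and `zero3_enabled` are hypothetical stand-ins, not the transformers API.

```python
import weakref
from typing import Optional

# Module-level weak reference, in the spirit of _hf_deepspeed_config_weak_ref.
_config_weak_ref: Optional["weakref.ref[FakeDeepSpeedConfig]"] = None


class FakeDeepSpeedConfig:
    """Hypothetical stand-in for HfTrainerDeepSpeedConfig."""

    def __init__(self, stage: int):
        self.stage = stage

    def is_zero3(self) -> bool:
        return self.stage == 3


def register_config(config: FakeDeepSpeedConfig) -> None:
    # Only callers that want zero.Init behaviour register their config.
    global _config_weak_ref
    _config_weak_ref = weakref.ref(config)


def zero3_enabled() -> bool:
    # No registered config (or it was garbage-collected): report False,
    # so the loader falls back to map_location="cpu" on every rank.
    config = _config_weak_ref() if _config_weak_ref is not None else None
    return config is not None and config.is_zero3()
```

Because the reference is weak, the global flag never keeps the config alive; once the owning object is gone, the check quietly reverts to False.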

So, the changes of this PR look good if all the slow tests pass.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 21, 2023

The documentation is not available anymore as the PR was closed or merged.

@1ytic
Contributor Author

1ytic commented Jul 21, 2023

@stas00, you were right: I hit an uninitialized error while testing. After fixing it, the tests passed:

RUN_SLOW=1 pytest -rs tests/deepspeed/

================================================= short test summary info ==================================================
SKIPPED [1] tests/deepspeed/test_deepspeed.py:949: test requires bfloat16 hardware support
================================= 108 passed, 1 skipped, 98 warnings in 2273.67s (0:37:53) =================================

I also added one more import. If it's too much, I can revert it.

@stas00
Contributor

stas00 commented Jul 21, 2023

@tjruwase, please kindly have a look: do you see any problems with this approach of loading weights only on rank 0 and relying on partitioning to distribute the weights to the rest of the ranks under ZeRO-3? Could this somehow cause problems in the future?

The idea is to skip loading weights on all ranks but rank 0, since they will be discarded anyway.

Thank you!

@tjruwase
Contributor

@1ytic, this is pretty neat. LGTM. Thanks!

Contributor

@stas00 stas00 left a comment


Excellent, thank you for testing and fixing the dist issue.

I have run the slow tests as well and it all works great.

Thank you very much for this awesome contribution, @1ytic

I'll just ask @sgugger to take a quick look to see if he is happy with the dist renames.

@stas00 stas00 requested a review from sgugger July 21, 2023 18:29
Collaborator

@sgugger sgugger left a comment


Thanks for your PR! Can we just leave torch.distributed as it was? dist is way less obvious as a name.

@@ -457,7 +458,11 @@ def load_state_dict(checkpoint_file: Union[str, os.PathLike]):
             )
         return safe_load_file(checkpoint_file)
     try:
-        return torch.load(checkpoint_file, map_location="cpu")
+        if is_deepspeed_zero3_enabled() and dist.is_initialized() and dist.get_rank() > 0:
Collaborator


Shouldn't this be the local_rank here? Do we need to load the state dict once per machine, or is just once okay?

Contributor

@stas00 stas00 Jul 21, 2023


That's a good question. For ZeRO-DP, just machine:0 rank:0 is enough; it then shards the weights to all GPUs across multiple machines.

But I'm yet to try ZeRO++. @tjruwase, we haven't discussed this case. Won't loading only on machine:0 rank:0 become an issue if there are multiple DP replicas in ZeRO++, requiring a load on local_rank=0 of each node? And would the proposed code then need to be adapted to load on rank 0 of each node?

Contributor


I think just once should be okay, since the subsequent partitioning here is global.
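To make the global-rank vs local_rank question concrete, a small sketch under the usual assumption that ranks are numbered node by node (global rank = node_rank * gpus_per_node + local_rank). Both functions are hypothetical. With the PR's `get_rank() > 0` check, a single process in the whole job reads the checkpoint; a local_rank-based check would read it once per node.

```python
from typing import List, Tuple


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    # Typical launcher numbering: all GPUs of node 0 first, then node 1, ...
    return node_rank * gpus_per_node + local_rank


def loading_ranks(num_nodes: int, gpus_per_node: int, per_node: bool) -> List[Tuple[int, int]]:
    # (node_rank, local_rank) pairs that would read the checkpoint.
    # per_node=False: only global rank 0 (the behaviour merged here).
    # per_node=True: local_rank 0 on every node (the alternative raised in review).
    chosen: List[Tuple[int, int]] = []
    for node in range(num_nodes):
        for local in range(gpus_per_node):
            g = global_rank(node, local, gpus_per_node)
            if (local == 0) if per_node else (g == 0):
                chosen.append((node, local))
    return chosen
```

Since the subsequent ZeRO partitioning is global, the single global-rank-0 load is sufficient, which keeps host-RAM use on every other node near zero.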

Contributor


super, thank you for validating, Tunji!

Collaborator


So all good for me, just reverting the dist rename and we can merge this :-)

Contributor


@stas00, I think this should also work for ZeRO++ because the multiple DP replicas are activated only during the backward pass to reduce allgather overheads.

Contributor


That's great to know, I appreciate you taking time to address this, Tunji!

@1ytic
Contributor Author

1ytic commented Jul 21, 2023

Thanks for your PR! Can we just leave torch.distributed as it was? dist is way less obvious as a name.

I thought dist was quite a common name for torch.distributed, but it's up to you. I will rename it back.

@sgugger
Collaborator

sgugger commented Jul 21, 2023

It is quite common and I would have no problem if this was in the Trainer file, but this file is not a distributed script and can be read by people less used to this. That's why it's better to spell it out IMO.

@sgugger sgugger merged commit ea41e18 into huggingface:main Jul 21, 2023
@sgugger
Collaborator

sgugger commented Jul 21, 2023

Thanks for bearing with me!

@1ytic 1ytic deleted the from-pretrained-zero3 branch July 21, 2023 19:43
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

zachares added a commit to nplan-io/transformers that referenced this pull request Aug 11, 2023
* Enable `ZeroShotAudioClassificationPipelineTests::test_small_model_pt` (#24882)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add DINOv2 (#24016)

* First draft

* More improvements

* Convert patch embedding layer

* Convert all weights

* Make conversion work

* Improve conversion script

* Fix style

* Make all tests pass

* Add image processor to auto mapping

* Add swiglu ffn

* Add image processor to conversion script

* Fix conversion of giant model

* Fix documentation

* Fix style

* Fix tests

* Address comments

* Address more comments

* Remove unused arguments

* Remove more arguments

* Rename parameters

* Include mask token

* Address comments

* Add docstring

* Transfer checkpoints

* Empty commit

* [`InstructBlip`] Fix int8/fp4 issues (#24888)

* fix dtype issue

* revert `.float()`

* fix copies

* [`Blip`] Fix blip output name (#24889)

* fix blip output name

* add property

* oops

* fix failing test

* check if eval dataset is dict (#24877)

* check if eval dataset is dict

* formatting

* Separate CircleCI cache between `main` and `pull` (or other branches) (#24886)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`Llama2`]  Add support for Llama 2 (#24891)

* add llama

* add other readmes

* update padding id in readme

* add link to paper

* fix paths and tokenizer

* more nits

* styling

* fit operation in 2 lines when possible

* nits

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add form

* update readme

* update readme, we don't have a default pad token

* update test and tokenization

* LLaMA instead of Llama

* nits

* add expected text

* add greedy output

* styling

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* sequential device map

* skip relevant changes

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Disable ipex env var if false (#24885)

Disable ipex if in use

* Check for accelerate env var when doing CPU only (#24890)

Check for use-cpu

* Avoid some pipeline tasks to use `use_cache=True` (#24893)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update tested versions in READMEs (#24895)

* Update supported Python and PyTorch versions in readme

* Update Python, etc. versions in non-English readmes

These were more out of date than in the English readme. This
updates all the versions the readmes claim the repository is tested
with to the same versions stated in the English readme.

Those versions are current at least in the case of the Python and
PyTorch versions (and less out of date for the others).

* Propagate trailing whitespace fix to model list

This runs "make fix-copies". The only change is the removal of
whitespace. No actual information or wording is changed.

* Update tested TensorFlow to 2.6 in all readmes

Per pinning in setup.py

Unlike Python and PyTorch, the minimum supported TensorFlow version
has not changed very recently, but old versions were listed in all
READMEs.

* Fix `test_model_parallelism` for `FalconModel` (#24914)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) (#24907)

- This resulted in CPU mode being used on Apple Silicon (MPS)
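The root cause is generic Python behavior: `bool()` on any non-empty string, including `"False"`, returns `True`. A minimal sketch of explicit parsing; `env_flag` and the accepted spellings are assumptions for illustration, not the helper Accelerate or transformers actually uses.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    # bool("False") is True because any non-empty string is truthy,
    # so parse the text explicitly instead of calling bool() on it.
    value = os.environ.get(name)
    if value is None:
        return default
    value = value.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off", ""}:
        return False
    raise ValueError(f"{name}={value!r} is not a recognized boolean")
```

Raising on unknown spellings, rather than guessing, surfaces typos like `ACCELERATE_USE_CPU=flase` immediately.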

* fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST (#24902)

fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST

suno/barh should be suno/bark

* Fix minor llama2.md model doc typos (#24909)

Update llama2.md

 Fix typos in the llama2 model doc

* [`Llama2`] replace `self.pretraining_tp` with `self.config.pretraining_tp` (#24906)

* add possibility to disable TP

* fixup

* adapt from offline discussions

* [doc] `image_processing_vilt.py` wrong default documented (#24931)

[doc] image_processing_vilt.py wrong default

* 🌐 [i18n-KO] Translated`tasks/document_question_answering.md` to Korean (#24588)

* docs: ko: `document_question_answering.md`

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* Add multi-label text classification support to pytorch example (#24770)

* Add text classification example

* set the problem type and finetuning task

* ruff reformatted

* fix bug when unsetting label_to_id for regression

* update README.md

* fixed finetuning task

* update comment

* check if label exists in feature before removing

* add useful logging

* Deprecate unused OpenLlama architecture (#24922)

* Resolve typo in check_repo.py

* Specify encoding when opening modeling files

* Deprecate the OpenLlama architecture

* Add disclaimer pointing to Llama

I'm open to different wordings here

* Match the capitalisation of LLaMA

* replace no_cuda with use_cpu in test_pytorch_examples (#24944)

* replace no_cuda with use_cpu in test_pytorch_examples

* remove code that is never used

* fix style

* Generate: sequence bias can handle same terminations (#24822)

* Bump pygments from 2.11.2 to 2.15.0 in /examples/research_projects/decision_transformer (#24949)

Bump pygments in /examples/research_projects/decision_transformer

Bumps [pygments](https://github.com/pygments/pygments) from 2.11.2 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.11.2...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update processing_vision_text_dual_encoder.py (#24950)

Fixing small typo: kwrags -> kwargs

* Fix `main_input_name` in `src/transformers/keras_callbacks.py` (#24916)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [DOCS] Example for `LogitsProcessor` class (#24848)

* make docs

* fixup

* resolved

* remove debugs

* Revert "fixup"

This reverts commit 5e0f636aae0bf8707bc8bdaa6a9427fbf66834ed.

* prev (ignore)

* fixup broke some files

* remove files

* reverting modeling_reformer

* lang fix

* fix type annotations for arguments in training_args (#24550)

* testing

* example script

* fix typehinting

* some tests

* make test

* optional update

* Union of arguments

* does this fix the issue

* remove reports

* set default to False

* documentation change

* None support

* does not need None

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments

* Change dict to Dict

* Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" (#24574)

Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)"

This reverts commit c5e29d4381d4b9739e6cb427adbca87fbb43a3ad.

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments

* Change dict to Dict

* merge

* hacky fix

* fixup

---------

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Bump aiohttp from 3.8.1 to 3.8.5 in /examples/research_projects/decision_transformer (#24954)

Bump aiohttp in /examples/research_projects/decision_transformer

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.1 to 3.8.5.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/v3.8.5/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.8.1...v3.8.5)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [`RWKV`] Add Gradient Checkpointing support for RWKV (#24955)

add GC support for RWKV

* Change logic for logging in the examples (#24956)

Change logic

* Contrastive Search peak memory reduction (#24120)

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fallback for missing attribute `Parameter.ds_numel` (#24942)

* [trainer] fallback for deepspeed param count

* [trainer] more readable numel count

* fix fsdp checkpointing issues (#24926)

* fix fsdp load

* Update trainer.py

* remove saving duplicate state_dict

* fix: cast input pixels to appropriate dtype for image_to_text pipelines (#24947)

* fix: cast input pixels to appropriate dtype for image_to_text tasks

* fix: add casting to pixel inputs of additional models after running copy checks

* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` (#24664)

* fix: english/korean quicktour.md

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* fix: follow glossary

* 파인튜닝 -> 미세조정 (replace the transliterated term for "fine-tuning" with the native Korean term)

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* fsdp fixes and enhancements (#24980)

* fix fsdp prepare to remove the warnings and fix excess memory usage

* Update training_args.py

* parity for FSDP+XLA

* Update trainer.py

* Fix missing spaces in system prompt of Llama2 tokenizer (#24930)

* Update tokenization_llama.py

* Update tokenization_llama_fast.py

* Update src/transformers/models/llama/tokenization_llama_fast.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama_fast.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [`LlamaConfig`] Nit: pad token should be None by default (#24958)

* pad token should be None by default

* fix tests

* nits

* Remove tokenizers from the doc table (#24963)

* Avoid importing all models when instantiating a pipeline (#24960)

* Avoid importing all models when instantiating a pipeline

* Remove sums that don't work

* Fix type annotation for deepspeed training arg (#24988)

* Use main_input_name for include_inputs_for_metrics (#24993)

* Fix `llama` tokenization doctest (#24990)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`bnb`] Add simple check for bnb import (#24995)

add simple check for bnb

* [`Llama`] remove persistent  `inv_freq` tensor (#24998)

remove persistent tensor

* improve from_pretrained for zero3 multi gpus mode (#24964)

* improve from_pretrained for zero3 multi gpus mode

* Add check if torch.distributed.is_initialized

* Revert torch.distributed

---------

Co-authored-by: Stas Bekman <stas@stason.org>

* Move template doc file to md (#25004)

* 🌐 [i18n-KO] Updated Korean `serialization.md` (#24686)

fix: update ko/serialization.md

* chatgpt draft

* [check_config_docstrings.py] improve diagnostics (#25012)

* [check_config_docstrings.py] improve diagnostics

* style

* rephrase

* fix

* [`logging.py`] set default `stderr`  path if `None` (#25033)

set default logger

* fix(integrations): store serialized `TrainingArgs` to `wandb.config` without sanitization. (#25035)

fix: store training args to wandb config without sanitization.

Allows resuming runs by reusing the wandb config.

Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>

* [docs] Performance docs tidy up, part 1  (#23963)

* first pass at the single gpu doc

* overview: improved clarity and navigation

* WIP

* updated intro and deepspeed sections

* improved torch.compile section

* more improvements

* minor improvements

* make style

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* feedback addressed

* mdx -> md

* link fix

* feedback addressed

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Support GatedRepoError + use raise from (#25034)

* Support GatedRepoError + use raise from

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Use token instead of use_auth_token in error messages

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Better handling missing SYS in llama conversation tokenizer (#24997)

* Better handling missing SYS in llama conversation tokenizer

The existing code failed to add SYS when the conversation had history
without SYS, but it still modified the passed conversation as if it had.

Rearrange the code so that modifications to the conversation object are
taken into account for token id generation.

* Fix formatting with black

* Avoid one-liners

* Also fix fast tokenizer

* Drop List decl

* 🌐[i18n-KO] Translated performance.md to Korean (#24883)

* docs: ko: performance.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/performance.md

Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* Update docs/source/ko/performance.md

---------

Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `testing.md` to Korean (#24900)

* docs: ko: testing.md

* feat: draft

* fix: manual edits

* fix: edit ko/_toctree.yml

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* Add dispatch_batches to training arguments (#25038)

* Dispatch batches

* Copy items

* Fix typo in LlamaTokenizerFast docstring example (#25018)

* Make more test models smaller (#25005)

* Make more test models tiny

* Make more test models tiny

* More models

* More models

* Comment again print statement

* Pvt model (#24720)

* pull and push updates

* add docs

* fix modeling

* Add and run test

* make copies

* add task

* fix tests and fix small issues

* Checks on a Pull Request

* fix docs

* add desc pvt.md

* compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. (#25044)

* added PeftModelForCausalLM to MODEL_FOR_CAUSAL_LM_MAPPING_NAMES dict

* check for PEFT model in compute_loss section

---------

Co-authored-by: Nathan Brake <nbrake3@mmm.com>

* [`8bit`] Fix 8bit corner case with Blip2 8bit (#25047)

fix 8bit corner case with Blip2 8bit

* 🌐 [i18n-KO] Translated `perf_train_cpu.md` to Korean (#24911)

* docs: ko: perf_train_cpu.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* fix: manual edits

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

---------

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Better error message when signal is not supported on OS (#25049)

* Better error message when signal is not supported on OS

* Address review comments

* [`RWKV`] Add note in doc on `RwkvStoppingCriteria` (#25055)

* Add note in doc on `RwkvStoppingCriteria`

* give some breathing space to the code

* Generate - add beam indices output in constrained beam search (#25042)

* [Docs] fix rope_scaling doc string (#25072)

fix rope_scaling doc string

* 🌐 [i18n-KO] Translated `<tf_xla>.md` to Korean (#24904)

* docs: ko: tf_xla.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* 🌐 [i18n-KO] Translated `perf_hardware.md` to Korean (#24966)

* docs: ko: perf_hardware.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: fix rendering error of perf_hardware.md

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Fix last models for common tests that are too big. (#25058)

* Fix last models for common tests that are too big.

* Remove print statement

* fix: add TOC anchor link (#25066)

* Set `TF32` flag for PyTorch cuDNN backend (#25075)

* Fix broken link in README_hd.md (#25067)

Update README_hd.md

* replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme of multiple-choice task (#25078)

replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size`
in readme of multiple-choice

* [`generate`]  Only warn users if the `generation_config`'s `max_length` is set to the default value (#25030)

* check max length is default

* nit

* update warning: no-longer deprecate

* comment in the configuration_utils in case max length's default gets changed in the future

* 🌐 [i18n-KO] Translated `hpo_train.md` to Korean (#24968)

* docs: ko: hpo_train.mdx

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* Fix: repeat per sample for SAM image embeddings (#25074)

Repeat per sample for SAM image embeddings

* [`MPT`] Add MosaicML's `MPT` model to transformers (#24629)

* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke and updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match to 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some  comments

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [DOCS] add example NoBadWordsLogitsProcessor (#25046)

* add example NoBadWordsLogitsProcessor

* fix L764 & L767

* make style

* 🌐 [i18n-KO] Translated `perf_infer_cpu.md` to Korean (#24920)

* docs: ko: perf_infer_cpu.md

* feat: chatgpt draft

* fix: manual edits

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/perf_infer_cpu.md

* Update docs/source/ko/perf_infer_cpu.md

This part bothered me too. I'll apply it!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Agreed! I was sticking too closely to the original!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

As you said, I think I was too fixated on the original text.

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Thank you for the better choice of words!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

At the time, I couldn't think of the term '주기' ("cycle")...

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

The sentence reads more naturally now!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

There's no need to be bound by the original format!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Allow generic composite models to pass more kwargs (#24927)

* fix

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* [ `ForSequenceClassification`] Support `left` padding (#24979)

* support left padding

* nit

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

* [`TF`] Also apply patch to support left padding (#25085)

* tf versions

* apply changes to other models

* 3 models slipped through the cracks

* Edit err message and comment in `test_model_is_small` (#25087)

* Edit err message and comment in

* put back 80M comment

* [ `PreTrainedTokenizerFast`] Keep properties from fast tokenizer (#25053)

* draft solution

* use `setdefault`

* nits

* add tests and fix truncation issue

* fix test

* test passes locally

* quality

* updates

* update tests

* Hotfix for failing `MusicgenForConditionalGeneration` tests (#25091)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)

* Initial addition of t5forsequenceclassification

* Adding imports and adding tests

* Formatting

* Running make fix-copies

* Adding mt5forseq

* Formatting

* run make fix-copies

* Adding to docs

* Add model_parallel

* Fix bug

* Fix

* Remove TODO

* Fixing tests for T5ForSequenceClassification

* Undo changes to dependency_versions_table.py

* Change classification head to work with T5Config directly

* Change seq length to let tests pass

* PR comments for formatting

* Formatting

* Initial addition of UMT5ForSequenceClassification

* Adding to inits and formatting

* run make fix-copies

* Add doc for UMT5ForSeqClass

* Update UMT5 config

* Fix docs

* Skip torch fx test for SequenceClassification

* Formatting

* Add skip to UMT5 tests as well

* Fix umt5 tests

* Running make fix-copies

* PR comments

* Fix for change to sentence_representation

* Rename seq_len to hidden_size since that's what it is

* Use base_model to follow format of the rest of the library

* Update docs

* Extract the decoder_input_ids changes and make one liner

* Make one-liner

* Fix doctest (#25031)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/lxmert (#25096)

Bump certifi in /examples/research_projects/lxmert

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/decision_transformer (#25098)

Bump certifi in /examples/research_projects/decision_transformer

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/visual_bert (#25097)

Bump certifi in /examples/research_projects/visual_bert

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix tied_params for meta tensor (#25101)

* fix tied_params for meta tensor

* remove duplicate

* documentation for llama2 models (#25102)

* fix documentation

* changes

* 🌐[i18n-KO] Translated pipeline_webserver.md to Korean (#24828)

* translated pipeline_webserver.md

Co-Authored-By: Hyeonseo Yun <0525yhs@gmail.com>
Co-Authored-By: Wonhyeong Seo <wonhseo@kakao.com>
Co-Authored-By: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-Authored-By: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-Authored-By: Nayeon Han <nayeon2.han@gmail.com>
Co-Authored-By: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* Update pipeline_webserver.md

* Apply suggestions from code review

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sangam Lee <74291999+augustinLib@users.noreply.github.com>
Co-authored-by: Kim haewon <ehdvkf02@naver.com>

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-authored-by: Nayeon Han <nayeon2.han@gmail.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Sangam Lee <74291999+augustinLib@users.noreply.github.com>
Co-authored-by: Kim haewon <ehdvkf02@naver.com>

* Fix `PvtModelIntegrationTest::test_inference_fp16` (#25106)

update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add descriptive docstring to TemperatureLogitsWarper (#24892)

* Add descriptive docstring to TemperatureLogitsWarper

It addresses https://github.com/huggingface/transformers/issues/24783

* Remove niche features

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Commit suggestion

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Refactor the examples to simpler ones

* Add a missing comma

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Make args description more compact

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Remove extra text after making description more compact

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix linter

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … (#24772)

fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor."

Co-authored-by: 刘长伟 <hzliuchw@corp.netease.com>

* update `use_auth_token` -> `token` (#25083)

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix past CI after #24334 (#25113)

update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Move common image processing methods to BaseImageProcessor (#25089)

Move out common methods

* Fix ViT docstring regarding default dropout values. (#25118)

Fix docstring for dropout.

* MaskFormer - enable return_dict in order to compile (#25052)

* Enable return_dict in order to compile

* Update tests

* Move center_crop to BaseImageProcessor (#25122)

* fix deepspeed load best model at end when the model gets sharded (#25057)

* fix delete all checkpoints when save_total_limit is set to 1 (#25136)

* [`T5/LlamaTokenizer`] default legacy to `None` to not always warn (#25131)

default legacy to None

* Clarify 4/8 bit loading log message (#25134)

* clarify 4/8 bit loading log message

* make style

* 🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 (#25109)

* Change defaults

* Sylvain's comments

* [`MptConfig`] support from pretrained args (#25116)

* support from pretrained args

* draft addition of tests

* update test

* use parent assert true

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add offload support to Bark (#25037)

* initial Bark offload proposal

* use hooks instead of manually offloading

* add test of bark offload to cpu feature

* Apply nit suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docstrings of offload

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove unnecessary set_seed in Bark tests

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* More `token` things (#25146)

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add bloom flax (#25094)

* First commit

* step 1 working

* add alibi

* placeholder for `scan`

* add matrix mult alibi

* beta scaling factor for bmm

* working v1 - simple forward pass

* move layer_number from attribute to arg in call

* partial functioning scan

* hacky working scan

* add more modifs

* add test

* update scan for new kwarg order

* fix position_ids problem

* fix bug in attention layer

* small fix

- do the alibi broadcasting only once

* prelim refactor

* finish refactor

* alibi shifting

* incorporate dropout_add to attention module

* make style

* make padding work again

* update

* remove bogus file

* up

* get generation to work

* clean code a bit

* added small tests

* adding alibi test

* make CI tests pass:

- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work

* fix few nits

* fix nit onnx

* fix onnx nit

* add missing dtype args to nn.Modules

* remove debugging statements

* fix scan generate

* Update modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* fix small test issue + make style

* clean up

* Update tests/models/bloom/test_modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix function name

* small fix test

* forward contrib credits from PR17761

* Fix failing test

* fix small typo documentation

* fix non passing test

- remove device from build alibi

* refactor call

- refactor `FlaxBloomBlockCollection` module

* make style

* upcast to fp32

* cleaner way to upcast

* remove unused args

* remove layer number

* fix scan test

* make style

* fix i4 casting

* fix slow test

* Update src/transformers/models/bloom/modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove `layer_past`

* refactor a bit

* fix `scan` slow test

* remove useless import

* major changes

- remove unused code
- refactor a bit
- revert import `torch`

* major refactoring

- change build alibi

* remove scan

* fix tests

* make style

* clean-up alibi

* add integration tests

* up

* fix batch norm conversion

* style

* style

* update pt-fx cross tests

* update copyright

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* per-weight check

* style

* line formats

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add new model in doc table of content (#25148)

* Fix `.push_to_hub` and cleanup `get_full_repo_name` usage (#25120)

* Fix .push_to_hub and cleanup get_full_repo_name usage

* Do not rely on Python bool conversion magic

* request changes

* Add test when downloading from gated repo (#25039)

* override .cuda() to check if model is already quantized (#25166)

* Represent query_length in a different way to solve jit issue (#25164)

Fix jit trace

* make run_generation more generic for other devices (#25133)

* make run_generation more generic for other devices

* use Accelerate to support any device type it supports.

* make style

* fix incorrect usage of accelerator.prepare_model

* use `PartialState` to make sure everything is running on the right device

---------

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* added compiled model support for inference (#25124)

* added compiled model support for inference

* linter

* Fix tests

* linter

* linter

* remove inference mode from pipelines

* Linter

---------

Co-authored-by: amarkov <alexander@inworld.ai>

* Update `use_auth_token` -> `token` in example scripts (#25167)

* pytorch examples

* tensorflow examples

* flax examples

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`Mpt`] Fix mpt slow test (#25170)

fix mpt slow test

* [`InstructBlip`] Fix instructblip slow test (#25171)

* fix instruct blip slow test

* Update tests/models/instructblip/test_modeling_instructblip.py

* 🌐 [i18n-KO] Translated `transformers_agents.md` to Korean (#24881)

* docs: ko: transformers_agents.md

* docs: ko: transformers_agents.md

* feat: deepl draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Juntae <79131091+sronger@users.noreply.github.com>
Co-authored-by: Injin Paek <71638597+eenzeenee@users.noreply.github.com>

---------

Co-authored-by: Juntae <79131091+sronger@users.noreply.github.com>
Co-authored-by: Injin Paek <71638597+eenzeenee@users.noreply.github.com>

* Fix beam search to sample at least 1 non eos token (#25103) (#25115)

* [MusicGen] Fix integration tests (#25169)

* move to device

* update with cuda values

* fix fp16

* more rigorous

* 🚨🚨🚨 Fix rescale ViVit Efficientnet (#25174)

* Fix rescaling bug

* Add tests

* Update integration tests

* Fix up

* Update src/transformers/image_transforms.py

* Update test - new possible order in list

* Musicgen: CFG is manually added (#25173)

* Better error message in `_prepare_output_docstrings` (#25202)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`PreTrainedModel`] Wrap `cuda` and `to` method correctly (#25206)

wrap `cuda` and `to` method correctly

* Fix `all_model_classes` in `FlaxBloomGenerationTest` (#25211)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [quantization.md] fix (#25190)

Update quantization.md

* [`pipeline`] revisit device check for pipeline (#25207)

* revisit device check for pipeline

* let's raise an error.

* Update tiny model info. and pipeline testing (#25213)

* update tiny_model_summary.json

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix docker image build failure (#25214)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… (#25193)

make build_mpt_alibi_tensor a method of MptModel so that deepspeed could override it to make autoTP work

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* [`Pix2Struct`] Fix pix2struct cross attention (#25200)

* fix pix2struct cross attention

* fix torchscript slow test

* [`Docs`/`quantization`] Clearer explanation of how things work under the hood + remove outdated info (#25216)

* clearer explanation of how things work under the hood.

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `load_in_4bit` in `from_pretrained`

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [`MPT`] Add `require_bitsandbytes` on MPT integration tests (#25201)

* add `require_bitsandbytes` on MPT integration tests

* add it on mpt as well

* [`Detr`] Fix detr BatchNorm replacement issue (#25230)

* fix detr weird issue

* Update src/transformers/models/conditional_detr/modeling_conditional_detr.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* fix copies

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Move rescale dtype recasting to match torchvision ToTensor (#25229)

Move dtype recasting to match torchvision ToTensor

* Fix set of model parallel in the Trainer when no GPUs are available (#25239)

* fix get_keys_to_not_convert() to return correct modules for full precision inference (#25105)

* add test for `get_keys_to_not_convert`

* add minimum patch to keep mpt lm_head from 8bit quantization

* add revision to

* add pathname and line number to logging formatter in debug mode (#25203)

* add pathname and lineno to logging formatter in debug mode

* use TRANSFORMERS_VERBOSITY="detail" to print pathname and lineno

* Add `token` argument in example scripts (#25172)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* resolving zero3 init when using accelerate config with Trainer (#25227)

* resolving zero3 init when using accelerate config with Trainer

* refactor

* fix

* fix import

* Update rescale tests - cast to float after rescaling to reflect #25229 (#25259)

Rescale tests - cast to float after rescaling to reflect #25229

* Fix some bugs for two stage training of deformable detr (#25045)

* Update modeling_deformable_detr.py

Fix bugs for two stage training

* Update modeling_deformable_detr.py

* Add test_two_stage_training to DeformableDetrModelTest

---------

Co-authored-by: yupeng.jia <yupeng.jia@momenta.ai>

* [DOCS] Add example and modified docs of EtaLogitsWarper (#25125)

* added example and modified docs for EtaLogitsWarper

* make style

* fixed styling issue on 544

* removed error info and added set_seed

* Update src/transformers/generation/logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* updated the results

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix return_dict_in_generate bug in InstructBlip generate function (#25246)

Fix bug in InstructBlip generate function

Previously, the postprocessing conducted on generated sequences in InstructBlip's generate function assumed these sequences were tensors (i.e. that `return_dict_in_generate == False`).

This commit checks whether the result of the call to the wrapped language model's `generate()` is a tensor. If it is not, the commit postprocesses the `sequences` attribute of the returned output object instead.

* Remove `pytest_options={"rA": None}` in CI (#25263)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `perf_infer_gpu_many.md` to Korean (#24943)

* doc: ko: perf_infer_gpu_many.mdx

* feat: chatgpt draft

* fix: manual edits

* Update docs/source/ko/perf_infer_gpu_many.md

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* recommend DeepSpeed's Argument Parsing documentation (#25268)

* [MMS] Fix mms (#25267)

* [MMS] Fix mms

* [MMS] Fix mms

* fix mms loading

* Apply suggestions from code review

* make style

* Update tests/models/wav2vec2/test_modeling_wav2vec2.py

* CI with `num_hidden_layers=2` 🚀🚀🚀 (#25266)

* CI with layers=2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* CI with `pytest_num_workers=8` for torch/tf jobs (#25274)

n8

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Docs: Update list of `report_to` logging integrations in docstring (#25281)

* Update list of logging integrations in docstring

Also update type hint

* Also add 'flyte' to report_to callback list

* Revert 'report_to' type hint update

Due to CLI breaking

* Update InstructBLIP & Align values after rescale update (#25209)

* Update InstructBLIP values
Note: the tests are not independent. Running a test on its own produces different logits than running all the integration tests together.

* Update test values after rescale update

* Remove left over commented out code

* Revert to previous rescaling logic

* Update rescale tests

* Docs: separate generate section (#25235)

Separate generate doc section

* Update bark doc (#25234)

* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add generate method to SpeechT5ForTextToSpeech (#25233)

* add generate method to SpeechT5ForTextToSpeech

* update speecht5forTTS docstrings

* Remove defaults to None in generate docstrings

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add timeout parameter to load_image function (#25184)

* Add timeout parameter to load_image function.

* Remove line.

* Reformat code

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add parameter to docs.

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [JAX] Bump min version (#25286)

* [JAX] Bump min version

* make fixup

* [small] llama2.md typo (#25295)

`groupe` -> `grouped`

* Fix typo: Roberta -> RoBERTa (#25302)

* Move usage of deprecated logging.warn to logging.warning (#25310)

The former spelling is deprecated and has been discouraged for a
while. The latter spelling seems to be more common in this project
anyway, so this change ought to be safe.

Fixes https://github.com/huggingface/transformers/issues/25283

* Give more memory in test_disk_offload (#25315)

* Generate: get generation mode as an enum (#25292)

* Add offline mode for agents (#25226)

* Add offline mode for agents

* Disable second check too

* Deal with nested configs better in base class (#25237)

* Deal better with nested configs

* Fixes

* More fixes

* Fix last test

* Clean up existing configs

* Remove hack in MPT Config

* Update src/transformers/configuration_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Fix setting a nested config via dict in the kwargs

* Adapt common test

* Add test for nested config load with dict

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Document check copies (#25291)

* Document check copies better and add tests

* Include header in check for copies

* Manual fixes

* Try autofix

* Fixes

* Clean tests

* Finalize doc

* Remove debug print

* More fixes

* Make it possible for `bark` to have a tiny model (#25290)

* temp

* update

* update

* update

* small dim

* small dim

* small dim

* fix

* update

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Document toc check and doctest check scripts (#25319)

* Clean doc toc check and make doctest list better

* Add to Makefile

* [Whisper] Better error message for outdated generation config (#25298)

* Remove jnp.DeviceArray since it is deprecated. (#24875)

* Remove jnp.DeviceArray since it is deprecated.

* Replace all instances of jnp.DeviceArray with jax.Array

* Update src/transformers/models/bert/modeling_flax_bert.py

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add CFG for .generate() (#24654)

* 🌐 [i18n-KO] Translated `perf_infer_gpu_one.md` to Korean (#24978)

* docs: ko: perf_infer_gpu_one

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: TaeYupNoh <107118671+TaeYupNoh@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: TaeYupNoh <107118671+TaeYupNoh@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update TF pin in docker image (#25343)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generalize CFG to allow for positive prompts (#25339)

* Generalize CFG to allow for positive prompts

* Add documentation, fix the correct class

* Loosen output shape restrictions on GPT-style models (#25188)

* Loosen output shape restrictions on GPT-style models

* Use more self-explanatory variables

* Revert "Use more self-explanatory variables"

This reverts commit 5fd9ab39119558b7e750f61aa4a19014dccc5ed5.

* Allow `trust_remote_code` in example scripts (#25248)

* pytorch examples

* pytorch mim no trainer

* cookiecutter

* flax examples

* missed line in pytorch run_glue

* tensorflow examples

* tensorflow run_clip

* tensorflow run_mlm

* tensorflow run_ner

* tensorflow run_clm

* pytorch example from_configs

* pytorch no trainer examples

* Revert "tensorflow run_clip"

This reverts commit 261f86ac1f1c9e05dd3fd0291e1a1f8e573781d5.

* fix: duplicated argument

* Generate: remove Marian hack (#25294)

Remove Marian hack

* Fix more offload edge cases (#25342)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Migrate Trainer from `Repository` to `upload_folder` (#25095)

* First draft

* Deal with progress bars

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Address review comments

* Forgot one

* Pin hf_hub

* Add argument for push all and fix tests

* Fix tests

* Address review comments

---------

Co-authored-by: Lucain <lucainp@gmail.com>

* Adding more information in help parser on train_file and validation_file (#25324)

chore: adding new doc on train and val

* [DOCS] Add `NoRepeatNGramLogitsProcessor` Example for `LogitsProcessor` class (#25186)

* Add Description And Example to Docstring

* make style corrections

* make style

* Doc Style Consistent With HF

* Apply make style

* Modify Docstring

* Edit Type in Docstring

* Feedback Incorporated

* Edit Docstring

* make style

* Post Review Changes

* Review Feedback Incorporated

* Styling

* Formatting

* make style

* pep8

* Docs: Added benchmarks for `torch.compile()` for vision models (#24748)

* added benchmarks for compile

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* added more models

* added more models fr

* added visualizations

* minor fix

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added links to models and put charts side by side

* Added batch comparisons

* Added more comparisons

* Fix table

* Added link to wheel

* Update perf_torch_compile.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add mask2former fp16 support (#25093)

* Add mask2former fp16 support

* Clear consistency/quality issues

* Fix consistency/quality (2)

* Add integration test for mask2former (fp16 case)

* Fix code quality

* Add integration test for maskformer (fp16 case)

* Add integration test for oneformer (fp16 case)

* Remove slow decorator from fp16 tests

* Fix lint

* Remove usage of full inference and value checks for fp16

* Temporarily comment slow for {mask, mask2, one}former

* Add fp16 support to oneformer

* Revert "Temporarily comment slow for {mask, mask2, one}former"

This reverts commit e5371edabd301cf56079def0421a0a87df307cb0.

* Remove dtype conversion noop

* [DOCS] Add descriptive docstring to MinNewTokensLength (#25196)

* Add descriptive docstring to MinNewTokensLength

It addresses https://github.com/huggingface/transformers/issues/24783

* Refine the differences between `min_length` and `min_new_tokens`

* Remove extra line

* Remove extra arguments in generate

* Add a missing space

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Run the linter

* Add clarification comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Register ModelOutput subclasses as supported torch.utils._pytree nodes (#25358)

* Register ModelOutput subclasses as supported torch.utils._pytree nodes

Fixes #25357 where DDP with static_graph=True does not sync gradients when calling backward() over tensors contained in ModelOutput subclasses

* Add test for torch pytree ModelOutput serialization and deserialization

* Fix `test_model_parallelism` (#25359)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add warning for missing attention mask when pad tokens are detected (#25345)

* Add attention mask and pad token warning to many of the models

* Remove changes under examples/research_projects

These files are not maintained by HG.

* Skip the warning check during torch.fx or JIT tracing

* Switch ordering for the warning and input shape assignment

This ordering is a little cleaner for some of the cases.

* Add missing line break in one of the files

* [ASR Pipeline] Clarify return timestamps (#25344)

* [ASR Pipeline] Clarify return timestamps

* fix indentation

* fix ctc check

* fix ctc error message!

* fix test

* fix other test

* add new tests

* final comment

* MaskFormer, Mask2Former - replace einsum for tracing (#25297)

* Replace einsum with ops for tracing

* Fix comment

* Load state in else (#25318)

* Load else

* New approach

* Propagate

* Fix `token` in example template (#25351)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Enable tests to run on third-party devices (#25327)

* enable unit tests to run on third-party devices other than CUDA and CPU.

* remove the modification that enabled ut on MPS

* control test on third-party device by env variable

* update

---------

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* 🌐 [i18n-KO] Translated `add_tensorflow_model.md` to Korean (#25017)

* docs: ko: add_tensorflow_model.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* fix: manual edits

* Fix `torch_job` worker(s) crashing (#25374)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generate: add config-level validation (#25381)

* Fix missing usage of `token` (#25382)

* add missing tokens

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Use small config for `OneFormerModelTest.test_model_with_labels` (#25383)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add copied from for image processor methods (#25121)

* Add copied from statements for image processors

* Move out rescale and normalize to base image processor

* Remove rescale and normalize from vit (post rebase)

* Update docstrings and tidy up

* PR comments

* change version (#25387)

* [DOCS] Add example for `TopPLogitsWarper`  (#25361)

* [DOCS] Add example for `TopPLogitsWarper`

* fix typo

* address review feedback

* address review nits

* 🌐 [i18n-KO] Translated `perf_train_cpu_many.md` to Korean (#24923)

* docs: ko: perf_train_cpu_many.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* 16059 - Add missing type hints for ASTModel (#25364)

* 16059 - Add missing type hints for ASTModel

* Add an additional type hint

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* rm useless condition since the previous condition contains it. (#25403)

* Fix path for dynamic module creation (#25402)

* YOLOS - Revert default return_pixel_mask value (#25404)

Revert default return_pixel_mask value

* Docs: introduction to generation with LLMs (#25240)

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Generate: length validation (#25384)

* Improve training args (#25401)

* enhanced tips for some training args

* make style

* Generate: generation config validation fixes in docs (#25405)

* 16059 - Add extra type hints for AltCLIPModel (#25399)

* Generate: lower severity of parameterization checks (#25407)

* VQA task guide (#25244)

* initial commit

* semi-finished task guide draft

* image link

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_question_answering.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* feedback addressed

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits addressed

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean (#24957)

* docs: ko: add_new_model.md

* feat: chatgpt draft

* fix: manual edits

* fix: change document title

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: add anchor to header

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* fix: edit with reviews

* feat: edit toctree

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `model_summary.md` to Korean (#24625)

* docs: ko: model_summary.md

* feat: nmt and manual edit model_summary.mdx

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* fix: resolve suggestions2

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update Bark generation configs and tests (#25409)

* update bark generation configs for more coherent parameter

* make style

* update bark hub repo

* aligned sample_beam output selection with beam_search (#25375)

* aligned sample_beam specs with beam_search

* pull origin main

* Revert "pull origin main"

This reverts commit 06d356f1137bb52272e120a03636598c44449cf3.

* update test_utils.py

* fix format

* remove comment

---------

Co-authored-by: Shogo Fujita <shogo.fujita@legalontech.jp>

* Enable passing number of channels when inferring data format (#25412)

* Bark: flexible generation config overload (#25414)

* [DINOv2] Update pooler output (#25392)

Update pooler output

* 🌐 [i18n-KO] Translated `philosophy.md` to Korean (#25010)

* docs: ko: philosophy.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* Doc checks (#25408)

* Document check_dummies

* Type hints and doc in other files

* Document check inits

* Add documentation to

* Address review comments

* Generation: strict generation config validation at save time (#25411)

* strict gen config save; Add tests

* add note that the warning will be an exception in v4.34

* [WavLM] Fix Arxiv link and authors (#25415)

* [WavLM] Fix Arxiv link and authors

* make style

* Generate: Load generation config when `device_map` is passed (#25413)

* Fix rendering for `torch.compile()` docs (#25432)

fix rendering

* Add `examples`  to tests to run when `setup.py` is modified (#25437)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix issue with ratio evaluation steps and auto find batch size (#25436)

* Fully rebased solution

* 500

* docs: add LLaMA-Efficient-Tuning to awesome-transformers (#25441)

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* GPTQ integration (#25062)

* GPTQ integration

* Add tests for gptq

* support for more quantization model

* fix style

* typo

* fix method

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add dataclass and fix quantization_method

* fix doc

* Update tests/quantization/gptq/test_gptq.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* modify dataclass

* add gptqconfig import

* fix typo

* fix tests

* remove dataset as req arg

* remove tokenizer import

* add offload cpu quantization test

* fix check dataset

* modify dockerfile

* protect trainer

* style

* test for config

* add more log

* overwrite torch_dtype

* draft doc

* modify quantization_config docstring

* fix class name in docstring

* Apply suggestions from code review

Co-authored-by: Y…
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
* improve from_pretrained for zero3 multi gpus mode

* Add check if torch.distributed.is_initialized

* Revert torch.distributed

---------

Co-authored-by: Stas Bekman <stas@stason.org>
zachares added a commit to nplan-io/transformers that referenced this pull request Nov 17, 2023
…xt2graph) (#8)

* [`Llama2`]  Add support for Llama 2 (#24891)

* add llama

* add other readmes

* update padding id in readme

* add link to paper

* fix paths and tokenizer

* more nits

* styling

* fit operation in 2 lines when possible

* nits

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add form

* update readme

* update readme, we don't have a default pad token

* update test and tokenization

* LLaMA instead of Llama

* nits

* add expected text

* add greedy output

* styling

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* sequential device map

* skip relevant changes

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Disable ipex env var if false (#24885)

Disable ipex if in use

* Check for accelerate env var when doing CPU only (#24890)

Check for use-cpu

* Avoid some pipeline tasks to use `use_cache=True` (#24893)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update tested versions in READMEs (#24895)

* Update supported Python and PyTorch versions in readme

* Update Python, etc. versions in non-English readmes

These were more out of date than in the English readme. This
updates all the versions the readmes claim the repository is tested
with to the same versions stated in the English readme.

Those versions are current at least in the case of the Python and
PyTorch versions (and less out of date for the others).

* Propagate trailing whitespace fix to model list

This runs "make fix-copies". The only change is the removal of
whitespace. No actual information or wording is changed.

* Update tested TensorFlow to 2.6 in all readmes

Per pinning in setup.py

Unlike Python and PyTorch, the minimum supported TensorFlow version
has not very recently changed, but old versions were listed in all
READMEs.

* Fix `test_model_parallelism` for `FalconModel` (#24914)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) (#24907)

- This results in cpu mode on Apple Silicon mps

* fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST (#24902)

fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST

suno/barh should be suno/bark

* Fix minor llama2.md model doc typos (#24909)

Update llama2.md

Fix typos in the llama2 model doc

* [`Llama2`] replace `self.pretraining_tp` with `self.config.pretraining_tp` (#24906)

* add possibility to disable TP

* fixup

* adapt from offline discussions

* [doc] `image_processing_vilt.py` wrong default documented (#24931)

[doc] image_processing_vilt.py wrong default

* 🌐 [i18n-KO] Translated`tasks/document_question_answering.md` to Korean (#24588)

* docs: ko: `document_question_answering.md`

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* Add multi-label text classification support to pytorch example (#24770)

* Add text classification example

* set the problem type and finetuning task

* ruff reformatted

* fix bug for unsetting label_to_id for regression

* update README.md

* fixed finetuning task

* update comment

* check if label exists in feature before removing

* add useful logging

* Deprecate unused OpenLlama architecture (#24922)

* Resolve typo in check_repo.py

* Specify encoding when opening modeling files

* Deprecate the OpenLlama architecture

* Add disclaimer pointing to Llama

I'm open to different wordings here

* Match the capitalisation of LLaMA

* replace no_cuda with use_cpu in test_pytorch_examples (#24944)

* replace no_cuda with use_cpu in test_pytorch_examples

* remove code that is never used

* fix style

* Generate: sequence bias can handle same terminations (#24822)

* Bump pygments from 2.11.2 to 2.15.0 in /examples/research_projects/decision_transformer (#24949)

Bump pygments in /examples/research_projects/decision_transformer

Bumps [pygments](https://github.com/pygments/pygments) from 2.11.2 to 2.15.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.11.2...2.15.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update processing_vision_text_dual_encoder.py (#24950)

Fixing small typo: kwrags -> kwargs

* Fix `main_input_name` in `src/transformers/keras_callbacks.py` (#24916)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [DOCS] Example for `LogitsProcessor` class (#24848)

* make docs

* fixup

* resolved

* remove debugs

* Revert "fixup"

This reverts commit 5e0f636aae0bf8707bc8bdaa6a9427fbf66834ed.

* prev (ignore)

* fixup broke some files

* remove files

* reverting modeling_reformer

* lang fix

* fix type annotations for arguments in training_args (#24550)

* testing

* example script

* fix typehinting

* some tests

* make test

* optional update

* Union of arguments

* does this fix the issue

* remove reports

* set default to False

* documentation change

* None support

* does not need None

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments

* Change dict to Dict

* Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" (#24574)

Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)"

This reverts commit c5e29d4381d4b9739e6cb427adbca87fbb43a3ad.

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments (#24549)

* Fix typing annotations for FSDP and DeepSpeed in TrainingArguments

* Change dict to Dict

* merge

* hacky fix

* fixup

---------

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Bump aiohttp from 3.8.1 to 3.8.5 in /examples/research_projects/decision_transformer (#24954)

Bump aiohttp in /examples/research_projects/decision_transformer

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.1 to 3.8.5.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/v3.8.5/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.8.1...v3.8.5)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [`RWKV`] Add Gradient Checkpointing support for RWKV (#24955)

add GC support for RWKV

* Change logic for logging in the examples (#24956)

Change logic

* Contrastive Search peak memory reduction (#24120)

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fallback for missing attribute `Parameter.ds_numel` (#24942)

* [trainer] fallback for deepspeed param count

* [trainer] more readable numel count

* fix fsdp checkpointing issues (#24926)

* fix fsdp load

* Update trainer.py

* remove saving duplicate state_dict

* fix: cast input pixels to appropriate dtype for image_to_text pipelines (#24947)

* fix: cast input pixels to appropriate dtype for image_to_text tasks

* fix: add casting to pixel inputs of additional models after running copy checks

* 🌐 [i18n-KO] Fixed Korean and English `quicktour.md` (#24664)

* fix: english/korean quicktour.md

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* fix: follow glossary

* 파인튜닝 -> 미세조정 (replace the transliterated term for "fine-tuning" with the native Korean term)

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* fsdp fixes and enhancements (#24980)

* fix fsdp prepare to remove the warnings and fix excess memory usage

* Update training_args.py

* parity for FSDP+XLA

* Update trainer.py

* Fix missing spaces in system prompt of Llama2 tokenizer (#24930)

* Update tokenization_llama.py

* Update tokenization_llama_fast.py

* Update src/transformers/models/llama/tokenization_llama_fast.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/llama/tokenization_llama_fast.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [`LlamaConfig`] Nit: pad token should be None by default (#24958)

* pad token should be None by default

* fix tests

* nits

* Remove tokenizers from the doc table (#24963)

* Avoid importing all models when instantiating a pipeline (#24960)

* Avoid importing all models when instantiating a pipeline

* Remove sums that don't work

* Fix type annotation for deepspeed training arg (#24988)

* Use main_input_name for include_inputs_for_metrics (#24993)

* Fix `llama` tokenization doctest (#24990)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`bnb`] Add simple check for bnb import (#24995)

add simple check for bnb

* [`Llama`] remove persistent  `inv_freq` tensor (#24998)

remove persistent tensor

* improve from_pretrained for zero3 multi gpus mode (#24964)

* improve from_pretrained for zero3 multi gpus mode

* Add check if torch.distributed.is_initialized

* Revert torch.distributed

---------

Co-authored-by: Stas Bekman <stas@stason.org>

* Move template doc file to md (#25004)

* 🌐 [i18n-KO] Updated Korean `serialization.md` (#24686)

fix: update ko/serialization.md

* chatgpt draft

* [check_config_docstrings.py] improve diagnostics (#25012)

* [check_config_docstrings.py] improve diagnostics

* style

* rephrase

* fix

* [`logging.py`] set default `stderr`  path if `None` (#25033)

set default logger

* fix(integrations): store serialized `TrainingArgs` to `wandb.config` without sanitization. (#25035)

fix: store training args to wandb config without sanitization.

Allows resuming runs by reusing the wandb config.

Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>

* [docs] Performance docs tidy up, part 1  (#23963)

* first pass at the single gpu doc

* overview: improved clarity and navigation

* WIP

* updated intro and deepspeed sections

* improved torch.compile section

* more improvements

* minor improvements

* make style

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* feedback addressed

* mdx -> md

* link fix

* feedback addressed

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Support GatedRepoError + use raise from (#25034)

* Support GatedRepoError + use raise from

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Use token instead of use_auth_token in error messages

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Better handling missing SYS in llama conversation tokenizer (#24997)

* Better handling missing SYS in llama conversation tokenizer

The existing code failed to add SYS if the conversation has history
without SYS, but did modify the passed conversation as it did.

Rearrange the code so modifications to the conversation object are taken
into account for token id generation.

* Fix formatting with black

* Avoid one-liners

* Also fix fast tokenizer

* Drop List decl

* 🌐[i18n-KO] Translated performance.md to Korean (#24883)

* docs: ko: performance.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/performance.md

Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* Update docs/source/ko/performance.md

---------

Co-authored-by: Kihoon Son <75935546+kihoon71@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `testing.md` to Korean (#24900)

* docs: ko: testing.md

* feat: draft

* fix: manual edits

* fix: edit ko/_toctree.yml

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* Add dispatch_batches to training arguments (#25038)

* Dispatch batches

* Copy items

* Fix typo in LlamaTokenizerFast docstring example (#25018)

* Make more test models smaller (#25005)

* Make more test models tiny

* Make more test models tiny

* More models

* More models

* Comment again print statement

* Pvt model (#24720)

* pull and push updates

* add docs

* fix modeling

* Add and run test

* make copies

* add task

* fix tests and fix small issues

* Checks on a Pull Request

* fix docs

* add desc pvt.md

* compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. (#25044)

* added PeftModelForCausalLM to MODEL_FOR_CAUSAL_LM_MAPPING_NAMES dict

* check for PEFT model in compute_loss section

---------

Co-authored-by: Nathan Brake <nbrake3@mmm.com>

* [`8bit`] Fix 8bit corner case with Blip2 8bit (#25047)

fix 8bit corner case with Blip2 8bit

* 🌐 [i18n-KO] Translated `perf_train_cpu.md` to Korean (#24911)

* docs: ko: perf_train_cpu.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* fix: manual edits

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

---------

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Better error message when signal is not supported on OS (#25049)

* Better error message when signal is not supported on OS

* Address review comments

* [`RWKV`] Add note in doc on `RwkvStoppingCriteria` (#25055)

* Add note in doc on `RwkvStoppingCriteria`

* give some breathing space to the code

* Generate - add beam indices output in constrained beam search (#25042)

* [Docs] fix rope_scaling doc string (#25072)

fix rope_scaling doc string

* 🌐 [i18n-KO] Translated `<tf_xla>.md` to Korean (#24904)

* docs: ko: tf_xla.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* 🌐 [i18n-KO] Translated `perf_hardware.md` to Korean (#24966)

* docs: ko: perf_hardware.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>

* fix: resolve suggestions

Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: fix rendering error of perf_hardware.md

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Haewon Kim <ehdvkf02@naver.com>

* Fix last models for common tests that are too big. (#25058)

* Fix last models for common tests that are too big.

* Remove print statement

* fix: add TOC anchor link (#25066)

* Set `TF32` flag for PyTorch cuDNN backend (#25075)

* Fix broken link in README_hd.md (#25067)

Update README_hd.md

* replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size` in readme of multiple-choice task (#25078)

replace `per_gpu_eval_batch_size` with `per_device_eval_batch_size`
in readme of multiple-choice

* [`generate`]  Only warn users if the `generation_config`'s `max_length` is set to the default value (#25030)

* check max length is default

* nit

* update warning: no-longer deprecate

* comment in the configuration_utils in case max length's default gets changed in the future

* 🌐 [i18n-KO] Translated `hpo_train.md` to Korean (#24968)

* docs: ko: hpo_train.mdx

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* Fix: repeat per sample for SAM image embeddings (#25074)

Repeat per sample for SAM image embeddings

* [`MPT`] Add MosaicML's `MPT` model to transformers (#24629)

* draft add new model like

* some cleaning of the config

* nits

* add nested configs

* nits

* update

* update

* added layer norms + triton kernels

* consider only LPLayerNorm for now.

* update

* all keys match.

* Update

* fixing nits here and there

* working forward pass.

* removed einops dependency

* nits

* format

* add alibi

* byebye head mask

* refactor attention

* nits.

* format

* fix nits.

* nuke and updates

* nuke tokenizer test

* don't reshape query with kv heads

* added a bit of documentation.

* remove unneeded things

* nuke more stuff

* nit

* logits match - same generations

* rm unneeded methods

* 1 remaining failing CI test

* nit

* fix nits

* fix docs

* fix docs

* rm tokenizer

* fixup

* fixup

* fixup and fix tests

* fixed configuration object.

* use correct activation

* few minor fixes

* clarify docs a bit

* logits match to 1e-12

* skip and unskip a test

* added some slow tests.

* fix readme

* add more details

* Update docs/source/en/model_doc/mpt.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix configuration issues

* more fixes in config

* added more models

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove unneeded position ids

* fix some  comments

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* revert suggestion

* mpt alibi + added batched generation

* Update src/transformers/models/mpt/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove init config

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix nit

* add another slow test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fits in one line

* some refactor because make fixup doesn't pass

* add ft notebook

* update md

* correct doc path

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [DOCS] add example NoBadWordsLogitsProcessor (#25046)

* add example NoBadWordsLogitsProcessor

* fix L764 & L767

* make style

* 🌐 [i18n-KO] Translated `perf_infer_cpu.md` to Korean (#24920)

* docs: ko: perf_infer_cpu.md

* feat: chatgpt draft

* fix: manual edits

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/perf_infer_cpu.md

* Update docs/source/ko/perf_infer_cpu.md

This part bothered me as well. I'll apply the change!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Agreed! I was sticking too closely to the original!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

As you said, I think I was too attached to the original text.

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Thank you for the better choice of words!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

At the time I couldn't come up with the term '주기' ("cycle")...

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

The sentence reads more naturally now!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

There's no need to be bound by the original format!

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update docs/source/ko/perf_infer_cpu.md

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Allow generic composite models to pass more kwargs (#24927)

* fix

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* [ `ForSequenceClassification`] Support `left` padding (#24979)

* support left padding

* nit

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

* [`TF`]  Also apply patch to support left padding (#25085)

* tf versions

* apply changes to other models

* 3 models slipped through the cracks

* Edit err message and comment in `test_model_is_small` (#25087)

* Edit err message and comment in

* put back 80M comment

* [ `PreTrainedTokenizerFast`] Keep properties from fast tokenizer (#25053)

* draft solution

* use `setdefault`

* nits

* add tests and fix truncation issue

* fix test

* test passes locally

* quality

* updates

* update tests

* Hotfix for failing `MusicgenForConditionalGeneration` tests (#25091)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)

* Initial addition of t5forsequenceclassification

* Adding imports and adding tests

* Formatting

* Running make fix-copies

* Adding mt5forseq

* Formatting

* run make fix-copies

* Adding to docs

* Add model_parallel

* Fix bug

* Fix

* Remove TODO

* Fixing tests for T5ForSequenceClassification

* Undo changes to dependency_versions_table.py

* Change classification head to work with T5Config directly

* Change seq length to let tests pass

* PR comments for formatting

* Formatting

* Initial addition of UMT5ForSequenceClassification

* Adding to inits and formatting

* run make fix-copies

* Add doc for UMT5ForSeqClass

* Update UMT5 config

* Fix docs

* Skip torch fx test for SequenceClassification

* Formatting

* Add skip to UMT5 tests as well

* Fix umt5 tests

* Running make fix-copies

* PR comments

* Fix for change to sentence_representation

* Rename seq_len to hidden_size since that's what it is

* Use base_model to follow format of the rest of the library

* Update docs

* Extract the decoder_input_ids changes and make one liner

* Make one-liner

* Fix doctest (#25031)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/lxmert (#25096)

Bump certifi in /examples/research_projects/lxmert

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/decision_transformer (#25098)

Bump certifi in /examples/research_projects/decision_transformer

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump certifi from 2022.12.7 to 2023.7.22 in /examples/research_projects/visual_bert (#25097)

Bump certifi in /examples/research_projects/visual_bert

Bumps [certifi](https://github.com/certifi/python-certifi) from 2022.12.7 to 2023.7.22.
- [Commits](https://github.com/certifi/python-certifi/compare/2022.12.07...2023.07.22)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix tied_params for meta tensor (#25101)

* fix tied_params for meta tensor

* remove duplicate

* documentation for llama2 models (#25102)

* fix documentation

* changes

* 🌐[i18n-KO] Translated pipeline_webserver.md to Korean (#24828)

* translated pipeline_webserver.md

Co-Authored-By: Hyeonseo Yun <0525yhs@gmail.com>
Co-Authored-By: Wonhyeong Seo <wonhseo@kakao.com>
Co-Authored-By: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-Authored-By: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-Authored-By: Nayeon Han <nayeon2.han@gmail.com>
Co-Authored-By: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* Update pipeline_webserver.md

* Apply suggestions from code review

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Sangam Lee <74291999+augustinLib@users.noreply.github.com>
Co-authored-by: Kim haewon <ehdvkf02@naver.com>

---------

Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Gabriel Yang <gabrielwithhappy@gmail.com>
Co-authored-by: Nayeon Han <nayeon2.han@gmail.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Sangam Lee <74291999+augustinLib@users.noreply.github.com>
Co-authored-by: Kim haewon <ehdvkf02@naver.com>

* Fix `PvtModelIntegrationTest::test_inference_fp16` (#25106)

update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add descriptive docstring to TemperatureLogitsWarper (#24892)

* Add descriptive docstring to TemperatureLogitsWarper

It addresses https://github.com/huggingface/transformers/issues/24783

* Remove niche features

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Commit suggestion

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Refactor the examples to simpler ones

* Add a missing comma

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Make args description more compact

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Remove extra text after making description more compact

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix linter

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … (#24772)

fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor."

Co-authored-by: 刘长伟 <hzliuchw@corp.netease.com>

* update `use_auth_token` -> `token` (#25083)

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix past CI after #24334 (#25113)

update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Move common image processing methods to BaseImageProcessor (#25089)

Move out common methods

* Fix ViT docstring regarding default dropout values. (#25118)

Fix docstring for dropout.

* MaskFormer - enable return_dict in order to compile (#25052)

* Enable return_dict in order to compile

* Update tests

* Move center_crop to BaseImageProcessor (#25122)

* fix deepspeed load best model at end when the model gets sharded (#25057)

* fix delete all checkpoints when save_total_limit is set to 1 (#25136)

* [`T5/LlamaTokenizer`] default legacy to `None` to not always warn (#25131)

default legacy to None

* Clarify 4/8 bit loading log message (#25134)

* clarify 4/8 bit loading log message

* make style

* 🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 (#25109)

* Change defaults

* Sylvain's comments

* [`MptConfig`] support from pretrained args (#25116)

* support from pretrained args

* draft addition of tests

* update test

* use parent assert true

* Update src/transformers/models/mpt/configuration_mpt.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Add offload support to Bark (#25037)

* initial Bark offload proposal

* use hooks instead of manually offloading

* add test of bark offload to cpu feature

* Apply nit suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docstrings of offload

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove unnecessary set_seed in Bark tests

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* More `token` things (#25146)

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add bloom flax (#25094)

* First commit

* step 1 working

* add alibi

* placeholder for `scan`

* add matrix mult alibi

* beta scaling factor for bmm

* working v1 - simple forward pass

* move layer_number from attribute to arg in call

* partial functioning scan

* hacky working scan

* add more modifs

* add test

* update scan for new kwarg order

* fix position_ids problem

* fix bug in attention layer

* small fix

- do the alibi broadcasting only once

* prelim refactor

* finish refactor

* alibi shifting

* incorporate dropout_add to attention module

* make style

* make padding work again

* update

* remove bogus file

* up

* get generation to work

* clean code a bit

* added small tests

* adding alibi test

* make CI tests pass:

- change init weight
- add correct tuple for output attention
- add scan test
- make CI tests work

* fix few nits

* fix nit onnx

* fix onnx nit

* add missing dtype args to nn.Modules

* remove debugging statements

* fix scan generate

* Update modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* Update test_modeling_flax_bloom.py

* fix small test issue + make style

* clean up

* Update tests/models/bloom/test_modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fix function name

* small fix test

* forward contrib credits from PR17761

* Fix failing test

* fix small typo documentation

* fix non passing test

- remove device from build alibi

* refactor call

- refactor `FlaxBloomBlockCollection` module

* make style

* upcast to fp32

* cleaner way to upcast

* remove unused args

* remove layer number

* fix scan test

* make style

* fix i4 casting

* fix slow test

* Update src/transformers/models/bloom/modeling_flax_bloom.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove `layer_past`

* refactor a bit

* fix `scan` slow test

* remove useless import

* major changes

- remove unused code
- refactor a bit
- revert import `torch`

* major refactoring

- change build alibi

* remove scan

* fix tests

* make style

* clean-up alibi

* add integration tests

* up

* fix batch norm conversion

* style

* style

* update pt-fx cross tests

* update copyright

* Update src/transformers/modeling_flax_pytorch_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* per-weight check

* style

* line formats

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add new model in doc table of content (#25148)

* Fix `.push_to_hub` and cleanup `get_full_repo_name` usage (#25120)

* Fix .push_to_hub and cleanup get_full_repo_name usage

* Do not rely on Python bool conversion magic

* request changes

* Add test when downloading from gated repo (#25039)

* override .cuda() to check if model is already quantized (#25166)

* Represent query_length in a different way to solve jit issue (#25164)

Fix jit trace

* make run_generation more generic for other devices (#25133)

* make run_generation more generic for other devices

* use Accelerate to support any device type it supports.

* make style

* fix error usage of accelerator.prepare_model

* use `PartialState` to make sure everything is running on the right device

---------

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* added compiled model support for inference (#25124)

* added compiled model support for inference

* linter

* Fix tests

* linter

* linter

* remove inference mode from pipelines

* Linter

---------

Co-authored-by: amarkov <alexander@inworld.ai>

* Update `use_auth_token` -> `token` in example scripts (#25167)

* pytorch examples

* tensorflow examples

* flax examples

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`Mpt`] Fix mpt slow test (#25170)

fix mpt slow test

* [`InstructBlip`] Fix instructblip slow test (#25171)

* fix instruct blip slow test

* Update tests/models/instructblip/test_modeling_instructblip.py

* 🌐 [i18n-KO] Translated `transformers_agents.md` to Korean (#24881)

* docs: ko: transformers_agents.md

* docs: ko: transformers_agents.md

* feat: deepl draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Juntae <79131091+sronger@users.noreply.github.com>
Co-authored-by: Injin Paek <71638597+eenzeenee@users.noreply.github.com>

---------

Co-authored-by: Juntae <79131091+sronger@users.noreply.github.com>
Co-authored-by: Injin Paek <71638597+eenzeenee@users.noreply.github.com>

* Fix beam search to sample at least 1 non eos token (#25103) (#25115)

* [MusicGen] Fix integration tests (#25169)

* move to device

* update with cuda values

* fix fp16

* more rigorous

* 🚨🚨🚨  Fix rescale ViVit Efficientnet (#25174)

* Fix rescaling bug

* Add tests

* Update integration tests

* Fix up

* Update src/transformers/image_transforms.py

* Update test - new possible order in list

* Musicgen: CFG is manually added  (#25173)

* Better error message in `_prepare_output_docstrings` (#25202)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [`PreTrainedModel`] Wrap `cuda` and `to` method correctly (#25206)

wrap `cuda` and `to` method correctly

* Fix `all_model_classes` in `FlaxBloomGenerationTest` (#25211)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [quantization.md] fix (#25190)

Update quantization.md

* [`pipeline`] revisit device check for pipeline (#25207)

* revisit device check for pipeline

* let's raise an error.

* Update tiny model info. and pipeline testing (#25213)

* update tiny_model_summary.json

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix docker image build failure (#25214)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… (#25193)

make build_mpt_alibi_tensor a method of MptModel so that deepspeed could override it to make autoTP work

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* [`Pix2Struct`] Fix pix2struct cross attention (#25200)

* fix pix2struct cross attention

* fix torchscript slow test

* [`Docs`/`quantization`] Clearer explanation on how things work under the hood + remove outdated info (#25216)

* clearer explanation on how things work under the hood.

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/main_classes/quantization.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add `load_in_4bit` in `from_pretrained`

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [`MPT`] Add  `require_bitsandbytes` on MPT integration tests (#25201)

* add  `require_bitsandbytes` on MPT integration tests

* add it on mpt as well

* [`Detr`] Fix detr BatchNorm replacement issue (#25230)

* fix detr weird issue

* Update src/transformers/models/conditional_detr/modeling_conditional_detr.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* fix copies

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Move rescale dtype recasting to match torchvision ToTensor (#25229)

Move dtype recasting to match torchvision ToTensor

* Fix set of model parallel in the Trainer when no GPUs are available (#25239)

* fix get_keys_to_not_convert() to return correct modules for full precision inference (#25105)

* add test for `get_keys_to_not_convert`

* add minimum patch to keep mpt lm_head from 8bit quantization

* add revision to

* add pathname and line number to logging formatter in debug mode (#25203)

* add pathname and lineno to logging formatter in debug mode

* use TRANSFORMERS_VERBOSITY="detail" to print pathname and lineno
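The two commits above add the source path and line number to log records when the detail verbosity is selected. Below is a minimal sketch using only the standard-library `logging` module, not the transformers formatter itself; the helper name `make_detail_logger`, the logger name, and the format string are illustrative assumptions. The `%(pathname)s` and `%(lineno)d` placeholders are standard `LogRecord` attributes filled in with the file and line of the logging call:

```python
import io
import logging


def make_detail_logger(name: str, stream: io.StringIO) -> logging.Logger:
    """Hypothetical helper: a logger whose records show the caller's path and line."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    # pathname and lineno identify where the logging call was made.
    handler.setFormatter(
        logging.Formatter("[%(levelname)s|%(pathname)s:%(lineno)d] %(message)s")
    )
    logger.handlers = [handler]  # replace any previously attached handlers
    logger.propagate = False     # keep records out of the root logger
    return logger


buffer = io.StringIO()
log = make_detail_logger("detail_demo", buffer)
log.debug("loading weights")
print(buffer.getvalue(), end="")
```

In transformers the extra fields are only switched on in the detail/debug mode; this sketch always includes them for simplicity.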

* Add `token` argument in example scripts (#25172)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* resolving zero3 init when using accelerate config with Trainer (#25227)

* resolving zero3 init when using accelerate config with Trainer

* refactor

* fix

* fix import

* Update rescale tests - cast to float after rescaling to reflect #25229 (#25259)

Rescale tests - cast to float after rescaling to reflect #25229

* Fix some bugs for two stage training of deformable detr (#25045)

* Update modeling_deformable_detr.py

Fix bugs for two stage training

* Update modeling_deformable_detr.py

* Add test_two_stage_training to DeformableDetrModelTest

---------

Co-authored-by: yupeng.jia <yupeng.jia@momenta.ai>

* [DOCS] Add example and modified docs of EtaLogitsWarper (#25125)

* added example and modified docs for EtaLogitsWarper

* make style

* fixed styling issue on 544

* removed error info and added set_seed

* Update src/transformers/generation/logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* updated the results

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix return_dict_in_generate bug in InstructBlip generate function (#25246)

Fix bug in InstructBlip generate function

Previously, the postprocessing conducted on generated sequences in InstructBlip's generate function assumed these sequences were tensors (i.e. that `return_dict_in_generate == False`).

This commit checks whether the result of the call to the wrapped language model `generate()` is a tensor, and if not attempts to postprocess the sequence attribute of the returned results object.
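The dispatch described above can be sketched without a deep-learning framework. In this hypothetical example, plain lists of token ids stand in for tensors, the `GenerateOutput` dataclass stands in for the structured object returned when `return_dict_in_generate=True`, and the postprocessing step (swapping one token id for another) is a placeholder; none of these names are the actual InstructBlip code:

```python
from dataclasses import dataclass
from typing import List, Union

Sequences = List[List[int]]  # stand-in for a tensor of token ids


@dataclass
class GenerateOutput:
    """Hypothetical stand-in for the output object of generate(return_dict_in_generate=True)."""
    sequences: Sequences


def fix_token_ids(seqs: Sequences, old_id: int, new_id: int) -> Sequences:
    # Placeholder postprocessing: replace every occurrence of old_id with new_id.
    return [[new_id if tok == old_id else tok for tok in seq] for seq in seqs]


def postprocess(outputs: Union[Sequences, GenerateOutput]) -> Union[Sequences, GenerateOutput]:
    if isinstance(outputs, list):
        # Raw sequences (return_dict_in_generate=False): postprocess them directly.
        return fix_token_ids(outputs, old_id=0, new_id=2)
    # Otherwise postprocess the `sequences` attribute of the returned object.
    outputs.sequences = fix_token_ids(outputs.sequences, old_id=0, new_id=2)
    return outputs


print(postprocess([[0, 5, 7]]))                               # [[2, 5, 7]]
print(postprocess(GenerateOutput(sequences=[[0, 9]])).sequences)  # [[2, 9]]
```

Returning the same object type that came in keeps callers that rely on either return style working unchanged.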

* Remove `pytest_options={"rA": None}` in CI (#25263)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `perf_infer_gpu_many.md` to Korean (#24943)

* doc: ko: perf_infer_gpu_many.mdx

* feat: chatgpt draft

* fix: manual edits

* Update docs/source/ko/perf_infer_gpu_many.md

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* recommend DeepSpeed's Argument Parsing documentation (#25268)

* [MMS] Fix mms (#25267)

* [MMS] Fix mms

* [MMS] Fix mms

* fix mms loading

* Apply suggestions from code review

* make style

* Update tests/models/wav2vec2/test_modeling_wav2vec2.py

* CI with `num_hidden_layers=2` 🚀🚀🚀 (#25266)

* CI with layers=2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* CI with `pytest_num_workers=8` for torch/tf jobs (#25274)

n8

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Docs: Update list of `report_to` logging integrations in docstring (#25281)

* Update list of logging integrations in docstring

Also update type hint

* Also add 'flyte' to report_to callback list

* Revert 'report_to' type hint update

Because it breaks the CLI

* Update InstructBLIP & Align values after rescale update (#25209)

* Update InstructBLIP values
Note: the tests are not independent. Running a test on its own produces different logits than running all the integration tests together.

* Update test values after rescale update

* Remove left over commented out code

* Revert to previous rescaling logic

* Update rescale tests

* Docs: separate generate section (#25235)

Separate generate doc section

* Update bark doc (#25234)

* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add generate method to SpeechT5ForTextToSpeech (#25233)

* add generate method to SpeechT5ForTextToSpeech

* update speecht5forTTS docstrings

* Remove defaults to None in generate docstrings

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add timeout parameter to load_image function (#25184)

* Add timeout parameter to load_image function.

* Remove line.

* Reformat code

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add parameter to docs.

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* [JAX] Bump min version (#25286)

* [JAX] Bump min version

* make fixup

* [small] llama2.md typo (#25295)

`groupe` -> `grouped`

* Fix typo: Roberta -> RoBERTa (#25302)

* Move usage of deprecated logging.warn to logging.warning (#25310)

The former spelling is deprecated and has been discouraged for a
while. The latter spelling seems to be more common in this project
anyway, so this change ought to be safe.

Fixes https://github.com/huggingface/transformers/issues/25283
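A minimal before/after sketch with the standard-library `logging` module; the logger name, format, and message are illustrative assumptions. `Logger.warn` is a deprecated alias that emits a `DeprecationWarning` before delegating to `Logger.warning`, so switching spellings does not change what gets logged:

```python
import io
import logging

logger = logging.getLogger("warn_demo")  # illustrative logger name
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output via the root logger

# Before (deprecated alias, triggers a DeprecationWarning):
#     logger.warn("this is a warning")
# After:
logger.warning("this is a warning")

print(stream.getvalue(), end="")
```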

* Give more memory in test_disk_offload (#25315)

* Generate: get generation mode as an enum (#25292)

* Add offline mode for agents (#25226)

* Add offline mode for agents

* Disable second check too

* Deal with nested configs better in base class (#25237)

* Deal better with nested configs

* Fixes

* More fixes

* Fix last test

* Clean up existing configs

* Remove hack in MPT Config

* Update src/transformers/configuration_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Fix setting a nested config via dict in the kwargs

* Adapt common test

* Add test for nested config load with dict

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Document check copies (#25291)

* Document check copies better and add tests

* Include header in check for copies

* Manual fixes

* Try autofix

* Fixes

* Clean tests

* Finalize doc

* Remove debug print

* More fixes

* Make `bark` able to have a tiny model (#25290)

* temp

* update

* update

* update

* small dim

* small dim

* small dim

* fix

* update

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Document toc check and doctest check scripts (#25319)

* Clean doc toc check and make doctest list better

* Add to Makefile

* [Whisper] Better error message for outdated generation config (#25298)

* Remove jnp.DeviceArray since it is deprecated. (#24875)

* Remove jnp.DeviceArray since it is deprecated.

* Replace all instances of jnp.DeviceArray with jax.Array

* Update src/transformers/models/bert/modeling_flax_bert.py

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* add CFG for .generate() (#24654)

* 🌐 [i18n-KO] Translated `perf_infer_gpu_one.md` to Korean (#24978)

* docs: ko: perf_infer_gpu_one

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: TaeYupNoh <107118671+TaeYupNoh@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: TaeYupNoh <107118671+TaeYupNoh@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update TF pin in docker image (#25343)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generalize CFG to allow for positive prompts (#25339)

* Generalize CFG to allow for positive prompts

* Add documentation, fix the correct class

* Loosen output shape restrictions on GPT-style models (#25188)

* Loosen output shape restrictions on GPT-style models

* Use more self-explanatory variables

* Revert "Use more self-explanatory variables"

This reverts commit 5fd9ab39119558b7e750f61aa4a19014dccc5ed5.

* Allow `trust_remote_code` in example scripts (#25248)

* pytorch examples

* pytorch mim no trainer

* cookiecutter

* flax examples

* missed line in pytorch run_glue

* tensorflow examples

* tensorflow run_clip

* tensorflow run_mlm

* tensorflow run_ner

* tensorflow run_clm

* pytorch example from_configs

* pytorch no trainer examples

* Revert "tensorflow run_clip"

This reverts commit 261f86ac1f1c9e05dd3fd0291e1a1f8e573781d5.

* fix: duplicated argument

* Generate: remove Marian hack (#25294)

Remove Marian hack

* Fix more offload edge cases (#25342)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Migrate Trainer from `Repository` to `upload_folder` (#25095)

* First draft

* Deal with progress bars

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Address review comments

* Forgot one

* Pin hf_hub

* Add argument for push all and fix tests

* Fix tests

* Address review comments

---------

Co-authored-by: Lucain <lucainp@gmail.com>

* Adding more information in help parser on train_file and validation_file (#25324)

chore: adding new doc on train and val

* [DOCS] Add `NoRepeatNGramLogitsProcessor` Example for `LogitsProcessor` class (#25186)

* Add Description And Example to Docstring

* make style corrections

* make style

* Doc Style Consistent With HF

* Apply make style

* Modify Docstring

* Edit Type in Docstring

* Feedback Incorporated

* Edit Docstring

* make style

* Post Review Changes

* Review Feedback Incorporated

* Styling

* Formatting

* make style

* pep8

* Docs: Added benchmarks for `torch.compile()` for vision models (#24748)

* added benchmarks for compile

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* added more models

* added more models fr

* added visualizations

* minor fix

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/perf_torch_compile.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added links to models and put charts side by side

* Added batch comparisons

* Added more comparisons

* Fix table

* Added link to wheel

* Update perf_torch_compile.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add mask2former fp16 support (#25093)

* Add mask2former fp16 support

* Clear consistency/quality issues

* Fix consistency/quality (2)

* Add integration test for mask2former (fp16 case)

* Fix code quality

* Add integration test for maskformer (fp16 case)

* Add integration test for oneformer (fp16 case)

* Remove slow decorator from fp16 tests

* Fix lint

* Remove usage of full inference and value checks for fp16

* Temporarily comment slow for {mask, mask2, one}former

* Add fp16 support to oneformer

* Revert "Temporarily comment slow for {mask, mask2, one}former"

This reverts commit e5371edabd301cf56079def0421a0a87df307cb0.

* Remove dtype conversion noop

* [DOCS] Add descriptive docstring to MinNewTokensLength (#25196)

* Add descriptive docstring to MinNewTokensLength

It addresses https://github.com/huggingface/transformers/issues/24783

* Refine the differences between `min_length` and `min_new_tokens`

* Remove extra line

* Remove extra arguments in generate

* Add a missing space

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Run the linter

* Add clarification comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Register ModelOutput subclasses as supported torch.utils._pytree nodes (#25358)

* Register ModelOutput subclasses as supported torch.utils._pytree nodes

Fixes #25357 where DDP with static_graph=True does not sync gradients when calling backward() over tensors contained in ModelOutput subclasses

* Add test for torch pytree ModelOutput serialization and deserialization

* Fix `test_model_parallelism` (#25359)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add warning for missing attention mask when pad tokens are detected (#25345)

* Add attention mask and pad token warning to many of the models

* Remove changes under examples/research_projects

These files are not maintained by HF.

* Skip the warning check during torch.fx or JIT tracing

* Switch ordering for the warning and input shape assignment

This ordering is a little cleaner for some of the cases.

* Add missing line break in one of the files

* [ASR Pipeline] Clarify return timestamps (#25344)

* [ASR Pipeline] Clarify return timestamps

* fix indentation

* fix ctc check

* fix ctc error message!

* fix test

* fix other test

* add new tests

* final comment

* MaskFormer, Mask2Former - replace einsum for tracing (#25297)

* Replace einsum with ops for tracing

* Fix comment

* Load state in else (#25318)

* Load else

* New approach

* Propagate

* Fix `token` in example template (#25351)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Enable tests to run on third-party devices (#25327)

* enable unit tests to run on third-party devices other than CUDA and CPU.

* remove the modification that enabled ut on MPS

* control test on third-party device by env variable

* update

---------

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* 🌐 [i18n-KO] Translated `add_tensorflow_model.md` to Korean (#25017)

* docs: ko: add_tensorflow_model.md

* feat: chatgpt draft

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

* fix: manual edits

* Fix `torch_job` worker(s) crashing (#25374)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Generate: add config-level validation (#25381)

* Fix missing usage of `token` (#25382)

* add missing tokens

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Use small config for `OneFormerModelTest.test_model_with_labels` (#25383)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add copied from for image processor methods (#25121)

* Add copied from statements for image processors

* Move out rescale and normalize to base image processor

* Remove rescale and normalize from vit (post rebase)

* Update docstrings and tidy up

* PR comments

* change version (#25387)

* [DOCS] Add example for `TopPLogitsWarper`  (#25361)

* [DOCS] Add example for `TopPLogitsWarper`

* fix typo

* address review feedback

* address review nits

* 🌐 [i18n-KO] Translated `perf_train_cpu_many.md` to Korean (#24923)

* docs: ko: perf_train_cpu_many.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* 16059 - Add missing type hints for ASTModel (#25364)

* 16059 - Add missing type hints for ASTModel

* Add an additional type hint

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* rm useless condition since the previous condition contains it. (#25403)

* Fix path for dynamic module creation (#25402)

* YOLOS - Revert default return_pixel_mask value (#25404)

Revert default return_pixel_mask value

* Docs: introduction to generation with LLMs (#25240)

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Generate: length validation (#25384)

* Improve training args (#25401)

* enhanced tips for some training args

* make style

* Generate: generation config validation fixes in docs (#25405)

* 16059 - Add extra type hints for AltCLIPModel (#25399)

* Generate: lower severity of parameterization checks (#25407)

* VQA task guide (#25244)

* initial commit

* semi-finished task guide draft

* image link

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_question_answering.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* feedback addressed

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits addressed

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `add_new_model.md` to Korean (#24957)

* docs: ko: add_new_model.md

* feat: chatgpt draft

* fix: manual edits

* fix: change document title

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: edit with reviewers

Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>

* fix: add anchor to header

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* Update docs/source/ko/add_new_model.md

Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* fix: edit with reviews

* feat: edit toctree

---------

Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>
Co-authored-by: Jungnerd <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
Co-authored-by: 이서정 <97655267+sjlee-wise@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `model_summary.md` to Korean (#24625)

* docs: ko: model_summary.md

* feat: nmt and manual edit model_summary.mdx

* fix: resolve suggestions

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* fix: resolve suggestions2

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>

---------

Co-authored-by: Sohyun Sim <96299403+sim-so@users.noreply.github.com>
Co-authored-by: Wonhyeong Seo <wonhseo@kakao.com>

* Update Bark generation configs and tests (#25409)

* update bark generation configs for more coherent parameter

* make style

* update bark hub repo

* aligned sample_beam output selection with beam_search (#25375)

* aligned sample_beam specs with beam_search

* pull origin main

* Revert "pull origin main"

This reverts commit 06d356f1137bb52272e120a03636598c44449cf3.

* update test_utils.py

* fix format

* remove comment

---------

Co-authored-by: Shogo Fujita <shogo.fujita@legalontech.jp>

* Enable passing number of channels when inferring data format (#25412)

* Bark: flexible generation config overload (#25414)

* [DINOv2] Update pooler output (#25392)

Update pooler output

* 🌐 [i18n-KO] Translated `philosophy.md` to Korean (#25010)

* docs: ko: philosophy.md

* feat: chatgpt draft

* fix: manual edits

* fix: resolve suggestions

* Doc checks (#25408)

* Document check_dummies

* Type hints and doc in other files

* Document check inits

* Add documentation to

* Address review comments

* Generation: strict generation config validation at save time (#25411)

* strict gen config save; Add tests

* add note that the warning will be an exception in v4.34

* [WavLM] Fix Arxiv link and authors (#25415)

* [WavLM] Fix Arxiv link and authors

* make style

* Generate: Load generation config when `device_map` is passed (#25413)

* Fix rendering for `torch.compile()` docs (#25432)

fix rendering

* Add `examples`  to tests to run when `setup.py` is modified (#25437)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix issue with ratio evaluation steps and auto find batch size (#25436)

* Fully rebased solution

* 500

* docs: add LLaMA-Efficient-Tuning to awesome-transformers (#25441)

Co-authored-by: statelesshz <jihuazhong1@huawei.com>

* GPTQ integration (#25062)

* GPTQ integration

* Add tests for gptq

* support for more quantization model

* fix style

* typo

* fix method

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add dataclass and fix quantization_method

* fix doc

* Update tests/quantization/gptq/test_gptq.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* modify dataclass

* add gptqconfig import

* fix typo

* fix tests

* remove dataset as req arg

* remove tokenizer import

* add offload cpu quantization test

* fix check dataset

* modify dockerfile

* protect trainer

* style

* test for config

* add more log

* overwrite torch_dtype

* draft doc

* modify quantization_config docstring

* fix class name in docstring

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* more warning

* fix 8bit kwargs tests

* peft compatibility

* remove var

* fix is_gptq_quantized

* remove is_gptq_quantized

* fix wrap

* Update src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add exllama

* skip test

* overwrite float16

* style

* fix skip test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix docstring formatting

* add doc

* better test

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Fix for #25437 (#25454)

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* not debugged code

* reference code so nothing is lost

* novelty

* added docstrings

* fixed some relative import errors

* fixed small bugs

* added linear layers to bloom

* removed impossible embedding method

* Update src/transformers/models/bloom/desequence_graph_ids.py

Co-au…