FEAT: Add mistral v3 conversion script #30981

Merged: younesbelkada merged 3 commits into main from add-mistral-conversion-script on May 29, 2024


@younesbelkada (Contributor) commented May 23, 2024

What does this PR do?

Adds the conversion script to convert mistral-v3 models

cc @ArthurZucker

Fixes: #31093

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

LGTM but we are missing the function calling tokens that need to be added to the tokenizer!

@ArthurZucker (Collaborator) left a comment

LGTM

younesbelkada and others added 2 commits May 29, 2024 11:08
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@younesbelkada younesbelkada merged commit bfe6f51 into main May 29, 2024
8 checks passed
@younesbelkada younesbelkada deleted the add-mistral-conversion-script branch May 29, 2024 09:43
vasqu pushed a commit to vasqu/transformers that referenced this pull request Jun 1, 2024
* add mistral v3 conversion script

* Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
vasqu added a commit to vasqu/transformers that referenced this pull request Jun 1, 2024
commit bf6ea14
Merge: b3261f5 96eb062
Author: Vasqu <antonprogamer@gmail.com>
Date:   Sat Jun 1 02:49:53 2024 +0200

    Merge remote-tracking branch 'origin/main'

commit b3261f5
Author: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Date:   Fri May 31 18:37:43 2024 +0200

    Diff converter v2 (huggingface#30868)

    * current working example!

    * commit regex and result file

    * update

    * nit

    * push the conversion file

    * oups

    * roadmap and nits

    * attempt diffs for 3 files

    * persimmon

    * nit

    * add diff file that is the same as the modeling_llama.py

    * fix rope nits

    * updates

    * updates with converted versions

    * give some breathing space to the code

    * delete

    * update

    * update

    * push the actual result

    * update regex patterns

    * update regex patterns

    * fix some issues

    * fix some issues

    * fix some issues

    * updates

    * updates

    * updates

    * updates

    * updates

    * revert changes done to llama

    * updates

    * update gemma

    * updates

    * oups

    * current state

    * current state

    * update

    * ouiiii

    * nit

    * clear diffs

    * nit

    * fixup

    * update

    * doc 🚀

    * 🔥

    * for now use gemma

    * deal with comments

    * style

    * handle funtions

    * deal with assigns

    * todos

    * process inheritage

    * keep decorators?

    * 🤗

    * deal with duplicates

    * fixup

    * correctly remove duplicate code

    * run ruff post script

    * ruff deals pretty well with imports, let's leave it to him

    * ah maybe not lol

    * for now remove all imports from child.

    * nit

    * conversion of llama

    * okay

    * convert starcoder2

    * synch with main

    * update llama diff

    * updates

    * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff

    * updates

    * okay actual state

    * non zero exit

    * update!

    * revert unrelated

    * remove other diff files

    * updates

    * cleanup

    * update

    * less diff!

    * stash

    * current updates

    * updates

    * No need for call

    * finished fining deps

    * update

    * current changes

    * current state

    * current state

    * new status

    * nit

    * finally

    * fixes

    * nits

    * order is now expected

    * use logger info instead of prints

    * fixup

    * up

    * nit

    * update

    * nits

    * update

    * correct merge

    * update

    * update

    * update

    * add warning

    * update caution message

    * update

    * better merging strategy

    * copy class statements :wink

    * fixups

    * nits

    * update

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * nits

    * smaller header

    * do cleanup some stuff

    * even simpler header?

    * fixup

    * updates

    * ruff

    * update examples

    * nit

    * TODO

    * state

    * OUUUUUUF

    * current state

    * nits

    * final state

    * add a readme

    * fixup

    * remove diff llama

    * fix

    * nit

    * dummy noy funny

    * ruff format tests src utils --check

    * everless diffs

    * less diffs and fix test

    * fixes

    * naming nit?

    * update converter and add supper example

    * nits

    * updated for function signatures

    * update

    * update

    * add converted dummies

    * autoformat

    * single target assign fix

    * fixup

    * fix some imports

    * fixes

    * don't push them

    * `# noqa: F841`

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit ba34b39
Author: Vallepu Vamsi Krishna <vallepu670@gmail.com>
Date:   Fri May 31 21:53:11 2024 +0530

    Added description of quantization_config (huggingface#31133)

    * Description of quantization_config

    Added missing description about quantization_config in replace_with_bnb_linear for better readability.

    * Removed trailing spaces

commit 2a2ec42
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Fri May 31 16:56:17 2024 +0100

    Instance segmentation examples (huggingface#31084)

    * Initial setup

    * Metrics

    * Overfit on two batches

    * Train 40 epochs

    * Memory leak debugging

    * Trainer fine-tuning

    * Draft

    * Fixup

    * Trained end-to-end

    * Add requirements

    * Rewrite evaluator

    * nits

    * Add readme

    * Add instance-segmentation to the table

    * Support void masks

    * Remove sh

    * Update docs

    * Add pytorch test

    * Add accelerate test

    * Update examples/pytorch/instance-segmentation/README.md

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    * Fix consistency oneformer

    * Fix imports

    * Fix imports sort

    * Apply suggestions from code review

    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

    * Add resources to docs

    * Update examples/pytorch/instance-segmentation/README.md

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * Update examples/pytorch/instance-segmentation/README.md

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * Remove explicit model_type argument

    * Fix tests

    * Update readme

    * Note about other models

    ---------

    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 3231ed4
Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date:   Fri May 31 14:16:23 2024 +0200

    Add streaming, various fixes (huggingface#30838)

    * Implement streaming run in ReAct agents
    * Allow additional imports in code agents
    * Python interpreter: support classes and exceptions, fixes

commit 899d73f
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Fri May 31 12:44:20 2024 +0200

    [trainer] add sanity evaluation option  (huggingface#31146)

    * add sanity evaluation

    * fix

    * Apply suggestions from code review

    Co-authored-by: Zach Mueller <muellerzr@gmail.com>

    * fix

    ---------

    Co-authored-by: Zach Mueller <muellerzr@gmail.com>

commit 09daece
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Fri May 31 12:36:46 2024 +0200

    Quantization: Enhance bnb error message (huggingface#31160)

    enhance error message

commit 390c9f4
Author: Asif Ajrof <asifajrof@gmail.com>
Date:   Fri May 31 16:34:29 2024 +0600

    Update sam.md (huggingface#31130)

    `mask` variable is not defined. probably a writing mistake. it should be `segmentation_map`. `segmentation_map` should be a `1` channel image rather than `RGB`.
    [on a different note, the `mask_url` is the same as `raw_image`. could provide a better example.

commit a6967c0
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Fri May 31 12:08:55 2024 +0200

    Fix quantized cache output (huggingface#31143)

commit aa2e1d4
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Fri May 31 10:35:54 2024 +0200

    pytest -rsfE (huggingface#31140)

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 6c33f18
Author: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Date:   Fri May 31 08:49:33 2024 +0200

    helper (huggingface#31152)

    * helper

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * updates

    * more doc

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit adb74a2
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 17:21:10 2024 +0200

    Workflow: Remove `IS_GITHUB_CI` (huggingface#31147)

    remove `IS_GITHUB_CI`

commit 3553184
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 16:47:35 2024 +0200

    Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config (huggingface#31136)

    Replace all occurences of `load_in_8bit` with bnb config

commit e6dcdfd
Author: zspo <songpo.zhang@foxmail.com>
Date:   Thu May 30 22:25:43 2024 +0800

    fix get_scheduler when name is warmup_stable_decay (huggingface#31128)

    fix get_scheduler args

commit 9d8b6ea
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 11:45:03 2024 +0200

    FIX / Quantization: Add extra validation for bnb config (huggingface#31135)

    add validation for bnb config

commit 7fc432f
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Wed May 29 19:43:51 2024 +0200

    Cleanup docker build (huggingface#31119)

    * remove

    * build

    ---------

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit c350b52
Author: Dhruv Pai <46631243+dhruvbpai@users.noreply.github.com>
Date:   Wed May 29 07:20:59 2024 -0700

    Add on_optimizer_step to callback options (huggingface#31095)

    * Modified test

    * Added on_optimizer_step to callbacks

    * Move callback after step is called

    * Added on optimizer step callback

commit 545d7ca
Author: Joao Gante <joaofranciscocardosogante@gmail.com>
Date:   Wed May 29 15:17:14 2024 +0100

    Add VLM generation default contributor (huggingface#31115)

    * add Raushan

    * add Raushan

commit 296c546
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Wed May 29 15:56:28 2024 +0200

    FIX / Docs: Fix GPTQ expected number of bits (huggingface#31111)

    Update overview.md

commit b643801
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Wed May 29 15:42:39 2024 +0200

    Fix nightly circleci (huggingface#31114)

    * fix

    * fix

    ---------

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 89261a1
Author: Zach Mueller <muellerzr@gmail.com>
Date:   Wed May 29 09:35:37 2024 -0400

    Rm maintainer + migrate (huggingface#31089)

commit 0e3643c
Author: Matt <Rocketknight1@users.noreply.github.com>
Date:   Wed May 29 13:33:26 2024 +0100

    Fix faulty rstrip in module loading (huggingface#31108)

commit a41deea
Author: Matt <Rocketknight1@users.noreply.github.com>
Date:   Wed May 29 13:20:36 2024 +0100

    Fix env.py in cases where torch is not present (huggingface#31113)

    * Fix env.py in cases where torch is not present

    * Simplify the fix (and avoid some issues)

commit 61f854a
Author: Huazhong Ji <hzji210@gmail.com>
Date:   Wed May 29 18:57:54 2024 +0800

    Improve `transformers-cli env` reporting (huggingface#31003)

    * Improve `transformers-cli env` reporting

    * move the line `"Using GPU in script?": "<fill in>"` to in if conditional
    statement

    * same option for npu

commit 40ed3a8
Author: Lucain <lucainp@gmail.com>
Date:   Wed May 29 12:55:43 2024 +0200

    Use `HF_HUB_OFFLINE` + fix has_file in offline mode (huggingface#31016)

    * Fix has_file in offline mode

    * harmonize env variable for offline mode

    * Switch to HF_HUB_OFFLINE

    * fix test

    * revert test_offline to test TRANSFORMERS_OFFLINE

    * Add new offline test

    * merge conflicts

    * docs

commit 300d03c
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Wed May 29 11:43:54 2024 +0200

    FEAT: Add mistral v3 conversion script (huggingface#30981)

    * add mistral v3 conversion script

    * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

    * fixup

    ---------

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

commit 524d7bf
Author: Raushan Turganbay <raushan@huggingface.co>
Date:   Wed May 29 14:25:44 2024 +0500

    Quantized KV cache: update quanto (huggingface#31052)

    * quanto latest version was refactored

    * add error msg

    * incorrect compare sign

    * Update src/transformers/cache_utils.py

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 9f98c9c
Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Date:   Tue May 28 18:07:07 2024 +0100

    Deprecate low use models (huggingface#30781)

    * Deprecate models
    - graphormer
    - time_series_transformer
    - xlm_prophetnet
    - qdqbert
    - nat
    - ernie_m
    - tvlt
    - nezha
    - mega
    - jukebox
    - vit_hybrid
    - x_clip
    - deta
    - speech_to_text_2
    - efficientformer
    - realm
    - gptsan_japanese

    * Fix up

    * Fix speech2text2 imports

    * Make sure message isn't indented

    * Fix docstrings

    * Correctly map for deprecated models from model_type

    * Uncomment out

    * Add back time series transformer and x-clip

    * Import fix and fix-up

    * Fix up with updated ruff

commit 1cb30f0
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 18:29:22 2024 +0200

    Docs / Quantization: Redirect deleted page (huggingface#31063)

    Update _redirects.yml

commit 1ed4924
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 18:29:11 2024 +0200

    TST: Fix instruct-blip tests (huggingface#31088)

    * fix flan t5 tests

    * better format

commit 2a08fd3
Author: Jonny Li <jonny_li@live.ca>
Date:   Tue May 28 12:25:15 2024 -0400

    Fix DeepSpeed compatibility with weight_norm (huggingface#30881) (huggingface#31018)

commit b5f4ec6
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue May 28 17:47:35 2024 +0200

    Fix PretrainedConfig docstring with deprecated resume_download (huggingface#31014)

commit 454cbe0
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 17:44:52 2024 +0200

    skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` (huggingface#31086)

    fix

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit e70c2ea
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 17:06:00 2024 +0200

    FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` (huggingface#31092)

    Update modeling_opt.py

commit 6560e25
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 17:05:44 2024 +0200

    FIX: Add `accelerate` as a hard requirement (huggingface#31090)

    add accelerate

commit 9bf05ec
Author: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Date:   Tue May 28 16:02:51 2024 +0200

    Render chat template tojson filter as unicode (huggingface#31041)

    * Render chat template tojson filter as unicode

    * ruff--

commit e405f2b
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 15:04:43 2024 +0200

    Docs / PEFT: Add PEFT API documentation (huggingface#31078)

    * add peft references

    * add peft references

    * Update docs/source/en/peft.md

    * Update docs/source/en/peft.md

commit 5237955
Author: Raushan Turganbay <raushan@huggingface.co>
Date:   Tue May 28 17:07:42 2024 +0500

    Watermark: fix tests (huggingface#30961)

    * fix tests

    * style

    * Update tests/generation/test_utils.py

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit f2a7f7c
Author: Lysandre Debut <hi@lysand.re>
Date:   Tue May 28 13:34:23 2024 +0200

    Fix failing tokenizer tests (huggingface#31083)

    * Fix failing tokenizer tests

    * Use small tokenizer

    * Fix remaining reference

commit 0e1935b
Author: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Date:   Tue May 28 13:22:06 2024 +0200

    [SuperPoint, PaliGemma] Update docs (huggingface#31025)

    * Update docs

    * Add PaliGemma resources

    * Address comment

    * Update docs

commit 2fe8356
Author: Sina Taslimi <33656391+taslimisina@users.noreply.github.com>
Date:   Tue May 28 13:09:32 2024 +0200

    Fix typo in trainer.py (huggingface#31048)

commit b74960c
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Tue May 28 11:06:06 2024 +0000

    Fix OWLv2 post_process_object_detection for multiple images (huggingface#31082)

    * Add test for multiple images

    * [run slow] owlv2

    * Fix box rescaling

    * [run slow] owlv2

commit 3e3599d
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Tue May 28 10:41:40 2024 +0000

    Remove float64 cast for OwlVit and OwlV2 to support MPS device (huggingface#31071)

    Remove float64

commit 48d33da
Author: oOraph <13552058+oOraph@users.noreply.github.com>
Date:   Tue May 28 11:56:05 2024 +0200

    fix from_pretrained in offline mode when model is preloaded in cache (huggingface#31010)

    * Unit test to verify fix

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    * fix from_pretrained in offline mode when model is preloaded in cache

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    * minor: fmt

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    ---------

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
    Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>

commit 7c472e6
Author: Hengwen Tong <kevint324@gmail.com>
Date:   Tue May 28 17:52:47 2024 +0800

    Remove redundant backend checks in training_args.py (huggingface#30999)

    * Remove backend checks in training_args.py

    * Expilicit initialize the device

    ---------

    Co-authored-by: tonghengwen <tonghengwen@cambricon.com>

commit 46b606e
Author: AP <108011872+apalkk@users.noreply.github.com>
Date:   Tue May 28 09:50:45 2024 +0000

    Update quicktour.md to fix broken link to Glossary (huggingface#31072)

    Update quicktour.md to fix broken link

    Missing '/' in attention mask link in the transformers quicktour

commit 580f464
Author: Clint Adams <clint@gcfm.net>
Date:   Tue May 28 05:48:23 2024 -0400

    fix "piano" typo (huggingface#31027)

commit 5e211d5
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 11:36:26 2024 +0200

    Remove `ninja` from docker image build (huggingface#31080)

    fix

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 8b91c20
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 10:53:28 2024 +0200

    use `@main` (huggingface#31065)

    use main

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 04440a0
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Mon May 27 18:36:39 2024 +0200

    skip `test_model_parallelism` for 2 model test classes (huggingface#31067)

    skip

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit f803e2b
Author: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Date:   Mon May 27 16:09:05 2024 +0200

    Fix pad_to_max_length Whisper (huggingface#30787)

    * fix pad_to_max_length Whisper

    * add tests

    * make style

commit b6eb29b
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Mon May 27 15:53:45 2024 +0200

    Fix quanto tests (huggingface#31062)

    fix quanto tests

commit e581213
Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Date:   Mon May 27 14:16:47 2024 +0100

    Update feature request label in template (huggingface#30940)

commit 05eff71
Author: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Date:   Mon May 27 08:57:43 2024 -0400

    Follow up: Fix link in dbrx.md (huggingface#30514)

    * Fix link in dbrx.md

    * remove "though this may not be up to date"

    ---------

    Co-authored-by: Lysandre Debut <hi@lysand.re>

commit d5aa839
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Mon May 27 13:47:47 2024 +0200

    unpin uv (huggingface#31055)

    [push-ci-image]

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 165bd7a
Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date:   Mon May 27 10:34:14 2024 +0200

    Redirect transformers_agents doc to agents (huggingface#31054)

commit 6df5028
Author: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Date:   Fri May 24 19:02:55 2024 +0200

    Paligemma- fix devices and dtype assignments (huggingface#31008)

    * fix devices and dtype assignments

    * [run-slow]paligemma

commit 61f1d47
Author: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Date:   Fri May 24 17:38:58 2024 +0200

    Add split special tokens (huggingface#30772)

    * seems like `split_special_tokens` is used here

    * split special token

    * add new line at end of file

    * moving split special token test to common tests

    * added assertions

    * test

    * fixup

    * add co-author

    * passing rest of args to gptsan_japanese, fixing tests

    * removing direct comparison of fast and slow models

    * adding test support for UDOP and LayoutXLM

    * ruff fix

    * readd check if slow tokenizer

    * modify test to handle bos tokens

    * removing commented function

    * trigger build

    * applying review feedback - updated docstrings, var names, and simplified tests

    * ruff fixes

    * Update tests/test_tokenization_common.py

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

    * applying feedback, comments

    * shutil temp directory fix

    ---------

    Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
    Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
    Co-authored-by: itazap <itazap@users.noreply.github.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>

commit e2b9913
Author: BHUVAN M <121122109+bhuvanmdev@users.noreply.github.com>
Date:   Fri May 24 20:50:09 2024 +0530

    added interpolation for vitmae model in pytorch as well as tf. (huggingface#30732)

    * added interpolation for vitmae model in pytorch as well as tf.

    * Update modeling_vit_mae.py

    irreugalr import fixed

    * small changes and proper formatting

    * changes suggested in review.

    * modified decoder interpolate_func

    * arguments and docstring fix

    * Apply suggestions from code review

    doc fixes

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 96eb062
Author: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Date:   Fri May 31 18:37:43 2024 +0200

    Diff converter v2 (huggingface#30868)

    * current working example!

    * commit regex and result file

    * update

    * nit

    * push the conversion file

    * oups

    * roadmap and nits

    * attempt diffs for 3 files

    * persimmon

    * nit

    * add diff file that is the same as the modeling_llama.py

    * fix rope nits

    * updates

    * updates with converted versions

    * give some breathing space to the code

    * delete

    * update

    * update

    * push the actual result

    * update regex patterns

    * update regex patterns

    * fix some issues

    * fix some issues

    * fix some issues

    * updates

    * updates

    * updates

    * updates

    * updates

    * revert changes done to llama

    * updates

    * update gemma

    * updates

    * oups

    * current state

    * current state

    * update

    * ouiiii

    * nit

    * clear diffs

    * nit

    * fixup

    * update

    * doc 🚀

    * 🔥

    * for now use gemma

    * deal with comments

    * style

    * handle funtions

    * deal with assigns

    * todos

    * process inheritage

    * keep decorators?

    * 🤗

    * deal with duplicates

    * fixup

    * correctly remove duplicate code

    * run ruff post script

    * ruff deals pretty well with imports, let's leave it to him

    * ah maybe not lol

    * for now remove all imports from child.

    * nit

    * conversion of llama

    * okay

    * convert starcoder2

    * synch with main

    * update llama diff

    * updates

    * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff

    * updates

    * okay actual state

    * non zero exit

    * update!

    * revert unrelated

    * remove other diff files

    * updates

    * cleanup

    * update

    * less diff!

    * stash

    * current updates

    * updates

    * No need for call

    * finished fining deps

    * update

    * current changes

    * current state

    * current state

    * new status

    * nit

    * finally

    * fixes

    * nits

    * order is now expected

    * use logger info instead of prints

    * fixup

    * up

    * nit

    * update

    * nits

    * update

    * correct merge

    * update

    * update

    * update

    * add warning

    * update caution message

    * update

    * better merging strategy

    * copy class statements :wink

    * fixups

    * nits

    * update

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * nits

    * smaller header

    * do cleanup some stuff

    * even simpler header?

    * fixup

    * updates

    * ruff

    * update examples

    * nit

    * TODO

    * state

    * OUUUUUUF

    * current state

    * nits

    * final state

    * add a readme

    * fixup

    * remove diff llama

    * fix

    * nit

    * dummy noy funny

    * ruff format tests src utils --check

    * everless diffs

    * less diffs and fix test

    * fixes

    * naming nit?

    * update converter and add supper example

    * nits

    * updated for function signatures

    * update

    * update

    * add converted dummies

    * autoformat

    * single target assign fix

    * fixup

    * fix some imports

    * fixes

    * don't push them

    * `# noqa: F841`

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 372baec
Author: Vallepu Vamsi Krishna <vallepu670@gmail.com>
Date:   Fri May 31 21:53:11 2024 +0530

    Added description of quantization_config (huggingface#31133)

    * Description of quantization_config

    Added missing description about quantization_config in replace_with_bnb_linear for better readability.

    * Removed trailing spaces

commit cdc8131
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Fri May 31 16:56:17 2024 +0100

    Instance segmentation examples (huggingface#31084)

    * Initial setup

    * Metrics

    * Overfit on two batches

    * Train 40 epochs

    * Memory leak debugging

    * Trainer fine-tuning

    * Draft

    * Fixup

    * Trained end-to-end

    * Add requirements

    * Rewrite evaluator

    * nits

    * Add readme

    * Add instance-segmentation to the table

    * Support void masks

    * Remove sh

    * Update docs

    * Add pytorch test

    * Add accelerate test

    * Update examples/pytorch/instance-segmentation/README.md

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    * Fix consistency oneformer

    * Fix imports

    * Fix imports sort

    * Apply suggestions from code review

    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

    * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py

    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

    * Add resources to docs

    * Update examples/pytorch/instance-segmentation/README.md

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * Update examples/pytorch/instance-segmentation/README.md

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * Remove explicit model_type argument

    * Fix tests

    * Update readme

    * Note about other models

    ---------

    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 9837a25
Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date:   Fri May 31 14:16:23 2024 +0200

    Add streaming, various fixes (huggingface#30838)

    * Implement streaming run in ReAct agents
    * Allow additional imports in code agents
    * Python interpreter: support classes and exceptions, fixes

commit f8e6ba4
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Fri May 31 12:44:20 2024 +0200

    [trainer] add sanity evaluation option  (huggingface#31146)

    * add sanity evaluation

    * fix

    * Apply suggestions from code review

    Co-authored-by: Zach Mueller <muellerzr@gmail.com>

    * fix

    ---------

    Co-authored-by: Zach Mueller <muellerzr@gmail.com>

commit fc5d3e1
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Fri May 31 12:36:46 2024 +0200

    Quantization: Enhance bnb error message (huggingface#31160)

    enhance error message

commit bd9d1dd
Author: Asif Ajrof <asifajrof@gmail.com>
Date:   Fri May 31 16:34:29 2024 +0600

    Update sam.md (huggingface#31130)

    The `mask` variable is not defined, probably a writing mistake; it should be `segmentation_map`. `segmentation_map` should be a `1` channel image rather than `RGB`.
    On a different note, the `mask_url` is the same as `raw_image`; a better example could be provided.

commit 48cada8
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Fri May 31 12:08:55 2024 +0200

    Fix quantized cache output (huggingface#31143)

commit d19566e
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Fri May 31 10:35:54 2024 +0200

    pytest -rsfE (huggingface#31140)

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit f3f640d
Author: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Date:   Fri May 31 08:49:33 2024 +0200

    helper (huggingface#31152)

    * helper

    * Apply suggestions from code review

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    * updates

    * more doc

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit 6bd511a
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 17:21:10 2024 +0200

    Workflow: Remove `IS_GITHUB_CI` (huggingface#31147)

    remove `IS_GITHUB_CI`

commit f5590de
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 16:47:35 2024 +0200

    Docs / Quantization: Replace all occurrences of `load_in_8bit` with bnb config (huggingface#31136)

    Replace all occurrences of `load_in_8bit` with bnb config

commit cda9c82
Author: zspo <songpo.zhang@foxmail.com>
Date:   Thu May 30 22:25:43 2024 +0800

    fix get_scheduler when name is warmup_stable_decay (huggingface#31128)

    fix get_scheduler args

commit 5e5c4d6
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Thu May 30 11:45:03 2024 +0200

    FIX / Quantization: Add extra validation for bnb config (huggingface#31135)

    add validation for bnb config

commit 2b9e252
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Wed May 29 19:43:51 2024 +0200

    Cleanup docker build (huggingface#31119)

    * remove

    * build

    ---------

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 5c88253
Author: Dhruv Pai <46631243+dhruvbpai@users.noreply.github.com>
Date:   Wed May 29 07:20:59 2024 -0700

    Add on_optimizer_step to callback options (huggingface#31095)

    * Modified test

    * Added on_optimizer_step to callbacks

    * Move callback after step is called

    * Added on optimizer step callback

commit 4af705c
Author: Joao Gante <joaofranciscocardosogante@gmail.com>
Date:   Wed May 29 15:17:14 2024 +0100

    Add VLM generation default contributor (huggingface#31115)

    * add Raushan

    * add Raushan

commit cb879c5
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Wed May 29 15:56:28 2024 +0200

    FIX / Docs: Fix GPTQ expected number of bits (huggingface#31111)

    Update overview.md

commit 1f84141
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Wed May 29 15:42:39 2024 +0200

    Fix nightly circleci (huggingface#31114)

    * fix

    * fix

    ---------

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit d16053c
Author: Zach Mueller <muellerzr@gmail.com>
Date:   Wed May 29 09:35:37 2024 -0400

    Rm maintainer + migrate (huggingface#31089)

commit 0bef4a2
Author: Matt <Rocketknight1@users.noreply.github.com>
Date:   Wed May 29 13:33:26 2024 +0100

    Fix faulty rstrip in module loading (huggingface#31108)

commit 97a58a5
Author: Matt <Rocketknight1@users.noreply.github.com>
Date:   Wed May 29 13:20:36 2024 +0100

    Fix env.py in cases where torch is not present (huggingface#31113)

    * Fix env.py in cases where torch is not present

    * Simplify the fix (and avoid some issues)

commit c886137
Author: Huazhong Ji <hzji210@gmail.com>
Date:   Wed May 29 18:57:54 2024 +0800

    Improve `transformers-cli env` reporting (huggingface#31003)

    * Improve `transformers-cli env` reporting

    * move the line `"Using GPU in script?": "<fill in>"` into the if conditional
    statement

    * same option for npu

commit c3044ec
Author: Lucain <lucainp@gmail.com>
Date:   Wed May 29 12:55:43 2024 +0200

    Use `HF_HUB_OFFLINE` + fix has_file in offline mode (huggingface#31016)

    * Fix has_file in offline mode

    * harmonize env variable for offline mode

    * Switch to HF_HUB_OFFLINE

    * fix test

    * revert test_offline to test TRANSFORMERS_OFFLINE

    * Add new offline test

    * merge conflicts

    * docs

commit bfe6f51
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Wed May 29 11:43:54 2024 +0200

    FEAT: Add mistral v3 conversion script (huggingface#30981)

    * add mistral v3 conversion script

    * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

    * fixup

    ---------

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

commit d521ba5
Author: Raushan Turganbay <raushan@huggingface.co>
Date:   Wed May 29 14:25:44 2024 +0500

    Quantized KV cache: update quanto (huggingface#31052)

    * quanto latest version was refactored

    * add error msg

    * incorrect compare sign

    * Update src/transformers/cache_utils.py

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit a564d10
Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Date:   Tue May 28 18:07:07 2024 +0100

    Deprecate low use models (huggingface#30781)

    * Deprecate models
    - graphormer
    - time_series_transformer
    - xlm_prophetnet
    - qdqbert
    - nat
    - ernie_m
    - tvlt
    - nezha
    - mega
    - jukebox
    - vit_hybrid
    - x_clip
    - deta
    - speech_to_text_2
    - efficientformer
    - realm
    - gptsan_japanese

    * Fix up

    * Fix speech2text2 imports

    * Make sure message isn't indented

    * Fix docstrings

    * Correctly map for deprecated models from model_type

    * Uncomment out

    * Add back time series transformer and x-clip

    * Import fix and fix-up

    * Fix up with updated ruff

commit 7f08817
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 18:29:22 2024 +0200

    Docs / Quantization: Redirect deleted page (huggingface#31063)

    Update _redirects.yml

commit 3264be4
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 18:29:11 2024 +0200

    TST: Fix instruct-blip tests (huggingface#31088)

    * fix flan t5 tests

    * better format

commit 476890e
Author: Jonny Li <jonny_li@live.ca>
Date:   Tue May 28 12:25:15 2024 -0400

    Fix DeepSpeed compatibility with weight_norm (huggingface#30881) (huggingface#31018)

commit aada568
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date:   Tue May 28 17:47:35 2024 +0200

    Fix PretrainedConfig docstring with deprecated resume_download (huggingface#31014)

commit 3af7bf3
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 17:44:52 2024 +0200

    skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` (huggingface#31086)

    fix

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit ab19f90
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 17:06:00 2024 +0200

    FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` (huggingface#31092)

    Update modeling_opt.py

commit 94d416f
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 17:05:44 2024 +0200

    FIX: Add `accelerate` as a hard requirement (huggingface#31090)

    add accelerate

commit 22dab24
Author: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Date:   Tue May 28 16:02:51 2024 +0200

    Render chat template tojson filter as unicode (huggingface#31041)

    * Render chat template tojson filter as unicode

    * ruff--

commit 4f98b14
Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Date:   Tue May 28 15:04:43 2024 +0200

    Docs / PEFT: Add PEFT API documentation (huggingface#31078)

    * add peft references

    * add peft references

    * Update docs/source/en/peft.md

    * Update docs/source/en/peft.md

commit 779bc36
Author: Raushan Turganbay <raushan@huggingface.co>
Date:   Tue May 28 17:07:42 2024 +0500

    Watermark: fix tests (huggingface#30961)

    * fix tests

    * style

    * Update tests/generation/test_utils.py

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

commit a3c7b59
Author: Lysandre Debut <hi@lysand.re>
Date:   Tue May 28 13:34:23 2024 +0200

    Fix failing tokenizer tests (huggingface#31083)

    * Fix failing tokenizer tests

    * Use small tokenizer

    * Fix remaining reference

commit 90da0b1
Author: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Date:   Tue May 28 13:22:06 2024 +0200

    [SuperPoint, PaliGemma] Update docs (huggingface#31025)

    * Update docs

    * Add PaliGemma resources

    * Address comment

    * Update docs

commit 66add16
Author: Sina Taslimi <33656391+taslimisina@users.noreply.github.com>
Date:   Tue May 28 13:09:32 2024 +0200

    Fix typo in trainer.py (huggingface#31048)

commit 98e2d48
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Tue May 28 11:06:06 2024 +0000

    Fix OWLv2 post_process_object_detection for multiple images (huggingface#31082)

    * Add test for multiple images

    * [run slow] owlv2

    * Fix box rescaling

    * [run slow] owlv2

commit c31473e
Author: Pavel Iakubovskii <qubvel@gmail.com>
Date:   Tue May 28 10:41:40 2024 +0000

    Remove float64 cast for OwlVit and OwlV2 to support MPS device (huggingface#31071)

    Remove float64

commit 936ab7b
Author: oOraph <13552058+oOraph@users.noreply.github.com>
Date:   Tue May 28 11:56:05 2024 +0200

    fix from_pretrained in offline mode when model is preloaded in cache (huggingface#31010)

    * Unit test to verify fix

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    * fix from_pretrained in offline mode when model is preloaded in cache

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    * minor: fmt

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>

    ---------

    Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
    Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>

commit 537deb7
Author: Hengwen Tong <kevint324@gmail.com>
Date:   Tue May 28 17:52:47 2024 +0800

    Remove redundant backend checks in training_args.py (huggingface#30999)

    * Remove backend checks in training_args.py

    * Explicitly initialize the device

    ---------

    Co-authored-by: tonghengwen <tonghengwen@cambricon.com>

commit dd4654e
Author: AP <108011872+apalkk@users.noreply.github.com>
Date:   Tue May 28 09:50:45 2024 +0000

    Update quicktour.md to fix broken link to Glossary (huggingface#31072)

    Update quicktour.md to fix broken link

    Missing '/' in attention mask link in the transformers quicktour

commit e18da4e
Author: Clint Adams <clint@gcfm.net>
Date:   Tue May 28 05:48:23 2024 -0400

    fix "piano" typo (huggingface#31027)

commit 8e3b1fe
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 11:36:26 2024 +0200

    Remove `ninja` from docker image build (huggingface#31080)

    fix

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 8f0f727
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Tue May 28 10:53:28 2024 +0200

    use `@main` (huggingface#31065)

    use main

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 9d35edb
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Mon May 27 18:36:39 2024 +0200

    skip `test_model_parallelism` for 2 model test classes (huggingface#31067)

    skip

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit d355741
Author: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Date:   Mon May 27 16:09:05 2024 +0200

    Fix pad_to_max_length Whisper (huggingface#30787)

    * fix pad_to_max_length Whisper

    * add tests

    * make style

commit b84cd67
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Mon May 27 15:53:45 2024 +0200

    Fix quanto tests (huggingface#31062)

    fix quanto tests

commit cd79777
Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Date:   Mon May 27 14:16:47 2024 +0100

    Update feature request label in template (huggingface#30940)

commit 0a064dc
Author: Eitan Turok <150733043+eitanturok@users.noreply.github.com>
Date:   Mon May 27 08:57:43 2024 -0400

    Follow up: Fix link in dbrx.md (huggingface#30514)

    * Fix link in dbrx.md

    * remove "though this may not be up to date"

    ---------

    Co-authored-by: Lysandre Debut <hi@lysand.re>

commit d7942d9
Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date:   Mon May 27 13:47:47 2024 +0200

    unpin uv (huggingface#31055)

    [push-ci-image]

    Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

commit 84c4b72
Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
Date:   Mon May 27 10:34:14 2024 +0200

    Redirect transformers_agents doc to agents (huggingface#31054)

commit bdb9106
Author: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Date:   Fri May 24 19:02:55 2024 +0200

    Paligemma- fix devices and dtype assignments (huggingface#31008)

    * fix devices and dtype assignments

    * [run-slow]paligemma

commit deba765
Author: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Date:   Fri May 24 17:38:58 2024 +0200

    Add split special tokens (huggingface#30772)

    * seems like `split_special_tokens` is used here

    * split special token

    * add new line at end of file

    * moving split special token test to common tests

    * added assertions

    * test

    * fixup

    * add co-author

    * passing rest of args to gptsan_japanese, fixing tests

    * removing direct comparison of fast and slow models

    * adding test support for UDOP and LayoutXLM

    * ruff fix

    * readd check if slow tokenizer

    * modify test to handle bos tokens

    * removing commented function

    * trigger build

    * applying review feedback - updated docstrings, var names, and simplified tests

    * ruff fixes

    * Update tests/test_tokenization_common.py

    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

    * applying feedback, comments

    * shutil temp directory fix

    ---------

    Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
    Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain>
    Co-authored-by: itazap <itazap@users.noreply.github.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local>

commit e5103a7
Author: BHUVAN M <121122109+bhuvanmdev@users.noreply.github.com>
Date:   Fri May 24 20:50:09 2024 +0530

    added interpolation for vitmae model in pytorch as well as tf. (huggingface#30732)

    * added interpolation for vitmae model in pytorch as well as tf.

    * Update modeling_vit_mae.py

    irregular import fixed

    * small changes and proper formatting

    * changes suggested in review.

    * modified decoder interpolate_func

    * arguments and docstring fix

    * Apply suggestions from code review

    doc fixes

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

    ---------

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 11, 2024
* add mistral v3 conversion script

* Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Successfully merging this pull request may close these issues.

Convert_mistral_weights_to_hf fails loading consolidated.safetensors
3 participants