FEAT: Add mistral v3 conversion script #30981

younesbelkada · 2024-05-23T09:09:47Z

What does this PR do?

Adds the conversion script to convert mistral-v3 models

HuggingFaceDocBuilderDev · 2024-05-23T09:29:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

LGTM but we are missing the function calling tokens that need to be added to the tokenizer!

ArthurZucker

LGTM

src/transformers/models/mistral/convert_mistral_weights_to_hf.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add mistral v3 conversion script * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

commit bf6ea14 Merge: b3261f5 96eb062 Author: Vasqu <antonprogamer@gmail.com> Date: Sat Jun 1 02:49:53 2024 +0200 Merge remote-tracking branch 'origin/main' commit b3261f5 Author: Arthur <48595927+ArthurZucker@users.noreply.github.com> Date: Fri May 31 18:37:43 2024 +0200 Diff converter v2 (huggingface#30868) * current working example! * commit regex and result file * update * nit * push the conversion file * oups * roadmap and nits * attempt diffs for 3 files * persimmon * nit * add diff file that is the same as the modeling_llama.py * fix rope nits * updates * updates with converted versions * give some breathing space to the code * delete * update * update * push the actual result * update regex patterns * update regex patterns * fix some issues * fix some issues * fix some issues * updates * updates * updates * updates * updates * revert changes done to llama * updates * update gemma * updates * oups * current state * current state * update * ouiiii * nit * clear diffs * nit * fixup * update * doc 🚀 * 🔥 * for now use gemma * deal with comments * style * handle funtions * deal with assigns * todos * process inheritage * keep decorators? * 🤗 * deal with duplicates * fixup * correctly remove duplicate code * run ruff post script * ruff deals pretty well with imports, let's leave it to him * ah maybe not lol * for now remove all imports from child. * nit * conversion of llama * okay * convert starcoder2 * synch with main * update llama diff * updates * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff * updates * okay actual state * non zero exit * update! * revert unrelated * remove other diff files * updates * cleanup * update * less diff! * stash * current updates * updates * No need for call * finished fining deps * update * current changes * current state * current state * new status * nit * finally * fixes * nits * order is now expected * use logger info instead of prints * fixup * up * nit * update * nits * update * correct merge * update * update * update * add warning * update caution message * update * better merging strategy * copy class statements :wink * fixups * nits * update * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits * smaller header * do cleanup some stuff * even simpler header? * fixup * updates * ruff * update examples * nit * TODO * state * OUUUUUUF * current state * nits * final state * add a readme * fixup * remove diff llama * fix * nit * dummy noy funny * ruff format tests src utils --check * everless diffs * less diffs and fix test * fixes * naming nit? * update converter and add supper example * nits * updated for function signatures * update * update * add converted dummies * autoformat * single target assign fix * fixup * fix some imports * fixes * don't push them * `# noqa: F841` --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit ba34b39 Author: Vallepu Vamsi Krishna <vallepu670@gmail.com> Date: Fri May 31 21:53:11 2024 +0530 Added description of quantization_config (huggingface#31133) * Description of quantization_config Added missing description about quantization_config in replace_with_bnb_linear for better readability. * Removed trailing spaces commit 2a2ec42 Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri May 31 16:56:17 2024 +0100 Instance segmentation examples (huggingface#31084) * Initial setup * Metrics * Overfit on two batches * Train 40 epochs * Memory leak debugging * Trainer fine-tuning * Draft * Fixup * Trained end-to-end * Add requirements * Rewrite evaluator * nits * Add readme * Add instance-segmentation to the table * Support void masks * Remove sh * Update docs * Add pytorch test * Add accelerate test * Update examples/pytorch/instance-segmentation/README.md * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Fix consistency oneformer * Fix imports * Fix imports sort * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Add resources to docs * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove explicit model_type argument * Fix tests * Update readme * Note about other models --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 3231ed4 Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com> Date: Fri May 31 14:16:23 2024 +0200 Add streaming, various fixes (huggingface#30838) * Implement streaming run in ReAct agents * Allow additional imports in code agents * Python interpreter: support classes and exceptions, fixes commit 899d73f Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Fri May 31 12:44:20 2024 +0200 [trainer] add sanity evaluation option (huggingface#31146) * add sanity evaluation * fix * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> commit 09daece Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Fri May 31 12:36:46 2024 +0200 Quantization: Enhance bnb error message (huggingface#31160) enhance error message commit 390c9f4 Author: Asif Ajrof <asifajrof@gmail.com> Date: Fri May 31 16:34:29 2024 +0600 Update sam.md (huggingface#31130) `mask` variable is not defined. probably a writing mistake. it should be `segmentation_map`. `segmentation_map` should be a `1` channel image rather than `RGB`. [on a different note, the `mask_url` is the same as `raw_image`. could provide a better example. commit a6967c0 Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Fri May 31 12:08:55 2024 +0200 Fix quantized cache output (huggingface#31143) commit aa2e1d4 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Fri May 31 10:35:54 2024 +0200 pytest -rsfE (huggingface#31140) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 6c33f18 Author: Arthur <48595927+ArthurZucker@users.noreply.github.com> Date: Fri May 31 08:49:33 2024 +0200 helper (huggingface#31152) * helper * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * more doc --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit adb74a2 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 17:21:10 2024 +0200 Workflow: Remove `IS_GITHUB_CI` (huggingface#31147) remove `IS_GITHUB_CI` commit 3553184 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 16:47:35 2024 +0200 Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config (huggingface#31136) Replace all occurences of `load_in_8bit` with bnb config commit e6dcdfd Author: zspo <songpo.zhang@foxmail.com> Date: Thu May 30 22:25:43 2024 +0800 fix get_scheduler when name is warmup_stable_decay (huggingface#31128) fix get_scheduler args commit 9d8b6ea Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 11:45:03 2024 +0200 FIX / Quantization: Add extra validation for bnb config (huggingface#31135) add validation for bnb config commit 7fc432f Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 29 19:43:51 2024 +0200 Cleanup docker build (huggingface#31119) * remove * build --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit c350b52 Author: Dhruv Pai <46631243+dhruvbpai@users.noreply.github.com> Date: Wed May 29 07:20:59 2024 -0700 Add on_optimizer_step to callback options (huggingface#31095) * Modified test * Added on_optimizer_step to callbacks * Move callback after step is called * Added on optimizer step callback commit 545d7ca Author: Joao Gante <joaofranciscocardosogante@gmail.com> Date: Wed May 29 15:17:14 2024 +0100 Add VLM generation default contributor (huggingface#31115) * add Raushan * add Raushan commit 296c546 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Wed May 29 15:56:28 2024 +0200 FIX / Docs: Fix GPTQ expected number of bits (huggingface#31111) Update overview.md commit b643801 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 29 15:42:39 2024 +0200 Fix nightly circleci (huggingface#31114) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 89261a1 Author: Zach Mueller <muellerzr@gmail.com> Date: Wed May 29 09:35:37 2024 -0400 Rm maintainer + migrate (huggingface#31089) commit 0e3643c Author: Matt <Rocketknight1@users.noreply.github.com> Date: Wed May 29 13:33:26 2024 +0100 Fix faulty rstrip in module loading (huggingface#31108) commit a41deea Author: Matt <Rocketknight1@users.noreply.github.com> Date: Wed May 29 13:20:36 2024 +0100 Fix env.py in cases where torch is not present (huggingface#31113) * Fix env.py in cases where torch is not present * Simplify the fix (and avoid some issues) commit 61f854a Author: Huazhong Ji <hzji210@gmail.com> Date: Wed May 29 18:57:54 2024 +0800 Improve `transformers-cli env` reporting (huggingface#31003) * Improve `transformers-cli env` reporting * move the line `"Using GPU in script?": "<fill in>"` to in if conditional statement * same option for npu commit 40ed3a8 Author: Lucain <lucainp@gmail.com> Date: Wed May 29 12:55:43 2024 +0200 Use `HF_HUB_OFFLINE` + fix has_file in offline mode (huggingface#31016) * Fix has_file in offline mode * harmonize env variable for offline mode * Switch to HF_HUB_OFFLINE * fix test * revert test_offline to test TRANSFORMERS_OFFLINE * Add new offline test * merge conflicts * docs commit 300d03c Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Wed May 29 11:43:54 2024 +0200 FEAT: Add mistral v3 conversion script (huggingface#30981) * add mistral v3 conversion script * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> commit 524d7bf Author: Raushan Turganbay <raushan@huggingface.co> Date: Wed May 29 14:25:44 2024 +0500 Quantized KV cache: update quanto (huggingface#31052) * quanto latest version was refactored * add error msg * incorrect compare sign * Update src/transformers/cache_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 9f98c9c Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Date: Tue May 28 18:07:07 2024 +0100 Deprecate low use models (huggingface#30781) * Deprecate models - graphormer - time_series_transformer - xlm_prophetnet - qdqbert - nat - ernie_m - tvlt - nezha - mega - jukebox - vit_hybrid - x_clip - deta - speech_to_text_2 - efficientformer - realm - gptsan_japanese * Fix up * Fix speech2text2 imports * Make sure message isn't indented * Fix docstrings * Correctly map for deprecated models from model_type * Uncomment out * Add back time series transformer and x-clip * Import fix and fix-up * Fix up with updated ruff commit 1cb30f0 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 18:29:22 2024 +0200 Docs / Quantization: Redirect deleted page (huggingface#31063) Update _redirects.yml commit 1ed4924 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 18:29:11 2024 +0200 TST: Fix instruct-blip tests (huggingface#31088) * fix flan t5 tests * better format commit 2a08fd3 Author: Jonny Li <jonny_li@live.ca> Date: Tue May 28 12:25:15 2024 -0400 Fix DeepSpeed compatibility with weight_norm (huggingface#30881) (huggingface#31018) commit b5f4ec6 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue May 28 17:47:35 2024 +0200 Fix PretrainedConfig docstring with deprecated resume_download (huggingface#31014) commit 454cbe0 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 17:44:52 2024 +0200 skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` (huggingface#31086) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit e70c2ea Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 17:06:00 2024 +0200 FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` (huggingface#31092) Update modeling_opt.py commit 6560e25 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 17:05:44 2024 +0200 FIX: Add `accelerate` as a hard requirement (huggingface#31090) add accelerate commit 9bf05ec Author: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Date: Tue May 28 16:02:51 2024 +0200 Render chat template tojson filter as unicode (huggingface#31041) * Render chat template tojson filter as unicode * ruff-- commit e405f2b Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 15:04:43 2024 +0200 Docs / PEFT: Add PEFT API documentation (huggingface#31078) * add peft references * add peft references * Update docs/source/en/peft.md * Update docs/source/en/peft.md commit 5237955 Author: Raushan Turganbay <raushan@huggingface.co> Date: Tue May 28 17:07:42 2024 +0500 Watermark: fix tests (huggingface#30961) * fix tests * style * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit f2a7f7c Author: Lysandre Debut <hi@lysand.re> Date: Tue May 28 13:34:23 2024 +0200 Fix failing tokenizer tests (huggingface#31083) * Fix failing tokenizer tests * Use small tokenizer * Fix remaining reference commit 0e1935b Author: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Date: Tue May 28 13:22:06 2024 +0200 [SuperPoint, PaliGemma] Update docs (huggingface#31025) * Update docs * Add PaliGemma resources * Address comment * Update docs commit 2fe8356 Author: Sina Taslimi <33656391+taslimisina@users.noreply.github.com> Date: Tue May 28 13:09:32 2024 +0200 Fix typo in trainer.py (huggingface#31048) commit b74960c Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Tue May 28 11:06:06 2024 +0000 Fix OWLv2 post_process_object_detection for multiple images (huggingface#31082) * Add test for multiple images * [run slow] owlv2 * Fix box rescaling * [run slow] owlv2 commit 3e3599d Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Tue May 28 10:41:40 2024 +0000 Remove float64 cast for OwlVit and OwlV2 to support MPS device (huggingface#31071) Remove float64 commit 48d33da Author: oOraph <13552058+oOraph@users.noreply.github.com> Date: Tue May 28 11:56:05 2024 +0200 fix from_pretrained in offline mode when model is preloaded in cache (huggingface#31010) * Unit test to verify fix Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * fix from_pretrained in offline mode when model is preloaded in cache Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * minor: fmt Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> --------- Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com> commit 7c472e6 Author: Hengwen Tong <kevint324@gmail.com> Date: Tue May 28 17:52:47 2024 +0800 Remove redundant backend checks in training_args.py (huggingface#30999) * Remove backend checks in training_args.py * Expilicit initialize the device --------- Co-authored-by: tonghengwen <tonghengwen@cambricon.com> commit 46b606e Author: AP <108011872+apalkk@users.noreply.github.com> Date: Tue May 28 09:50:45 2024 +0000 Update quicktour.md to fix broken link to Glossary (huggingface#31072) Update quicktour.md to fix broken link Missing '/' in attention mask link in the transformers quicktour commit 580f464 Author: Clint Adams <clint@gcfm.net> Date: Tue May 28 05:48:23 2024 -0400 fix "piano" typo (huggingface#31027) commit 5e211d5 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 11:36:26 2024 +0200 Remove `ninja` from docker image build (huggingface#31080) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 8b91c20 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 10:53:28 2024 +0200 use `@main` (huggingface#31065) use main Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 04440a0 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Mon May 27 18:36:39 2024 +0200 skip `test_model_parallelism` for 2 model test classes (huggingface#31067) skip Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit f803e2b Author: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Date: Mon May 27 16:09:05 2024 +0200 Fix pad_to_max_length Whisper (huggingface#30787) * fix pad_to_max_length Whisper * add tests * make style commit b6eb29b Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Mon May 27 15:53:45 2024 +0200 Fix quanto tests (huggingface#31062) fix quanto tests commit e581213 Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Date: Mon May 27 14:16:47 2024 +0100 Update feature request label in template (huggingface#30940) commit 05eff71 Author: Eitan Turok <150733043+eitanturok@users.noreply.github.com> Date: Mon May 27 08:57:43 2024 -0400 Follow up: Fix link in dbrx.md (huggingface#30514) * Fix link in dbrx.md * remove "though this may not be up to date" --------- Co-authored-by: Lysandre Debut <hi@lysand.re> commit d5aa839 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Mon May 27 13:47:47 2024 +0200 unpin uv (huggingface#31055) [push-ci-image] Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 165bd7a Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com> Date: Mon May 27 10:34:14 2024 +0200 Redirect transformers_agents doc to agents (huggingface#31054) commit 6df5028 Author: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Date: Fri May 24 19:02:55 2024 +0200 Paligemma- fix devices and dtype assignments (huggingface#31008) * fix devices and dtype assignments * [run-slow]paligemma commit 61f1d47 Author: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> Date: Fri May 24 17:38:58 2024 +0200 Add split special tokens (huggingface#30772) * seems like `split_special_tokens` is used here * split special token * add new line at end of file * moving split special token test to common tests * added assertions * test * fixup * add co-author * passing rest of args to gptsan_japanese, fixing tests * removing direct comparison of fast and slow models * adding test support for UDOP and LayoutXLM * ruff fix * readd check if slow tokenizer * modify test to handle bos tokens * removing commented function * trigger build * applying review feedback - updated docstrings, var names, and simplified tests * ruff fixes * Update tests/test_tokenization_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * applying feedback, comments * shutil temp directory fix --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain> Co-authored-by: itazap <itazap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local> commit e2b9913 Author: BHUVAN M <121122109+bhuvanmdev@users.noreply.github.com> Date: Fri May 24 20:50:09 2024 +0530 added interpolation for vitmae model in pytorch as well as tf. (huggingface#30732) * added interpolation for vitmae model in pytorch as well as tf. * Update modeling_vit_mae.py irreugalr import fixed * small changes and proper formatting * changes suggested in review. * modified decoder interpolate_func * arguments and docstring fix * Apply suggestions from code review doc fixes Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 96eb062 Author: Arthur <48595927+ArthurZucker@users.noreply.github.com> Date: Fri May 31 18:37:43 2024 +0200 Diff converter v2 (huggingface#30868) * current working example! * commit regex and result file * update * nit * push the conversion file * oups * roadmap and nits * attempt diffs for 3 files * persimmon * nit * add diff file that is the same as the modeling_llama.py * fix rope nits * updates * updates with converted versions * give some breathing space to the code * delete * update * update * push the actual result * update regex patterns * update regex patterns * fix some issues * fix some issues * fix some issues * updates * updates * updates * updates * updates * revert changes done to llama * updates * update gemma * updates * oups * current state * current state * update * ouiiii * nit * clear diffs * nit * fixup * update * doc 🚀 * 🔥 * for now use gemma * deal with comments * style * handle funtions * deal with assigns * todos * process inheritage * keep decorators? * 🤗 * deal with duplicates * fixup * correctly remove duplicate code * run ruff post script * ruff deals pretty well with imports, let's leave it to him * ah maybe not lol * for now remove all imports from child. * nit * conversion of llama * okay * convert starcoder2 * synch with main * update llama diff * updates * https://docs.astral.sh/ruff/rules/redefined-while-unused/ fixes the imports, bit needs later version of ruff * updates * okay actual state * non zero exit * update! * revert unrelated * remove other diff files * updates * cleanup * update * less diff! * stash * current updates * updates * No need for call * finished fining deps * update * current changes * current state * current state * new status * nit * finally * fixes * nits * order is now expected * use logger info instead of prints * fixup * up * nit * update * nits * update * correct merge * update * update * update * add warning * update caution message * update * better merging strategy * copy class statements :wink * fixups * nits * update * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * nits * smaller header * do cleanup some stuff * even simpler header? * fixup * updates * ruff * update examples * nit * TODO * state * OUUUUUUF * current state * nits * final state * add a readme * fixup * remove diff llama * fix * nit * dummy noy funny * ruff format tests src utils --check * everless diffs * less diffs and fix test * fixes * naming nit? * update converter and add supper example * nits * updated for function signatures * update * update * add converted dummies * autoformat * single target assign fix * fixup * fix some imports * fixes * don't push them * `# noqa: F841` --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 372baec Author: Vallepu Vamsi Krishna <vallepu670@gmail.com> Date: Fri May 31 21:53:11 2024 +0530 Added description of quantization_config (huggingface#31133) * Description of quantization_config Added missing description about quantization_config in replace_with_bnb_linear for better readability. * Removed trailing spaces commit cdc8131 Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Fri May 31 16:56:17 2024 +0100 Instance segmentation examples (huggingface#31084) * Initial setup * Metrics * Overfit on two batches * Train 40 epochs * Memory leak debugging * Trainer fine-tuning * Draft * Fixup * Trained end-to-end * Add requirements * Rewrite evaluator * nits * Add readme * Add instance-segmentation to the table * Support void masks * Remove sh * Update docs * Add pytorch test * Add accelerate test * Update examples/pytorch/instance-segmentation/README.md * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation_no_trainer.py * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py * Fix consistency oneformer * Fix imports * Fix imports sort * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/run_instance_segmentation.py Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> * Add resources to docs * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update examples/pytorch/instance-segmentation/README.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Remove explicit model_type argument * Fix tests * Update readme * Note about other models --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 9837a25 Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com> Date: Fri May 31 14:16:23 2024 +0200 Add streaming, various fixes (huggingface#30838) * Implement streaming run in ReAct agents * Allow additional imports in code agents * Python interpreter: support classes and exceptions, fixes commit f8e6ba4 Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Fri May 31 12:44:20 2024 +0200 [trainer] add sanity evaluation option (huggingface#31146) * add sanity evaluation * fix * Apply suggestions from code review Co-authored-by: Zach Mueller <muellerzr@gmail.com> * fix --------- Co-authored-by: Zach Mueller <muellerzr@gmail.com> commit fc5d3e1 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Fri May 31 12:36:46 2024 +0200 Quantization: Enhance bnb error message (huggingface#31160) enhance error message commit bd9d1dd Author: Asif Ajrof <asifajrof@gmail.com> Date: Fri May 31 16:34:29 2024 +0600 Update sam.md (huggingface#31130) `mask` variable is not defined. probably a writing mistake. it should be `segmentation_map`. `segmentation_map` should be a `1` channel image rather than `RGB`. [on a different note, the `mask_url` is the same as `raw_image`. could provide a better example. commit 48cada8 Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Fri May 31 12:08:55 2024 +0200 Fix quantized cache output (huggingface#31143) commit d19566e Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Fri May 31 10:35:54 2024 +0200 pytest -rsfE (huggingface#31140) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit f3f640d Author: Arthur <48595927+ArthurZucker@users.noreply.github.com> Date: Fri May 31 08:49:33 2024 +0200 helper (huggingface#31152) * helper * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * updates * more doc --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit 6bd511a Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 17:21:10 2024 +0200 Workflow: Remove `IS_GITHUB_CI` (huggingface#31147) remove `IS_GITHUB_CI` commit f5590de Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 16:47:35 2024 +0200 Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config (huggingface#31136) Replace all occurences of `load_in_8bit` with bnb config commit cda9c82 Author: zspo <songpo.zhang@foxmail.com> Date: Thu May 30 22:25:43 2024 +0800 fix get_scheduler when name is warmup_stable_decay (huggingface#31128) fix get_scheduler args commit 5e5c4d6 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Thu May 30 11:45:03 2024 +0200 FIX / Quantization: Add extra validation for bnb config (huggingface#31135) add validation for bnb config commit 2b9e252 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 29 19:43:51 2024 +0200 Cleanup docker build (huggingface#31119) * remove * build --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 5c88253 Author: Dhruv Pai <46631243+dhruvbpai@users.noreply.github.com> Date: Wed May 29 07:20:59 2024 -0700 Add on_optimizer_step to callback options (huggingface#31095) * Modified test * Added on_optimizer_step to callbacks * Move callback after step is called * Added on optimizer step callback commit 4af705c Author: Joao Gante <joaofranciscocardosogante@gmail.com> Date: Wed May 29 15:17:14 2024 +0100 Add VLM generation default contributor (huggingface#31115) * add Raushan * add Raushan commit cb879c5 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Wed May 29 15:56:28 2024 +0200 FIX / Docs: Fix GPTQ expected number of bits (huggingface#31111) Update overview.md commit 1f84141 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Wed May 29 15:42:39 2024 +0200 Fix nightly circleci (huggingface#31114) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit d16053c Author: Zach Mueller <muellerzr@gmail.com> Date: Wed May 29 09:35:37 2024 -0400 Rm maintainer + migrate (huggingface#31089) commit 0bef4a2 Author: Matt <Rocketknight1@users.noreply.github.com> Date: Wed May 29 13:33:26 2024 +0100 Fix faulty rstrip in module loading (huggingface#31108) commit 97a58a5 Author: Matt <Rocketknight1@users.noreply.github.com> Date: Wed May 29 13:20:36 2024 +0100 Fix env.py in cases where torch is not present (huggingface#31113) * Fix env.py in cases where torch is not present * Simplify the fix (and avoid some issues) commit c886137 Author: Huazhong Ji <hzji210@gmail.com> Date: Wed May 29 18:57:54 2024 +0800 Improve `transformers-cli env` reporting (huggingface#31003) * Improve `transformers-cli env` reporting * move the line `"Using GPU in script?": "<fill in>"` to in if conditional statement * same option for npu commit c3044ec Author: Lucain <lucainp@gmail.com> Date: Wed May 29 12:55:43 2024 +0200 Use `HF_HUB_OFFLINE` + fix has_file in offline mode (huggingface#31016) * Fix has_file in offline mode * harmonize env variable for offline mode * Switch to HF_HUB_OFFLINE * fix test * revert test_offline to test TRANSFORMERS_OFFLINE * Add new offline test * merge conflicts * docs commit bfe6f51 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Wed May 29 11:43:54 2024 +0200 FEAT: Add mistral v3 conversion script (huggingface#30981) * add mistral v3 conversion script * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> commit d521ba5 Author: Raushan Turganbay <raushan@huggingface.co> Date: Wed May 29 14:25:44 2024 +0500 Quantized KV cache: update quanto (huggingface#31052) * quanto latest version was refactored * add error msg * incorrect compare sign * Update src/transformers/cache_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit a564d10 Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Date: Tue May 28 18:07:07 2024 +0100 Deprecate low use models (huggingface#30781) * Deprecate models - graphormer - time_series_transformer - xlm_prophetnet - qdqbert - nat - ernie_m - tvlt - nezha - mega - jukebox - vit_hybrid - x_clip - deta - speech_to_text_2 - efficientformer - realm - gptsan_japanese * Fix up * Fix speech2text2 imports * Make sure message isn't indented * Fix docstrings * Correctly map for deprecated models from model_type * Uncomment out * Add back time series transformer and x-clip * Import fix and fix-up * Fix up with updated ruff commit 7f08817 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 18:29:22 2024 +0200 Docs / Quantization: Redirect deleted page (huggingface#31063) Update _redirects.yml commit 3264be4 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 18:29:11 2024 +0200 TST: Fix instruct-blip tests (huggingface#31088) * fix flan t5 tests * better format commit 476890e Author: Jonny Li <jonny_li@live.ca> Date: Tue May 28 12:25:15 2024 -0400 Fix DeepSpeed compatibility with weight_norm (huggingface#30881) (huggingface#31018) commit aada568 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue May 28 17:47:35 2024 +0200 Fix PretrainedConfig docstring with deprecated resume_download (huggingface#31014) commit 3af7bf3 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 17:44:52 2024 +0200 skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` (huggingface#31086) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit ab19f90 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 17:06:00 2024 +0200 FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` (huggingface#31092) Update modeling_opt.py commit 94d416f Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 17:05:44 2024 +0200 FIX: Add `accelerate` as a hard requirement (huggingface#31090) add accelerate commit 22dab24 Author: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Date: Tue May 28 16:02:51 2024 +0200 Render chat template tojson filter as unicode (huggingface#31041) * Render chat template tojson filter as unicode * ruff-- commit 4f98b14 Author: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Date: Tue May 28 15:04:43 2024 +0200 Docs / PEFT: Add PEFT API documentation (huggingface#31078) * add peft references * add peft references * Update docs/source/en/peft.md * Update docs/source/en/peft.md commit 779bc36 Author: Raushan Turganbay <raushan@huggingface.co> Date: Tue May 28 17:07:42 2024 +0500 Watermark: fix tests (huggingface#30961) * fix tests * style * Update tests/generation/test_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> commit a3c7b59 Author: Lysandre Debut <hi@lysand.re> Date: Tue May 28 13:34:23 2024 +0200 Fix failing tokenizer tests (huggingface#31083) * Fix failing tokenizer tests * Use small tokenizer * Fix remaining reference commit 90da0b1 Author: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Date: Tue May 28 13:22:06 2024 +0200 [SuperPoint, PaliGemma] Update docs (huggingface#31025) * Update docs * Add PaliGemma resources * Address comment * Update docs commit 66add16 Author: Sina Taslimi <33656391+taslimisina@users.noreply.github.com> Date: Tue May 28 13:09:32 2024 +0200 Fix typo in trainer.py (huggingface#31048) commit 98e2d48 Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Tue May 28 11:06:06 2024 +0000 Fix OWLv2 post_process_object_detection for multiple images (huggingface#31082) * Add test for multiple images * [run slow] owlv2 * Fix box rescaling * [run slow] owlv2 commit c31473e Author: Pavel Iakubovskii <qubvel@gmail.com> Date: Tue May 28 10:41:40 2024 +0000 Remove float64 cast for OwlVit and OwlV2 to support MPS device (huggingface#31071) Remove float64 commit 936ab7b Author: oOraph <13552058+oOraph@users.noreply.github.com> Date: Tue May 28 11:56:05 2024 +0200 fix from_pretrained in offline mode when model is preloaded in cache (huggingface#31010) * Unit test to verify fix Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * fix from_pretrained in offline mode when model is preloaded in cache Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> * minor: fmt Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> --------- Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com> Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com> commit 537deb7 Author: Hengwen Tong <kevint324@gmail.com> Date: Tue May 28 17:52:47 2024 +0800 Remove redundant backend checks in training_args.py (huggingface#30999) * Remove backend checks in training_args.py * Expilicit initialize the device --------- Co-authored-by: tonghengwen <tonghengwen@cambricon.com> commit dd4654e Author: AP <108011872+apalkk@users.noreply.github.com> Date: Tue May 28 09:50:45 2024 +0000 Update quicktour.md to fix broken link to Glossary (huggingface#31072) Update quicktour.md to fix broken link Missing '/' in attention mask link in the transformers quicktour commit e18da4e Author: Clint Adams <clint@gcfm.net> Date: Tue May 28 05:48:23 2024 -0400 fix "piano" typo (huggingface#31027) commit 8e3b1fe Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 11:36:26 2024 +0200 Remove `ninja` from docker image build (huggingface#31080) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 8f0f727 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Tue May 28 10:53:28 2024 +0200 use `@main` (huggingface#31065) use main Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 9d35edb Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Mon May 27 18:36:39 2024 +0200 skip `test_model_parallelism` for 2 model test classes (huggingface#31067) skip Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit d355741 Author: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com> Date: Mon May 27 16:09:05 2024 +0200 Fix pad_to_max_length Whisper (huggingface#30787) * fix pad_to_max_length Whisper * add tests * make style commit b84cd67 Author: Marc Sun <57196510+SunMarc@users.noreply.github.com> Date: Mon May 27 15:53:45 2024 +0200 Fix quanto tests (huggingface#31062) fix quanto tests commit cd79777 Author: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Date: Mon May 27 14:16:47 2024 +0100 Update feature request label in template (huggingface#30940) commit 0a064dc Author: Eitan Turok <150733043+eitanturok@users.noreply.github.com> Date: Mon May 27 08:57:43 2024 -0400 Follow up: Fix link in dbrx.md (huggingface#30514) * Fix link in dbrx.md * remove "though this may not be up to date" --------- Co-authored-by: Lysandre Debut <hi@lysand.re> commit d7942d9 Author: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Date: Mon May 27 13:47:47 2024 +0200 unpin uv (huggingface#31055) [push-ci-image] Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> commit 84c4b72 Author: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com> Date: Mon May 27 10:34:14 2024 +0200 Redirect transformers_agents doc to agents (huggingface#31054) commit bdb9106 Author: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Date: Fri May 24 19:02:55 2024 +0200 Paligemma- fix devices and dtype assignments (huggingface#31008) * fix devices and dtype assignments * [run-slow]paligemma commit deba765 Author: Ita Zaporozhets <31893021+itazap@users.noreply.github.com> Date: Fri May 24 17:38:58 2024 +0200 Add split special tokens (huggingface#30772) * seems like `split_special_tokens` is used here * split special token * add new line at end of file * moving split special token test to common tests * added assertions * test * fixup * add co-author * passing rest of args to gptsan_japanese, fixing tests * removing direct comparison of fast and slow models * adding test support for UDOP and LayoutXLM * ruff fix * readd check if slow tokenizer * modify test to handle bos tokens * removing commented function * trigger build * applying review feedback - updated docstrings, var names, and simplified tests * ruff fixes * Update tests/test_tokenization_common.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * applying feedback, comments * shutil temp directory fix --------- Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MBP.localdomain> Co-authored-by: itazap <itazap@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Ita Zaporozhets <itazaporozhets@Itas-MacBook-Pro.local> commit e5103a7 Author: BHUVAN M <121122109+bhuvanmdev@users.noreply.github.com> Date: Fri May 24 20:50:09 2024 +0530 added interpolation for vitmae model in pytorch as well as tf. (huggingface#30732) * added interpolation for vitmae model in pytorch as well as tf. * Update modeling_vit_mae.py irreugalr import fixed * small changes and proper formatting * changes suggested in review. * modified decoder interpolate_func * arguments and docstring fix * Apply suggestions from code review doc fixes Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add mistral v3 conversion script * Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

add mistral v3 conversion script

cf9fcba

younesbelkada requested a review from ArthurZucker May 23, 2024 09:10

ArthurZucker reviewed May 24, 2024

View reviewed changes

younesbelkada mentioned this pull request May 29, 2024

Convert_mistral_weights_to_hf fails loading consolidated.safetensors #31093

Closed

4 tasks

ArthurZucker approved these changes May 29, 2024

View reviewed changes

src/transformers/models/mistral/convert_mistral_weights_to_hf.py Outdated Show resolved Hide resolved

younesbelkada and others added 2 commits May 29, 2024 11:08

Update src/transformers/models/mistral/convert_mistral_weights_to_hf.py

dce85d5

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

fixup

b02ecb0

younesbelkada merged commit bfe6f51 into main May 29, 2024
8 checks passed

younesbelkada deleted the add-mistral-conversion-script branch May 29, 2024 09:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: Add mistral v3 conversion script #30981

FEAT: Add mistral v3 conversion script #30981

younesbelkada commented May 23, 2024 •

edited

Loading

HuggingFaceDocBuilderDev commented May 23, 2024

ArthurZucker left a comment

ArthurZucker left a comment

FEAT: Add mistral v3 conversion script #30981

FEAT: Add mistral v3 conversion script #30981

Conversation

younesbelkada commented May 23, 2024 • edited Loading

What does this PR do?

HuggingFaceDocBuilderDev commented May 23, 2024

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker left a comment

Choose a reason for hiding this comment

younesbelkada commented May 23, 2024 •

edited

Loading