06 Dec 11:42

de88c70

Version 0.14.0: EVA, Context-aware Prompt Tuning, Bone, and more


Highlights

New Methods

Context-aware Prompt Tuning

@tsachiblau added a new soft prompt method called Context-aware Prompt Tuning (CPT). It combines In-Context Learning and Prompt Tuning: for each training sample, it builds a learnable context from training examples in addition to the single training sample. This allows for sample- and parameter-efficient few-shot classification and addresses recency bias.

Explained Variance Adaptation

@sirluk contributed a new LoRA initialization method called Explained Variance Adaptation (EVA). Instead of randomly initializing LoRA weights, this method uses SVD on minibatches of finetuning data to initialize the LoRA weights and is also able to re-allocate the ranks of the adapter based on the explained variance ratio (derived from SVD). Thus, this initialization method can yield better initial values and better rank distribution.
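
The rank re-allocation idea can be sketched in plain Python. This is an illustration only, not PEFT's EVA implementation: the function name `redistribute_ranks`, the layer names, and the singular values are hypothetical, and the singular values are assumed to be already available per layer. Components are ranked globally by their explained variance ratio, and the total rank budget goes to the highest-ranked ones.

```python
def redistribute_ranks(singular_values, total_rank):
    """Assign a global rank budget to layers by explained variance ratio.

    singular_values: dict mapping layer name -> list of singular values
    (assumed precomputed, e.g. from an SVD on minibatches of data).
    """
    candidates = []
    for layer, values in singular_values.items():
        total_var = sum(s * s for s in values)
        for s in values:
            # Explained variance ratio of one component within its layer.
            candidates.append((s * s / total_var, layer))

    # Keep the globally best components until the budget is used up.
    candidates.sort(key=lambda item: item[0], reverse=True)
    ranks = {layer: 0 for layer in singular_values}
    for _, layer in candidates[:total_rank]:
        ranks[layer] += 1
    return ranks


# Hypothetical singular values for two layers.
svals = {
    "q_proj": [4.0, 2.0, 1.0, 1.0],
    "v_proj": [3.0, 3.0, 3.0, 3.0],
}
ranks = redistribute_ranks(svals, total_rank=4)
```

In this toy input, one component dominates `q_proj`, so that layer keeps a single rank while the remaining budget goes to `v_proj`, whose variance is spread evenly.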

Bone

@JL-er added an implementation for Block Affine (Bone) Adaptation which utilizes presumed sparsity in the base layer weights to divide them into multiple sub-spaces that share a single low-rank matrix for updates. Compared to LoRA, Bone has the potential to significantly reduce memory usage and achieve faster computation.

Enhancements

PEFT now supports LoRAs for int8 torchao quantized models (check this and this notebook). In addition, VeRA can now be used with 4 and 8 bit bitsandbytes quantization thanks to @ZiadHelal.

Hot-swapping of LoRA adapters is now possible using the hotswap_adapter function. Now you are able to load one LoRA and replace its weights in-place with the LoRA weights of another adapter which, in general, should be faster than deleting one adapter and loading the other adapter in its place. The feature is built so that no re-compilation of the model is necessary if torch.compile was called on the model (right now, this requires ranks and alphas to be the same for the adapters).
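
A conceptual sketch of why in-place replacement helps, using plain Python lists as stand-ins for weight tensors. The helper `hotswap_in_place` is hypothetical and is not the signature of `hotswap_adapter`. The point it illustrates is that anything holding a reference to the old weights (for example a compiled model) keeps seeing the same objects, now with new values.

```python
def hotswap_in_place(loaded, incoming):
    """Overwrite the values of `loaded` with those of `incoming`, in place.

    Both arguments map parameter names to lists of floats, which stand in
    for tensors here. Shapes must match, mirroring the note above that
    ranks (and alphas) currently need to be the same.
    """
    if loaded.keys() != incoming.keys():
        raise ValueError("adapters must have the same parameter names")
    for name, values in incoming.items():
        if len(loaded[name]) != len(values):
            raise ValueError(f"shape mismatch for {name}")
        # Slice assignment mutates the existing list instead of rebinding it.
        loaded[name][:] = values


adapter_a = {"lora_A": [0.1, 0.2], "lora_B": [0.0, 0.0]}
adapter_b = {"lora_A": [0.5, 0.6], "lora_B": [0.3, 0.4]}

# Something that captured a reference to the weights, like a compiled graph.
captured = adapter_a["lora_A"]
hotswap_in_place(adapter_a, adapter_b)
```

After the swap, `captured` is still the same object as `adapter_a["lora_A"]` but holds the values of the second adapter.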

LoRA and IA³ now support Conv3d layers thanks to @jsilter, and @JINO-ROHIT added a notebook showcasing PEFT model evaluation using lm-eval-harness toolkit.

With the target_modules argument, you can specify which layers to target with the adapter (e.g. LoRA). Now you can also specify which modules not to target by using the exclude_modules parameter (thanks @JINO-ROHIT).
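
The interplay of the two arguments can be sketched as a simple name filter. This is a minimal illustration and not PEFT's matching code: the helper `select_modules`, the suffix-matching rule, and the module names are assumptions made for the example.

```python
def select_modules(module_names, target_modules, exclude_modules=()):
    """Return names matching a target suffix but no excluded suffix."""

    def matches(name, patterns):
        # A pattern matches the full name or a dot-separated suffix of it.
        return any(name == p or name.endswith("." + p) for p in patterns)

    return [
        name
        for name in module_names
        if matches(name, target_modules) and not matches(name, exclude_modules)
    ]


names = [
    "encoder.layer.0.q_proj",
    "encoder.layer.0.v_proj",
    "encoder.layer.1.q_proj",
    "classifier.out_proj",
]
selected = select_modules(
    names,
    target_modules=["q_proj", "v_proj", "out_proj"],
    exclude_modules=["layer.1.q_proj", "out_proj"],
)
```

Here exclusion wins over inclusion: a module that matches both lists is left untouched.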

Changes

Several fixes have been made to the OFT implementation, among other things to fix merging. As a result, adapter weights trained with PEFT versions prior to this release are incompatible (see #1996 for details).
Adapter configs are now forward-compatible by accepting unknown keys.
Prefix tuning was fitted to the DynamicCache caching infrastructure of transformers (see #2096). If you are using this PEFT version and a recent version of transformers with an old prefix tuning checkpoint, you should double check that it still works correctly and retrain it if it doesn't.
Added lora_bias parameter to LoRA layers to enable bias on LoRA B matrix. This is useful when extracting LoRA weights from fully fine-tuned parameters with bias vectors so that these can be taken into account.
#2180 provided a couple of bug fixes to LoKr (thanks @yaswanth19). If you're using LoKr, your old checkpoints should still work but it's recommended to retrain your adapter.
from_pretrained now warns the user if PEFT keys are missing.
Attribute access to modules in modules_to_save is now properly and transparently handled.
PEFT supports the changes to bitsandbytes 8bit quantization from the recent v0.45.0 release. To benefit from these improvements, we recommend upgrading bitsandbytes if you're using QLoRA. Expect slight numerical differences in model outputs if you're using QLoRA with 8bit bitsandbytes quantization.
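
To illustrate the forward-compatibility change to adapter configs listed above, here is a minimal sketch of a loader that accepts unknown keys. The class `ToyAdapterConfig`, the helper `load_config`, and the key `future_option` are hypothetical, and emitting a warning for dropped keys is an assumption of this sketch, not a statement about what PEFT does.

```python
import warnings
from dataclasses import dataclass, fields


@dataclass
class ToyAdapterConfig:
    # Hypothetical config with two known fields.
    r: int = 8
    alpha: int = 16


def load_config(raw):
    """Build a config from a dict, ignoring keys this version does not know."""
    known = {f.name for f in fields(ToyAdapterConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        warnings.warn(f"Ignoring unknown config keys: {unknown}")
    return ToyAdapterConfig(**{k: v for k, v in raw.items() if k in known})


# A config saved by a newer version might carry an extra key.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    config = load_config({"r": 4, "alpha": 8, "future_option": True})
```

The older code can still load the newer config. The unknown option is simply not applied.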

What's Changed

Bump version to 0.13.1.dev0 by @BenjaminBossan in #2094
Support Conv3d layer in LoRA and IA3 by @jsilter in #2082
Fix Inconsistent Missing Keys Warning for Adapter Weights in PEFT by @yaswanth19 in #2084
FIX: Change check if past_key_values is empty by @BenjaminBossan in #2106
Update install.md by @Salehbigdeli in #2110
Update OFT to fix merge bugs by @Zeju1997 in #1996
ENH: Improved attribute access for modules_to_save by @BenjaminBossan in #2117
FIX low_cpu_mem_usage consolidates devices by @BenjaminBossan in #2113
TST Mark flaky X-LoRA test as xfail by @BenjaminBossan in #2114
ENH: Warn when from_pretrained misses PEFT keys by @BenjaminBossan in #2118
FEAT: Adding exclude modules param(#2044) by @JINO-ROHIT in #2102
fix merging bug / update boft conv2d scaling variable by @Zeju1997 in #2127
FEAT: Support quantization for VeRA using bitsandbytes (#2070) by @ZiadHelal in #2076
Bump version to 0.13.2.dev0 by @BenjaminBossan in #2137
FEAT: Support torchao by @BenjaminBossan in #2062
FIX: Transpose weight matrix based on fan_in_fan_out condition in PiSSA initialization (#2103) by @suyang160 in #2104
FIX Type annoations in vera/bnb.py by @BenjaminBossan in #2139
ENH Make PEFT configs forward compatible by @BenjaminBossan in #2038
FIX Raise an error when performing mixed adapter inference and passing non-existing adapter names by @BenjaminBossan in #2090
FIX Prompt learning with latest transformers error by @BenjaminBossan in #2140
adding peft lora example notebook for ner by @JINO-ROHIT in #2126
FIX TST: NaN issue with HQQ GPU test by @BenjaminBossan in #2143
FIX: Bug in target module optimization if child module name is suffix of parent module name by @BenjaminBossan in #2144
Bump version to 0.13.2.dev0 by @BenjaminBossan in #2145
FIX Don't assume past_key_valus for encoder models by @BenjaminBossan in #2149
Use SFTConfig instead of SFTTrainer keyword args by @qgallouedec in #2150
FIX: Sft train script FSDP QLoRA embedding mean resizing error by @BenjaminBossan in #2151
Optimize DoRA in eval and no dropout by @ariG23498 in #2122
FIX Missing low_cpu_mem_usage argument by @BenjaminBossan in #2156
MNT: Remove version pin of diffusers by @BenjaminBossan in #2162
DOC: Improve docs for layers_pattern argument by @BenjaminBossan in #2157
Update HRA by @DaShenZi721 in #2160
fix fsdp_auto_wrap_policy by @eljandoubi in #2167
MNT Remove Python 3.8 since it's end of life by @BenjaminBossan in #2135
Improving error message when users pass layers_to_transform and layers_pattern by @JINO-ROHIT in #2169
FEAT Add hotswapping functionality by @BenjaminBossan in #2120
Fix to prefix tuning to fit transformers by @BenjaminBossan in #2096
MNT: Enable Python 3.12 on CI by @BenjaminBossan in #2173
MNT: Update docker nvidia base image to 12.4.1 by @BenjaminBossan in #2176
DOC: Extend modules_to_save doc with pooler example by @BenjaminBossan in #2175
FIX VeRA failure on multiple GPUs by @BenjaminBossan in #2163
FIX: Import location of HF hub errors by @BenjaminBossan in #2178
DOC: fix broken link in the README of loftq by @dennis2030 in #2183
added checks for layers to transforms and layer pattern in lora by @JINO-ROHIT in #2159
ENH: Warn when loading PiSSA/OLoRA together with other adapters by @BenjaminBossan in #2186
TST: Skip AQLM test that is incompatible with torch 2.5 by @BenjaminBossan in #2187
FIX: Prefix...

Contributors

githubnemo, jsilter, and 19 other contributors


11 Oct 11:45

BenjaminBossan

v0.13.2

431c0e2

v0.13.2: Small patch release

This patch release contains a small bug fix for an issue that prevented some LoRA checkpoints from being loaded correctly (mostly concerning stable diffusion checkpoints not trained with PEFT when loaded in diffusers, #2144).

Full Changelog: v0.13.1...v0.13.2


08 Oct 12:29

BenjaminBossan

v0.13.1

b8da272

v0.13.1: Small patch release

This patch release contains a small bug fix for the low_cpu_mem_usage=True option (#2113).

Full Changelog: v0.13.0...v0.13.1


25 Sep 12:11

BenjaminBossan

v0.13.0

f0b066e

v0.13.0: LoRA+, VB-LoRA, and more

Highlights

New methods

LoRA+

@kallewoof added LoRA+ to PEFT (#1915). This is a function that lets you initialize an optimizer with settings that are better suited for training a LoRA adapter.
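
The core idea of LoRA+ is to train the LoRA B matrices with a larger learning rate than the A matrices. The sketch below builds the corresponding optimizer parameter groups from parameter names only. The helper `loraplus_param_groups` and its arguments are hypothetical and do not reproduce the PEFT function; the ratio of 16 is just an example value.

```python
def loraplus_param_groups(param_names, lr, lr_ratio):
    """Split parameter names into groups with separate learning rates.

    The B matrices get `lr * lr_ratio`, the A matrices get `lr`.
    """
    group_a = [n for n in param_names if "lora_A" in n]
    group_b = [n for n in param_names if "lora_B" in n]
    return [
        {"params": group_a, "lr": lr},
        {"params": group_b, "lr": lr * lr_ratio},
    ]


names = [
    "layers.0.q_proj.lora_A.weight",
    "layers.0.q_proj.lora_B.weight",
    "layers.1.q_proj.lora_A.weight",
    "layers.1.q_proj.lora_B.weight",
]
groups = loraplus_param_groups(names, lr=1e-4, lr_ratio=16)
```

A list of dictionaries with `params` and `lr` keys is the usual shape of optimizer parameter groups. In real use, the entries would hold the parameters themselves rather than their names.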

VB-LoRA

@leo-yangli added a new method to PEFT called VB-LoRA (#2039). The idea is to have LoRA layers be composed from a single vector bank (hence "VB") that is shared among all layers. This makes VB-LoRA extremely parameter efficient and the checkpoints especially small (comparable to the VeRA method), while still promising good fine-tuning performance. Check the VB-LoRA docs and example.

Enhancements

New Hugging Face team member @ariG23498 added the helper function rescale_adapter_scale to PEFT (#1951). Use this context manager to temporarily increase or decrease the scaling of the LoRA adapter of a model. It also works for PEFT adapters loaded directly into a transformers or diffusers model.
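
The temporary-rescaling pattern can be sketched with a small context manager. This is a generic illustration, not the code of `rescale_adapter_scale`: the class `ToyLoraLayer` and the function `rescale` are hypothetical stand-ins.

```python
from contextlib import contextmanager


class ToyLoraLayer:
    # Hypothetical stand-in for an adapter layer with a scaling factor.
    def __init__(self, scaling):
        self.scaling = scaling


@contextmanager
def rescale(layers, multiplier):
    """Temporarily multiply each layer's scaling, restoring it on exit."""
    originals = [layer.scaling for layer in layers]
    try:
        for layer in layers:
            layer.scaling *= multiplier
        yield
    finally:
        # Restore even if the body raised an exception.
        for layer, original in zip(layers, originals):
            layer.scaling = original


layers = [ToyLoraLayer(2.0), ToyLoraLayer(0.5)]
with rescale(layers, multiplier=0.5):
    inside = [layer.scaling for layer in layers]
after = [layer.scaling for layer in layers]
```

Restoring the original values in a `finally` block ensures that the adapter is left unchanged once the block exits, including when inference inside the block fails.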

@ariG23498 also added DoRA support for embedding layers (#2006). So if you're using the use_dora=True option in the LoraConfig, you can now also target embedding layers.

For some time now, PEFT has supported inference with batches that use different adapters for different samples, e.g. samples 1-5 use "adapter1" and samples 6-10 use "adapter2". However, this only worked for LoRA layers so far. @saeid93 extended this to also work with layers targeted by modules_to_save (#1990).

When loading a PEFT adapter, you now have the option to pass low_cpu_mem_usage=True (#1961). This will initialize the adapter with empty weights ("meta" device) before loading the weights instead of initializing on CPU or GPU. This can speed up loading PEFT adapters. So use this option especially if you have a lot of adapters to load at the same time or if these adapters are very big. Please let us know if you encounter issues with this option, as we may make this the default in the future.

Changes

Safe loading of PyTorch weights

Unless indicated otherwise, PEFT adapters are saved and loaded using the secure safetensors format. However, we also support the PyTorch format for checkpoints, which relies on the inherently insecure pickle protocol from Python. In the future, PyTorch will be more strict when loading these files to improve security by making the option weights_only=True the default. This is generally recommended and should not cause any trouble with PEFT checkpoints, which is why with this release, PEFT will enable this by default. Please open an issue if this causes trouble.

What's Changed

Bump version to 0.12.1.dev0 by @BenjaminBossan in #1950
CI Fix Windows permission error on merge test by @BenjaminBossan in #1952
Check if past_key_values is provided when using prefix_tuning in peft_model by @Nidhogg-lyz in #1942
Add lora+ implementation by @kallewoof in #1915
FIX: New bloom changes breaking prompt learning by @BenjaminBossan in #1969
ENH Update VeRA preconfigured models by @BenjaminBossan in #1941
fix: lora+: include lr in optimizer kwargs by @kallewoof in #1973
FIX active_adapters for transformers models by @BenjaminBossan in #1975
FIX Loading adapter honors offline mode by @BenjaminBossan in #1976
chore: Update CI configuration for workflows by @XciD in #1985
Cast to fp32 if using bf16 weights on cpu during merge_and_unload by @snarayan21 in #1978
AdaLora: Trigger warning when user uses 'r' inplace of 'init_r' by @bhargavyagnik in #1981
[Add] scaling LoRA adapter weights with a context manager by @ariG23498 in #1951
DOC Small fixes for HQQ and section title by @BenjaminBossan in #1986
Add docs and examples for X-LoRA by @EricLBuehler in #1970
fix: fix docker build gpus by @XciD in #1987
FIX: Adjust transformers version check for bloom by @BenjaminBossan in #1992
[Hotfix] Fix BOFT mixed precision by @Edenzzzz in #1925
[Suggestions] Updates suggested for helper.rescale_adapter_scale by @ariG23498 in #1989
MAINT: Default to loading weights only for torch.load by @BenjaminBossan in #1993
BOFT bug fix when saving by @Zeju1997 in #1994
FIX Import error in BOFT half precision test by @BenjaminBossan in #1995
Update lora.md (typos) by @nir-sh-automat-it in #2003
TST Add LNTuningConfig and LoKrConfig to tests by @BenjaminBossan in #2005
ENH: Warn when a user provided model name in the config renamed by @BenjaminBossan in #2004
FIX CI Correctly report outcome of bnb import test by @BenjaminBossan in #2007
Update docs for X-LoRA and some bugfixes by @EricLBuehler in #2002
TST: Potentially Skip 8bit bnb regression test if compute capability is too low by @BenjaminBossan in #1998
CI Activate single core multi backend bnb tests by @BenjaminBossan in #2008
Fix usage of deprecated parameters/functions in X-LoRA by @EricLBuehler in #2010
[tests] enable test_vera_dtypes on XPU by @faaany in #2017
CI Remove regression tests from BNB CI by @BenjaminBossan in #2024
[tests] enable regression tests on XPU by @faaany in #2019
ENH: Better error msg for replace_lora_weights_loftq when using a local model. by @BenjaminBossan in #2022
[tests] make cuda-only cases in TestModelAndLayerStatus device-agnostic by @faaany in #2026
[tests] enable test_mixed_adapter_batches_lora_opt_timing on XPU by @faaany in #2021
MAINT: Update ruff version to ~0.6.1 by @BenjaminBossan in #1965
ENH Raise error when applying modules_to_save on tuner layer by @BenjaminBossan in #2028
FIX: Don't target the classification head when using target_modules="all-linear" by @BenjaminBossan in #2033
[tests] enable cuda-only tests in test_common_gpu.py to work on XPU by @faaany in #2031
[Add] DoRA Embedding by @ariG23498 in #2006
[tests] enable test_gpu_examples.py on XPU by @faaany in #2036
Bug: set correct pre-commit-hooks version by @ltoniazzi in #2034
Warn if using tied target module with tie_word_embeddings by @ltoniazzi in #2025
ENH: Faster adapter loading if there are a lot of target modules by @BenjaminBossan in #2045
FIX: Error with OLoRA init when using bnb by @BenjaminBossan in #2011
FIX: Small numerical discrepancy for p-tuning after loading the model by @BenjaminBossan in #2047
Add VB-LoRA by @leo-yangli in #2039
Fixing scalings logging test by @EricLBuehler in #2042
TST: Fewer inference steps for stable diffusion tests by @BenjaminBossan in #2051
TST Speed up vision model tests by @BenjaminBossan in #2058
TST: Make X-LoRA tests faster by @BenjaminBossan in #2059
Update permissions for githubtoken stale.yml by @glegendre01 in #2061
MAINT: Give stale bot permissions for PRs too by @BenjaminBossan in #2064
avoid saving boft_P in adapter model by @sywangyi in #2050
fix arguments for PiSSA preprocess by @keakon in #2053
Apply deprecated evaluation_strategy by @muellerzr in #1664
fixing multiple LoRA in the same batch or vit by @saeid93 in https://gi...

Contributors

kallewoof, keakon, and 20 other contributors


24 Jul 11:55

BenjaminBossan

v0.12.0

e6cd24c

v0.12.0: New methods OLoRA, X-LoRA, FourierFT, HRA, and much more

Highlights

New methods

OLoRA

@tokenizer-decode added support for a new LoRA initialization strategy called OLoRA (#1828). With this initialization option, the LoRA weights are initialized to be orthonormal, which promises to improve training convergence. Similar to PiSSA, this can also be applied to models quantized with bitsandbytes. Check out the accompanying OLoRA examples.

X-LoRA

@EricLBuehler added the X-LoRA method to PEFT (#1491). This is a mixture of experts approach that combines the strength of multiple pre-trained LoRA adapters. Documentation has yet to be added but check out the X-LoRA tests for how to use it.

FourierFT

@Phoveran, @zqgao22, @Chaos96, and @DSAILatHKUST added discrete Fourier transform fine-tuning to PEFT (#1838). This method promises to match LoRA in terms of performance while reducing the number of parameters even further. Check out the included FourierFT notebook.

HRA

@DaShenZi721 added support for Householder Reflection Adaptation (#1864). This method bridges the gap between low rank adapters like LoRA on the one hand and orthogonal fine-tuning techniques such as OFT and BOFT on the other. As such, it is interesting for both LLMs and image generation models. Check out the HRA example on how to perform DreamBooth fine-tuning.

Enhancements

IA³ now supports merging of multiple adapters via the add_weighted_adapter method thanks to @alexrs (#1701).
Call peft_model.get_layer_status() and peft_model.get_model_status() to get an overview of the layer/model status of the PEFT model. This can be especially helpful when dealing with multiple adapters or for debugging purposes. More information can be found in the docs (#1743).
DoRA now supports FSDP training, including with bitsandbytes quantization, aka QDoRA (#1806).
VeRA has been extended by @dkopi to support targeting layers with different weight shapes (#1817).
@kallewoof added the possibility for ephemeral GPU offloading. For now, this is only implemented for loading DoRA models, which can be sped up considerably for big models at the cost of a bit of extra VRAM (#1857).
Experimental: It is now possible to tell PEFT to use your custom LoRA layers through dynamic dispatching. Use this, for instance, to add LoRA layers for thus far unsupported layer types without the need to first create a PR on PEFT (but contributions are still welcome!) (#1875).

Examples

@shirinyamani added a script and a notebook to demonstrate DoRA fine-tuning.
@rahulbshrestha contributed a notebook that shows how to fine-tune a DNA language model with LoRA.

Changes

Casting of the adapter dtype

Important: If the base model is loaded in float16 (fp16) or bfloat16 (bf16), PEFT now autocasts adapter weights to float32 (fp32) instead of using the dtype of the base model (#1706). This requires more memory than previously but stabilizes training, so it's the more sensible default. To prevent this, pass autocast_adapter_dtype=False when calling get_peft_model, PeftModel.from_pretrained, or PeftModel.load_adapter.

Adapter device placement

The logic of device placement when loading multiple adapters on the same model has been changed (#1742). Previously, PEFT would move all adapters to the device of the base model. Now, only the newly loaded/created adapter is moved to the base model's device. This allows users to have more fine-grained control over the adapter devices, e.g. allowing them to offload unused adapters to CPU more easily.

PiSSA

Calling save_pretrained with the convert_pissa_to_lora argument is deprecated; the argument was renamed to path_initial_model_for_weight_conversion (#1828). Also, calling this no longer deletes the original adapter (#1933).
Using weight conversion (path_initial_model_for_weight_conversion) while also using use_rslora=True and rank_pattern or alpha_pattern now raises an error (#1930). Previously, this did not raise an error, but inference would return incorrect outputs. We also warn about this setting during initialization.

Call for contributions

We are now making sure to tag appropriate issues with the contributions welcome label. If you are looking for a way to contribute to PEFT, check out these issues.

What's Changed

Bump version to 0.11.1.dev0 by @BenjaminBossan in #1736
save and load base model with revision by @mnoukhov in #1658
Autocast adapter weights if fp16/bf16 by @BenjaminBossan in #1706
FIX BOFT setting env vars breaks C++ compilation by @BenjaminBossan in #1739
Bump version to 0.11.2.dev0 by @BenjaminBossan in #1741
TST: torch compile tests by @BenjaminBossan in #1725
Add add_weighted_adapter to IA3 adapters by @alexrs in #1701
ENH Layer/model status shows devices now by @BenjaminBossan in #1743
Fix warning messages about config.json when the base model_id is local. by @elementary-particle in #1668
DOC TST Document and test reproducibility with models using batch norm by @BenjaminBossan in #1734
FIX Use correct attribute name for HQQ in merge by @BenjaminBossan in #1791
fix docs by @pacman100 in #1793
FIX Allow same layer adapters on different devices by @BenjaminBossan in #1742
TST Install bitsandbytes for compile tests by @BenjaminBossan in #1796
FIX BOFT device error after PR 1742 by @BenjaminBossan in #1799
TST Add regression test for DoRA, VeRA, BOFT, LN Tuning by @BenjaminBossan in #1792
Docs / LoRA: Add more information on merge_and_unload docs by @younesbelkada in #1805
TST: Add simple BNB regression tests by @BenjaminBossan in #1602
CI Make torch compile tests run on GPU by @BenjaminBossan in #1808
MNT Remove deprecated use of load_in_8bit by @BenjaminBossan in #1811
Refactor to make DoRA and QDoRA work with FSDP by @BenjaminBossan in #1806
FIX CI: Remove potentially problematic git command by @BenjaminBossan in #1820
ENH / Workflow: Notify on slack about peft + transformers main test results by @younesbelkada in #1821
FIX CI: Install pytest-reportlog package by @BenjaminBossan in #1822
ENH / Workflow: Use repository variable by @younesbelkada in #1823
Patch for Cambricon MLUs test by @huismiling in #1747
Fix a documentation typo by @sparsh2 in #1833
FIX Failing Llama tests due to new kv cache by @BenjaminBossan in #1832
Workflow / Bnb: Add a mechanism to inform us if the import fails by @younesbelkada in #1830
Workflow: Fix broken messages by @younesbelkada in #1842
feat(ci): add trufflehog secrets detection by @McPatate in #1841
DOC Describe torch_device argument in from_pretrained docstring by @BenjaminBossan in #1843
Support for different layer shapes for VeRA by @dkopi in #1817
CI Activate env to prevent bnb import error by @BenjaminBossan in #1845
Fixed PeftMixedModel docstring example #1824 by @namanvats in #1850
MNT Upgrade ruff version to ~0.4.8 by @BenjaminBossan in #1851
Adding support for an optional initialization strategy OLoRA by @tokenizer-decode in #1828
FIX: Adalora ranknum loaded on wrong device by @BenjaminBossan in #1852
Workflow / FIX: Fix red status on our CI by @younesbelkada in #1854
DOC FIX Comment about init of LoRA Embedding by @BenjaminBossan in https://gi...

Contributors

kallewoof, alexrs, and 30 other contributors


17 May 12:55

BenjaminBossan

v0.11.1

207376d

v0.11.1

Patch release v0.11.1

Fix a bug that could lead to C++ compilation errors after importing PEFT (#1738 #1739).

Full Changelog: v0.11.0...v0.11.1


16 May 09:53

BenjaminBossan

v0.11.0

0649947

v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more

Highlights

New methods

BOFT

Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.

VeRA

If the parameter reduction of LoRA is not enough for your use case, you should take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two LoRA weight matrices. However, the LoRA weights themselves are shared across all layers, considerably reducing the number of trainable parameters.

The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.
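
The size of the reduction can be estimated with simple arithmetic. The sketch below assumes, following the VeRA paper, that each adapted layer trains one scaling vector of length r and one of length d_out, while the shared random matrices stay frozen. The layer dimensions, ranks, and layer count are hypothetical example values.

```python
def lora_trainable(d_in, d_out, r, num_layers):
    # Each layer trains A (r x d_in) and B (d_out x r).
    return num_layers * r * (d_in + d_out)


def vera_trainable(d_out, r, num_layers):
    # Each layer trains only two scaling vectors, of length r and d_out;
    # the shared random matrices are frozen and not counted.
    return num_layers * (r + d_out)


lora_params = lora_trainable(d_in=4096, d_out=4096, r=16, num_layers=32)
vera_params = vera_trainable(d_out=4096, r=256, num_layers=32)
print(lora_params, vera_params)
```

Under these assumptions, the VeRA count grows with r + d_out per layer rather than with r * (d_in + d_out), which is why it can use a much higher rank and still train far fewer parameters.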

PiSSA

PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.

Quantization

HQQ

Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.

EETQ

Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8 bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.

Show adapter layer and model status

We added a feature to show adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check what adapters exist on your model, whether gradients are active, whether they are enabled, which ones are active or merged. You will also be informed if irregularities have been detected.

To use this new feature, call model.get_layer_status() for layer-level information, and model.get_model_status() for model-level information. For more details, check out our docs on layer and model status.

Changes

Edge case of how we deal with `modules_to_save`

Previously, when using classes such as PeftModelForSequenceClassification, PEFT implicitly added the classifier layers to model.modules_to_save. However, this would only add a new ModulesToSaveWrapper instance for the first adapter being initialized. When initializing a second adapter via model.add_adapter, this information was ignored. Now, peft_config.modules_to_save is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it reflects the intended behavior better.

Furthermore, when merging together multiple LoRA adapters using model.add_weighted_adapter, if these adapters had modules_to_save, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).

What's Changed

Bump version to 0.10.1.dev0 by @BenjaminBossan in #1578
FIX Minor issues in docs, re-raising exception by @BenjaminBossan in #1581
FIX / Docs: Fix doc link for layer replication by @younesbelkada in #1582
DOC: Short section on using transformers pipeline by @BenjaminBossan in #1587
Extend PeftModel.from_pretrained() to models with disk-offloaded modules by @blbadger in #1431
[feat] Add lru_cache to import_utils calls that did not previously have it by @tisles in #1584
fix deepspeed zero3+prompt tuning bug. word_embeddings.weight shape i… by @sywangyi in #1591
MNT: Update GH bug report template by @BenjaminBossan in #1600
fix the torch_dtype and quant_storage_dtype by @pacman100 in #1614
FIX In the image classification example, Change the model to the LoRA… by @changhwa in #1624
Remove duplicated import by @nzw0301 in #1622
FIX: bnb config wrong argument names by @BenjaminBossan in #1603
FIX Make DoRA work with Conv1D layers by @BenjaminBossan in #1588
FIX: Send results to correct channel by @younesbelkada in #1628
FEAT: Allow ignoring mismatched sizes when loading by @BenjaminBossan in #1620
itemsize is torch>=2.1, use element_size() by @winglian in #1630
FIX Multiple adapters and modules_to_save by @BenjaminBossan in #1615
FIX Correctly call element_size by @BenjaminBossan in #1635
fix: allow load_adapter to use different device by @yhZhai in #1631
Adalora deepspeed by @sywangyi in #1625
Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization by @yfeng95 in #1326
Don't use deprecated Repository anymore by @Wauplin in #1641
FIX Errors in the transformers integration docs by @BenjaminBossan in #1629
update figure assets of BOFT by @YuliangXiu in #1642
print_trainable_parameters - format % to be sensible by @stas00 in #1648
FIX: Bug with handling of active adapters by @BenjaminBossan in #1659
Remove dreambooth Git link by @charliermarsh in #1660
add safetensor load in multitask_prompt_tuning by @sywangyi in #1662
Adds Vera (Vector Based Random Matrix Adaption) #2 by @BenjaminBossan in #1564
Update deepspeed.md by @sanghyuk-choi in #1679
ENH: Add multi-backend tests for bnb by @younesbelkada in #1667
FIX / Workflow: Fix Mac-OS CI issues by @younesbelkada in #1680
FIX Use trl version of tiny random llama by @BenjaminBossan in #1681
FIX: Don't eagerly import bnb for LoftQ by @BenjaminBossan in #1683
FEAT: Add EETQ support in PEFT by @younesbelkada in #1675
FIX / Workflow: Always notify on slack for docker image workflows by @younesbelkada in #1682
FIX: upgrade autoawq to latest version by @younesbelkada in #1684
FIX: Initialize DoRA weights in float32 if float16 is being used by @BenjaminBossan in #1653
fix bf16 model type issue for ia3 by @sywangyi in #1634
FIX Issues with AdaLora initialization by @BenjaminBossan in #1652
FEAT Show adapter layer and model status by @BenjaminBossan in #1663
Fixing the example by providing correct tokenized seq length by @jpodivin in #1686
TST: Skiping AWQ tests for now .. by @younesbelkada in #1690
Add LayerNorm tuning model by @DTennant in #1301
FIX Use different doc builder docker image by @BenjaminBossan in #1697
Set experimental dynamo config for compile tests by @BenjaminBossan in #1698
fix the fsdp peft autowrap policy by @pacman100 in #1694
Add LoRA support to HQQ Quantization by @fahadh4ilyas in #1618
FEAT Helper to check if a model is a PEFT model by @BenjaminBossan in #1713
support Cambricon MLUs device by @huismiling in #1687
Some small cleanups in docstrings, copyright note by @BenjaminBossan in #1714
Fix docs typo by @NielsRogge in #1719
revise run_peft_multigpu.sh by @abzb1 in #1722
Workflow: Add slack messages workflow by @younesbelkada in #1723
DOC Document the PEFT checkpoint for...

Contributors

winglian, charliermarsh, and 24 other contributors


21 Mar 10:20

BenjaminBossan

v0.10.0

8221246

v0.10.0: Fine-tune larger QLoRA models with DeepSpeed and FSDP, layer replication, enhance DoRA

Highlights

Support for QLoRA with DeepSpeed ZeRO3 and FSDP

We added a couple of changes to allow QLoRA to work with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this allows you to fine-tune a 70B Llama model on two GPUs with 24GB memory each. Besides the latest version of PEFT, this requires bitsandbytes>=0.43.0, accelerate>=0.28.0, transformers>4.38.2, trl>0.7.11. Check out our docs on DeepSpeed and FSDP with PEFT, as well as this blogpost from answer.ai, for more details.

Layer replication

First time contributor @siddartha-RE added support for layer replication with LoRA. This allows you to duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs only very little extra memory, but can lead to a nice improvement of model performance. Find out more in our docs.
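
How a replication spec turns into a deeper layer stack can be sketched as follows. The helper `expand_layer_ranges` is hypothetical, and the half-open `[start, end)` range convention is an assumption of this example; see the docs for the actual configuration format.

```python
def expand_layer_ranges(ranges):
    """Expand [start, end) pairs into the resulting order of layer indices."""
    order = []
    for start, end in ranges:
        order.extend(range(start, end))
    return order


# Hypothetical 5-layer model: layers 2 and 3 are used a second time.
layer_order = expand_layer_ranges([(0, 4), (2, 5)])
```

The result has seven entries but only five distinct indices. That is the sense in which the base weights are shared, and each position in the stack can still receive its own LoRA adapter.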

Improving DoRA

Last release, we added the option to enable DoRA in PEFT by simply adding use_dora=True to your LoraConfig. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support Conv2d layers, as well as linear layers quantized with bitsandbytes.

Mixed LoRA adapter batches

If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or, in fact, no adapter) on different samples in the same batch. To do this, pass a list of adapter names as an additional argument. For example, if you have a batch of three samples:

output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])

Here, "adapter1" and "adapter2" should be the same name as your corresponding LoRA adapters and "__base__" is a special name that refers to the base model without any adapter. Find more details in our docs.

Without this feature, if you wanted to run inference with different LoRA adapters, you'd have to use single samples or try to group batches with the same adapter, then switch between adapters using set_adapter -- this is inefficient and inconvenient. Therefore, it is recommended to use this new, faster method from now on when encountering this scenario.
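
The routing idea behind mixed batches can be sketched in plain Python. How PEFT implements this internally is not described here, so the sketch is an assumption-laden illustration: `run_mixed_batch` is a hypothetical helper, and the "adapters" are ordinary functions standing in for a model with a given adapter active.

```python
from collections import defaultdict


def run_mixed_batch(samples, adapter_names, adapters):
    """Apply a per-sample adapter by processing one group per adapter."""
    if len(samples) != len(adapter_names):
        raise ValueError("need exactly one adapter name per sample")

    # Collect the positions of the samples that share an adapter.
    groups = defaultdict(list)
    for index, name in enumerate(adapter_names):
        groups[name].append(index)

    outputs = [None] * len(samples)
    for name, indices in groups.items():
        fn = adapters[name]
        for index in indices:
            # Write results back to the original batch positions.
            outputs[index] = fn(samples[index])
    return outputs


# Toy "adapters": plain functions; "__base__" leaves the input unchanged.
adapters = {
    "adapter1": lambda x: x + 1,
    "adapter2": lambda x: x * 10,
    "__base__": lambda x: x,
}
outputs = run_mixed_batch(
    [1, 2, 3], ["adapter1", "adapter2", "__base__"], adapters
)
```

The list of names must have one entry per sample, and the outputs come back in the original sample order regardless of how the samples were grouped.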

New LoftQ initialization function

We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing method. Right now, using LoftQ requires you to go through multiple steps as shown here. Furthermore, it's necessary to keep a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.

Using the new replace_lora_weights_loftq function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out the docs and this example notebook to see how it works. Right now, this method only supports 4bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.

Deprecations

The function prepare_model_for_int8_training had been deprecated for quite some time and is now removed completely. Use prepare_model_for_kbit_training instead.

What's Changed

Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.

Bump version to 0.9.1.dev0 by @BenjaminBossan in #1517
Fix for "leaf Variable that requires grad" Error in In-Place Operation by @DopeorNope-Lee in #1372
FIX [CI / Docker] Follow up from #1481 by @younesbelkada in #1487
CI: temporary disable workflow by @younesbelkada in #1534
FIX [Docs/ bnb / DeepSpeed] Add clarification on bnb + PEFT + DS compatibilities by @younesbelkada in #1529
Expose bias attribute on tuner layers by @BenjaminBossan in #1530
docs: highlight difference between num_parameters() and get_nb_trainable_parameters() in PEFT by @kmehant in #1531
fix: fail when required args not passed when prompt_tuning_init==TEXT by @kmehant in #1519
Fixed minor grammatical and code bugs by @gremlin97 in #1542
Optimize levenshtein_distance algorithm in peft_lora_seq2seq_accelera… by @SUNGOD3 in #1527
Update prompt_based_methods.md by @insist93 in #1548
FIX Allow AdaLoRA rank to be 0 by @BenjaminBossan in #1540
FIX: Make adaptation prompt CI happy for transformers 4.39.0 by @younesbelkada in #1551
MNT: Use BitsAndBytesConfig as load_in_* is deprecated by @BenjaminBossan in #1552
Add Support for Mistral Model in Llama-Adapter Method by @PrakharSaxena24 in #1433
Add support for layer replication in LoRA by @siddartha-RE in #1368
QDoRA: Support DoRA with BnB quantization by @BenjaminBossan in #1518
Feat: add support for Conv2D DoRA by @sayakpaul in #1516
TST Report slowest tests by @BenjaminBossan in #1556
Changes to support fsdp+qlora and dsz3+qlora by @pacman100 in #1550
Update style with ruff 0.2.2 by @BenjaminBossan in #1565
FEAT Mixing different LoRA adapters in same batch by @BenjaminBossan in #1558
FIX [CI] Fix test docker CI by @younesbelkada in #1535
Fix LoftQ docs and tests by @BenjaminBossan in #1532
More convenient way to initialize LoftQ by @BenjaminBossan in #1543

New Contributors

@DopeorNope-Lee made their first contribution in #1372
@kmehant made their first contribution in #1531
@gremlin97 made their first contribution in #1542
@SUNGOD3 made their first contribution in #1527
@insist93 made their first contribution in #1548
@PrakharSaxena24 made their first contribution in #1433
@siddartha-RE made their first contribution in #1368

Full Changelog: v0.9.0...v0.10.0

Contributors

BenjaminBossan, pacman100, and 9 other contributors

28 Feb 10:37

BenjaminBossan

v0.9.0

7e5335d

v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more

Highlights

New methods for merging LoRA weights together

With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead, this is about merging the weights from different LoRA adapters into a single adapter by calling add_weighted_adapter. This allows you to combine the strength from multiple LoRA adapters into a single adapter, while being faster than activating each of these adapters individually.
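As a rough illustration of what combining adapters means, the sketch below forms a weighted sum of the dense updates `B @ A` contributed by two toy LoRA adapters. This is an assumption-laden simplification, not PEFT's add_weighted_adapter: the helper names are hypothetical, the matrices are tiny hand-picked values, and the real function produces a new low-rank adapter (and offers several combination types) rather than a dense matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_B, lora_A, scaling=1.0):
    """Dense weight update contributed by one LoRA adapter: scaling * B @ A."""
    return [[scaling * v for v in row] for row in matmul(lora_B, lora_A)]


def weighted_sum(deltas, weights):
    """Combine per-adapter updates into a single update: sum_i w_i * delta_i."""
    rows, cols = len(deltas[0]), len(deltas[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for delta, w in zip(deltas, weights):
        for i in range(rows):
            for j in range(cols):
                merged[i][j] += w * delta[i][j]
    return merged


# Two rank-1 adapters for a 2x2 layer (toy values).
delta1 = lora_delta([[1.0], [0.0]], [[1.0, 2.0]])  # touches only the first row
delta2 = lora_delta([[0.0], [1.0]], [[3.0, 4.0]])  # touches only the second row
merged = weighted_sum([delta1, delta2], weights=[0.5, 2.0])
print(merged)
```

After merging, a single update stands in for both adapters, so a forward pass no longer needs to evaluate each adapter separately.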

Although this feature has already existed in PEFT for some time, we have added new merging methods that promise much better results. The first is based on TIES, the second on DARE, and the third, inspired by both, is called Magnitude Prune. If you haven't tried these new methods, or haven't touched the LoRA weight merging feature at all, you can find more information here:

AWQ and AQLM support for LoRA

Via #1399, we now support AutoAWQ in PEFT. This is a new method for 4bit quantization of model weights.

Similarly, we now support AQLM via #1476. This method allows quantizing weights to as low as 2 bits. Both methods support quantizing nn.Linear layers. To find out more about all the quantization options that work with PEFT, check out our docs here.

Note that these integrations do not support merge_and_unload() yet, meaning that for inference you always need to keep the adapter weights attached to the base model.

DoRA support

We now support Weight-Decomposed Low-Rank Adaptation, aka DoRA, via #1474. This new method builds on top of LoRA and has shown very promising results. Especially at lower ranks (e.g. r=8), it should perform much better than LoRA. Right now, only non-quantized nn.Linear layers are supported. If you'd like to give it a try, just pass use_dora=True to your LoraConfig and you're good to go.
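For readers curious about what "weight-decomposed" means, the following is a small pure-Python sketch of the decomposition as described in the DoRA paper: the adapted weight is a learnable magnitude vector times the direction of `W0 + B @ A`, normalized per column. The function names and toy matrices are assumptions made for illustration, this is not PEFT's code, and which axis counts as a "column" depends on the weight layout of the layer.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """L2 norm of every column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, lora_B, lora_A, magnitude):
    """Compute magnitude * (W0 + B @ A) / column_norm(W0 + B @ A)."""
    update = matmul(lora_B, lora_A)
    directed = [
        [w + u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(w0, update)
    ]
    norms = column_norms(directed)
    return [[m * v / n for v, m, n in zip(row, magnitude, norms)] for row in directed]


w0 = [[3.0, 0.0], [4.0, 2.0]]
magnitude = column_norms(w0)  # magnitude starts out as the column norms of W0
lora_A = [[0.1, 0.2]]

# With B initialized to zero (the usual LoRA init), the weight is unchanged.
w_init = dora_weight(w0, [[0.0], [0.0]], lora_A, magnitude)

# After B moves away from zero, the direction changes but every column
# keeps the norm stored in `magnitude`.
w_trained = dora_weight(w0, [[1.0], [0.0]], lora_A, magnitude)
print(w_init)
print(column_norms(w_trained))
```

The point of the split is that the low-rank update only steers the direction of each column, while the magnitude vector is trained as a separate set of parameters.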

Documentation

Thanks to @stevhliu and many other contributors, there have been big improvements to the documentation. You should find it more organized and more up-to-date. Our DeepSpeed and FSDP guides have also been much improved.

Check out our improved docs if you haven't already!

Development

If you're implementing custom adapter layers, for instance a custom LoraLayer, note that all subclasses should now implement update_layer -- unless they want to use the default method by the parent class. In particular, this means you should no longer use different method names for the subclass, like update_layer_embedding. Also, we generally don't permit ranks (r) of 0 anymore. For more, see this PR.
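The convention can be pictured with the toy classes below. They are hypothetical stand-ins, not PEFT's LoraLayer, and the two-argument `update_layer` signature is a simplification chosen for the sketch. The subclass customizes its setup by overriding `update_layer` itself instead of adding a separately named method, and the parent rejects a rank of 0.

```python
class ToyLoraLayer:
    """Hypothetical stand-in for a LoRA-style parent layer (not PEFT's class)."""

    def __init__(self):
        self.r = {}

    def update_layer(self, adapter_name, r):
        # Ranks of 0 (or below) are rejected up front.
        if r <= 0:
            raise ValueError(f"`r` should be a positive integer, but got {r}")
        self.r[adapter_name] = r


class ToyEmbeddingLayer(ToyLoraLayer):
    """Subclass that overrides update_layer rather than introducing a
    differently named method such as update_layer_embedding."""

    def __init__(self):
        super().__init__()
        self.kind = {}

    def update_layer(self, adapter_name, r):
        super().update_layer(adapter_name, r)  # reuse the parent's validation
        self.kind[adapter_name] = "embedding"


layer = ToyEmbeddingLayer()
layer.update_layer("default", r=8)
print(layer.r, layer.kind)
```

Because every layer type exposes the same method name, calling code can set up any layer without knowing which subclass it is dealing with.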

Developers should have an easier time now since we fully embrace ruff. If you're the type of person who forgets to call make style before pushing to a PR, consider adding a pre-commit hook. Tests are now a bit less verbose by using plain asserts and generally embracing pytest features more fully. All of this comes thanks to @akx.

What's Changed

On top of these highlights, we have made a lot of small changes since the last release. Check out the full list below. As always, we had a lot of support from many contributors. You're awesome!

Release patch version 0.8.2 by @pacman100 in #1428
[docs] Polytropon API by @stevhliu in #1422
Fix MatMul8bitLtBackward view issue by @younesbelkada in #1425
Fix typos by @szepeviktor in #1435
Fixed saving for models that don't have _name_or_path in config by @kovalexal in #1440
[docs] README update by @stevhliu in #1411
[docs] Doc maintenance by @stevhliu in #1394
[core/TPLinear] Fix breaking change by @younesbelkada in #1439
Renovate quality tools by @akx in #1421
[Docs] call set_adapters() after add_weighted_adapter by @sayakpaul in #1444
MNT: Check only selected directories with ruff by @BenjaminBossan in #1446
TST: Improve test coverage by skipping fewer tests by @BenjaminBossan in #1445
Update Dockerfile to reflect how to compile bnb from source by @younesbelkada in #1437
[docs] Lora-like guides by @stevhliu in #1371
[docs] IA3 by @stevhliu in #1373
Add docstrings for set_adapter and keep frozen by @EricLBuehler in #1447
Add new merging methods by @pacman100 in #1364
FIX Loading with AutoPeftModel.from_pretrained by @BenjaminBossan in #1449
Support modules_to_save config option when using DeepSpeed ZeRO-3 with ZeRO init enabled. by @pacman100 in #1450
FIX Honor HF_HUB_OFFLINE mode if set by user by @BenjaminBossan in #1454
[docs] Remove iframe by @stevhliu in #1456
[docs] Docstring typo by @stevhliu in #1455
[core / get_peft_state_dict] Ignore all exceptions to avoid unexpected errors by @younesbelkada in #1458
[ Adaptation Prompt] Fix llama rotary embedding issue with transformers main by @younesbelkada in #1459
[CI] Add CI tests on transformers main to catch early bugs by @younesbelkada in #1461
Use plain asserts in tests by @akx in #1448
Add default IA3 target modules for Mixtral by @arnavgarg1 in #1376
add magnitude_prune merging method by @pacman100 in #1466
[docs] Model merging by @stevhliu in #1423
Adds an example notebook for showing multi-adapter weighted inference by @sayakpaul in #1471
Make tests succeed more on MPS by @akx in #1463
[CI] Fix adaptation prompt CI on transformers main by @younesbelkada in #1465
Update docstring at peft_types.py by @eduardozamudio in #1475
FEAT: add awq suppot in PEFT by @younesbelkada in #1399
Add pre-commit configuration by @akx in #1467
ENH [CI] Run tests only when relevant files are modified by @younesbelkada in #1482
FIX [CI / bnb] Fix failing bnb workflow by @younesbelkada in #1480
FIX [PromptTuning] Simple fix for transformers >= 4.38 by @younesbelkada in #1484
FIX: Multitask prompt tuning with other tuning init by @BenjaminBossan in #1144
previous_dtype is now inferred from F.linear's result output type. by @MFajcik in #1010
ENH: [CI / Docker]: Create a workflow to temporarly build docker images in case dockerfiles are modified by @younesbelkada in #1481
Fix issue with unloading double wrapped modules by @BenjaminBossan in #1490
FIX: [CI / Adaptation Prompt] Fix CI on transformers main by @younesbelkada in #1493
Update peft_bnb_whisper_large_v2_training.ipynb: Fix a typo by @martin0258 in #1494
covert SVDLinear dtype by @PHOSPHENES8 in #1495
Raise error on wrong type for to modules_to_save by @BenjaminBossan in #1496
AQLM support for LoRA by @BlackSamorez in #1476
Allow trust_remote_code for tokenizers when loading AutoPeftModels by @OfficialDelta in https://...

Contributors

akx, szepeviktor, and 14 other contributors

01 Feb 14:16

pacman100

v0.8.2

e37bff6

Release v0.8.2

What's Changed

Release v0.8.2.dev0 by @pacman100 in #1416
Add IA3 Modules for Phi by @arnavgarg1 in #1407
Update custom_models.md by @boyufan in #1409
Add positional args to PeftModelForCausalLM.generate by @SumanthRH in #1393
[Hub] fix: subfolder existence check by @sayakpaul in #1417
FIX: Make merging of adapter weights idempotent by @BenjaminBossan in #1355
[core] fix critical bug in diffusers by @younesbelkada in #1427

New Contributors

@boyufan made their first contribution in #1409

Full Changelog: v0.8.1...v0.8.2

Contributors

BenjaminBossan, pacman100, and 5 other contributors

Releases: huggingface/peft
