FEAT: VeRA quantization using bitsandbytes (huggingface#2070) (huggingface#2076)

VeRA can now be used with 4bit and 8bit bnb quantization.
ZiadHelal authored and BenjaminBossan committed Oct 22, 2024
1 parent 5a560da commit d10151e
Showing 8 changed files with 840 additions and 12 deletions.
10 changes: 9 additions & 1 deletion docs/source/developer_guides/quantization.md
@@ -187,9 +187,17 @@ peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

## Other Supported PEFT Methods

Besides LoRA, the following PEFT methods also support quantization:

- **VeRA** (supports bitsandbytes quantization)
- **AdaLoRA** (supports both bitsandbytes and GPTQ quantization)
- **(IA)³** (supports bitsandbytes quantization)

## Next steps

If you're interested in learning more about quantization, the following may be helpful:

-* Learn more about details about QLoRA and check out some benchmarks on its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
+* Learn more details about QLoRA and check out some benchmarks on its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
* Read more about different quantization schemes in the Transformers [Quantization](https://hf.co/docs/transformers/main/quantization) guide.
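
As context for the docs change above, here is a minimal sketch of the newly supported combination, VeRA on a bitsandbytes-quantized model. The model name, rank, and target modules are illustrative assumptions, not taken from this commit:

```python
# Hedged sketch: VeRA on a 4-bit bitsandbytes-quantized model.
# Model name, rank, and target modules are placeholders, not from this commit.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import VeraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model
    quantization_config=bnb_config,
)
vera_config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(quantized_model, vera_config)
peft_model.print_trainable_parameters()
```
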
5 changes: 1 addition & 4 deletions docs/source/package_reference/vera.md
@@ -22,12 +22,9 @@ When saving the adapter parameters, it's possible to eschew storing the low rank

To handle different shapes of adapted layers, VeRA initializes shared A and B matrices with the largest required size for each dimension. During the forward pass, submatrices A and B for a given layer are sliced out from these shared matrices and used as described in the paper. For example, adapting two linear layers of shapes (100, 20) and (80, 50) will create A and B matrices of shapes (rank, 50) and (100, rank) respectively. Then, to adapt a layer of shape (100, 20), submatrices A and B of shapes (rank, 20) and (100, rank) will be extracted.
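
The slicing described in the paragraph above can be sketched in a few lines of tensor code (shapes follow the example in the text; this is an illustration, not PEFT's internal implementation):

```python
# Illustration of VeRA's shared-matrix slicing, not PEFT's internal code.
import torch

rank = 4
# Shared matrices sized for the largest dimension across adapted layers:
# adapting layers of shape (100, 20) and (80, 50) gives A: (rank, 50), B: (100, rank).
vera_A = torch.randn(rank, 50)
vera_B = torch.randn(100, rank)

# To adapt the (100, 20) layer, slice out the submatrices that fit it:
out_features, in_features = 100, 20
A = vera_A[:, :in_features]   # shape (rank, 20)
B = vera_B[:out_features, :]  # shape (100, rank)
print(A.shape, B.shape)  # torch.Size([4, 20]) torch.Size([100, 4])
```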

-VeRA currently has the following constraints:
+VeRA currently has the following constraint:

- Only `nn.Linear` layers are supported.
-- Quantized layers are not supported.
-
-If these constraints don't work for your use case, use LoRA instead.

The abstract from the paper is:

3 changes: 2 additions & 1 deletion src/peft/helpers.py
@@ -168,7 +168,8 @@ def rescale_adapter_scale(model, multiplier):
Args:
model: The model containing `LoraLayer` modules whose scaling is to be adjusted.
-multiplier (float or int): The multiplier that rescales the `scaling` attribute. Must be of type float or int.
+multiplier (float or int):
+    The multiplier that rescales the `scaling` attribute. Must be of type float or int.
Raises:
ValueError: If the model does not contain any `LoraLayer`
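
For readers unfamiliar with this helper: a hedged usage sketch, assuming the context-manager form of `rescale_adapter_scale` in `peft.helpers` (the `model` and `inputs` objects are placeholders):

```python
# Sketch: temporarily rescale every LoraLayer's `scaling` attribute.
from peft.helpers import rescale_adapter_scale

with rescale_adapter_scale(model, multiplier=0.5):
    outputs = model(**inputs)  # runs with halved adapter scaling
# Original scaling values are restored on exit; a ValueError is raised
# if the model contains no LoraLayer modules.
```
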
16 changes: 16 additions & 0 deletions src/peft/tuners/vera/__init__.py
@@ -12,9 +12,25 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from peft.import_utils import is_bnb_4bit_available, is_bnb_available

from .config import VeraConfig
from .layer import Linear, VeraLayer
from .model import VeraModel


__all__ = ["VeraConfig", "VeraLayer", "Linear", "VeraModel"]


def __getattr__(name):
if (name == "Linear8bitLt") and is_bnb_available():
from .bnb import Linear8bitLt

return Linear8bitLt

if (name == "Linear4bit") and is_bnb_4bit_available():
from .bnb import Linear4bit

return Linear4bit

raise AttributeError(f"module {__name__} has no attribute {name}")
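
The `__getattr__` above is a module-level hook (PEP 562): the bitsandbytes-backed classes are imported lazily, so the quantized layer types only load when requested and only if bitsandbytes is available. For example:

```python
# Triggers vera.__getattr__("Linear8bitLt") and imports .bnb lazily,
# assuming bitsandbytes is installed.
from peft.tuners.vera import Linear8bitLt
```
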
