
AQLM support for LoRA #1476

Merged (14 commits) on Feb 22, 2024
Conversation

BlackSamorez (Contributor):

This PR aims to add AQLM support for LoRA finetuning.

AQLM has recently been integrated into transformers with this PR, and it would make sense to add fine-tuning support for it.

On the aqlm inference side, proper autograd integration needs to be implemented. The work is being done in this branch: the basic code already works, but the efficient kernels do not work yet.
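To illustrate what "proper autograd integration" means here, a minimal sketch of the idea (not the actual aqlm implementation; `dequantize_weight` is a hypothetical helper): the forward pass uses the quantized weight, and the backward pass only needs the gradient with respect to the input, since the quantized weight itself stays frozen.

```python
import torch

class QuantizedLinearFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, codes, codebooks, scales):
        # Dequantize (or run a fused quantized kernel) for the forward pass.
        weight = dequantize_weight(codes, codebooks, scales)  # hypothetical helper
        ctx.save_for_backward(weight)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        # Only the gradient w.r.t. the input is needed; the quantized weight is frozen,
        # so the LoRA adapters upstream still receive gradients.
        (weight,) = ctx.saved_tensors
        return grad_output @ weight, None, None, None

# Usage: y = QuantizedLinearFunction.apply(x, codes, codebooks, scales)
```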

BlackSamorez (Contributor Author):

Proof of concept finetuning Mixtral on Colab:
https://colab.research.google.com/drive/12GTp1FCj5_0SnnNQH18h_2XFh9vS_guX?usp=sharing

BlackSamorez (Contributor Author):

@younesbelkada

younesbelkada (Contributor) left a comment:

Thanks a lot for your great work @BlackSamorez, as always!
Can you merge your changes with the latest changes from main?
I am not 100% familiar with AQLM yet, so I also have an open question: do you think merging the adapter weights into the AQLM base model is doable?
For reference, this is how merging is performed in classic LoRA: https://huggingface.co/docs/peft/v0.7.1/en/conceptual_guides/lora#merge-lora-weights-into-the-base-model, which is also supported in QLoRA.

BlackSamorez (Contributor Author):

I believe merging would not be possible, because AQLM forces a very strong and specific symmetry on the weights (in the case of a single codebook, a limited set of repeating local weight patterns). An arbitrary LoRA adapter would not satisfy that symmetry.
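In rough terms (a sketch of the argument, not something spelled out in the PR): AQLM stores each group $g$ of 8-16 weights as a sum of codebook entries,

$$
w_g = \sum_{m=1}^{M} C_m[\,b_{g,m}\,],
$$

so the full weight matrix lives in a small discrete set. A merged weight $W_{\text{merged}} = W_{\text{AQLM}} + \tfrac{\alpha}{r} B A$ is dense and continuous-valued and would in general not lie in that set, so it could only be "merged" by re-quantizing the model.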

BenjaminBossan (Member):

Thanks a lot for adding support for AQLM. I'm not sure what the state of the PR is, whether it's ready for review or still in progress. Let me know if you want a full review.

At first glance, here are some things that are still missing:

  1. The import of aqlm should be guarded (possibly with a minimum version check) so that users don't get an error when it's not installed (see the sketch after this list)
  2. We should add documentation (could be a later PR, but that's not ideal)
  3. We should add tests (could be a later PR, but that's not ideal)
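A minimal sketch of such a guard, mirroring the `is_aqlm_available()` helper that appears later in the diff (the exact PEFT implementation may differ; the minimum version is an assumption based on this PR):

```python
import importlib.metadata
import importlib.util

from packaging import version

AQLM_MIN_VERSION = "1.0.2"  # assumed minimum; matches the version required in this PR

def is_aqlm_available() -> bool:
    """Return True if aqlm is installed and new enough to support LoRA training."""
    if importlib.util.find_spec("aqlm") is None:
        return False
    return version.parse(importlib.metadata.version("aqlm")) >= version.parse(AQLM_MIN_VERSION)

# Only touch aqlm when it is actually available, so users without it never hit an ImportError.
if is_aqlm_available():
    from aqlm import QuantizedLinear
```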

BlackSamorez (Contributor Author):

I'm working on each of those points (looking at #1399 as a reference).

younesbelkada (Contributor):

Thanks so much @BlackSamorez !

BlackSamorez (Contributor Author):

@BenjaminBossan
What would the correct docs page for this be?

BenjaminBossan (Member):

> What would the correct docs page for this be?

We have a section dedicated to quantization.

pacman100 (Contributor) left a comment:

Thank you @BlackSamorez for all the work wrt AQLM support for LoRA!

Went over the PR and left a comment. Overall looks good!


```python
if is_aqlm_available() and isinstance(target_base_layer, AqlmQuantizedLinear):
    new_module = QuantLinear(target, adapter_name, **kwargs)
    target.qweight = target_base_layer.codes
```
Contributor:

is this used?

BlackSamorez (Contributor Author):

Yes, this is the place where quantized linear layers get wrapped with a LoRA wrapper.
`qweight` itself is there simply to get its device here.
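For context, a simplified sketch of what such a LoRA wrapper around a frozen quantized linear layer computes (not the exact PEFT class; names and constructor arguments here are illustrative):

```python
import torch
import torch.nn as nn

class LoraWrappedQuantLinear(nn.Module):
    def __init__(self, base_layer: nn.Module, in_features: int, out_features: int, r: int, alpha: int):
        super().__init__()
        self.base_layer = base_layer          # frozen AQLM QuantizedLinear
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # the initial LoRA update is zero
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantized forward pass plus the trainable low-rank update.
        result = self.base_layer(x)
        return result + self.lora_B(self.lora_A(x)) * self.scaling
```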

BlackSamorez (Contributor Author):

I think I've addressed all of the issues above. We'll have to wait for the aqlm 1.0.2 release (corresponding PR), though, because it adds proper autograd support.
I'll tag you once it's released.

HuggingFaceDocBuilderDev (bot):

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan (Member) left a comment:

Thanks for adding this method so quickly, it looks really promising.

I have a couple of suggestions, but those should be easy to adjust. Please take a look.

```diff
@@ -46,6 +46,10 @@ RUN source activate peft && \
 RUN source activate peft && \
     python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ_kernels/releases/download/v0.0.4/autoawq_kernels-0.0.4-cp38-cp38-linux_x86_64.whl
+
+# Add aqlm for quantization testing
+RUN source activate peft && \
+    pip install aqlm[gpu]==1.0.2
```
Member:

Could you please move this install into the installation block below (lines 60-67) to avoid creating another cache step? Also, do you think it's a good idea to pin the version like that? It means that if there is a new aqlm release that breaks something in PEFT, we wouldn't notice it.

BlackSamorez (Contributor Author):

Moved, and replaced `==` with `>=`.

docs/source/developer_guides/quantization.md (review thread resolved)

Additive Quantization of Language Models ([AQLM](https://arxiv.org/abs/2401.06118)) is a compression method for large language models. It quantizes multiple weights together, taking advantage of interdependencies between them: AQLM represents groups of 8-16 weights as a sum of multiple vector codes. This allows it to compress models down to as little as 2 bits per weight with relatively small accuracy loss.

Since the AQLM quantization process is computationally expensive, using prequantized models is recommended. A partial list of available models can be found in the official aqlm [repository](https://github.com/Vahe1994/AQLM).
Member:

It would be nice (and better for adoption) to have safetensors for all of these models.

BlackSamorez (Contributor Author):

I mostly used safetensors for models where we needed a low RAM footprint for demos. We're currently updating the models themselves as well, and we'll definitely standardize the checkpoints once we're done.

docs/source/developer_guides/quantization.md (review thread resolved)

```python
quantized_model = get_peft_model(quantized_model, peft_config)
```

You can refer to the [Google Colab](https://colab.research.google.com/drive/12GTp1FCj5_0SnnNQH18h_2XFh9vS_guX?usp=sharing) example for an overview of AQLM+LoRA finetuning.
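For a fuller picture, a minimal end-to-end sketch of the workflow the docs describe, assuming aqlm and a recent transformers are installed (the model id is the TinyLlama AQLM checkpoint used in the tests; the LoRA hyperparameters and target modules are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "BlackSamorez/TinyLlama-1_1B-Chat-v1_0-AQLM-2Bit-1x16-hf"

# Load a prequantized AQLM model; quantization at load time is not needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

quantized_model = get_peft_model(quantized_model, peft_config)
quantized_model.print_trainable_parameters()  # only the LoRA adapters are trainable
```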
Member:

How about adding this notebook to the examples/ folder in PEFT?

BlackSamorez (Contributor Author):

I think this notebook will suffice as an example in the docs, but it's not good enough to put on GitHub. It'll probably be replaced in a few weeks anyway, once we have better models, simpler PyPI installs, and a generally better example.

```python
        super().__init__()
        LoraLayer.__init__(self, base_layer)

        # self.base_layer and self.quant_linear_module are the same; we need the former for consistency and the latter
```
Member:

We should be able to just do `self.base_layer = base_layer` here. Backwards compatibility is not an issue here, since, unlike for GPTQ, this is a new class.

BlackSamorez (Contributor Author):

The base layer is initialized during `LoraLayer.__init__(self, base_layer)`. I removed `self.quant_linear_module` because it is, indeed, not needed.



```python
if is_aqlm_available():
    from aqlm import QuantizedLinear as AqlmQuantizedLinear
```
Member:

Why do we need the alias? There is no name conflict because the PEFT class is called QuantLinear.

BlackSamorez (Contributor Author):

You're right, we don't. I simply didn't like those two names being similar.
I decided to rename `QuantLinear` -> `AqlmLoraLinear` for better readability of the model structure.

docs/source/developer_guides/quantization.md (review thread resolved)
BlackSamorez (Contributor Author):

@BenjaminBossan
I've addressed the issues above. Also, aqlm==1.0.2 has been released and I made sure that the demo is functional.
The tests I've added here, however, are blocked by another PR into transformers.

younesbelkada (Contributor) left a comment:

Very clean! Thanks very much! LGTM once @BenjaminBossan approves!

BenjaminBossan (Member) left a comment:

Thanks a lot for addressing the issues. From my point of view, the PR is almost ready to go, but we need to take care of a few issues with the test. Please take a look at my comments.

> The tests I've added here, however, are blocked by another PR: huggingface/transformers#29142.

If this means that the test is currently expected to fail, we need to take care of that. We don't want to have a failing CI until the next transformers release with the fix is published. Therefore, we need some kind of check on the test to ensure that it is skipped if the transformers version does not contain the fix.

"""

def setUp(self):
self.causal_lm_model_id = "BlackSamorez/TinyLlama-1_1B-Chat-v1_0-AQLM-2Bit-1x16-hf"
Member:

This model is stored in a pickle file; for tests we should really move to safetensors. Would it be possible for you to convert it or switch to a safetensors model for testing? Also, we should move models used for testing over to https://huggingface.co/peft-internal-testing, which I can do once we have a safetensors model.

BlackSamorez (Contributor Author):

I've converted the model to safetensors. The tests still pass (with this PR's transformers) and the results are consistent.
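For reference, a minimal sketch of one way such a conversion can be done with the standard transformers API (the output directory is illustrative; this is not necessarily how it was done here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BlackSamorez/TinyLlama-1_1B-Chat-v1_0-AQLM-2Bit-1x16-hf"

# Load the existing (pickle-based) checkpoint...
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ...and re-save it with safetensors serialization, then upload the result to the Hub.
model.save_pretrained("TinyLlama-AQLM-safetensors", safe_serialization=True)
tokenizer.save_pretrained("TinyLlama-AQLM-safetensors")
```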

```python
        correctly.
        """
        with tempfile.TemporaryDirectory() as tmp_dir:
            model = AutoModelForCausalLM.from_pretrained(
```
Member:

When running the test locally, I get the following error:

```
    @pytest.mark.single_gpu_tests
    def test_causal_lm_training_aqlm(self):
        r"""
        Test the CausalLM training on a single GPU device. The test would simply fail if the adapters are not set
        correctly.
        """
        with tempfile.TemporaryDirectory() as tmp_dir:
>           model = AutoModelForCausalLM.from_pretrained(
                self.causal_lm_model_id,
                device_map="cuda",
                torch_dtype="auto",
            )

tests/test_gpu_examples.py:1421: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../anaconda3/envs/peft/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:567: in from_pretrained
    return model_class.from_pretrained(
../../../anaconda3/envs/peft/lib/python3.10/site-packages/transformers/modeling_utils.py:3563: in from_pretrained
    hf_quantizer.postprocess_model(model)
../../../anaconda3/envs/peft/lib/python3.10/site-packages/transformers/quantizers/base.py:179: in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
../../../anaconda3/envs/peft/lib/python3.10/site-packages/transformers/quantizers/quantizer_aqlm.py:80: in _process_model_after_weight_loading
    model._is_quantized_training_enabled = False
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0...()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=2048, out_features=32000, bias=False)
)
name = '_is_quantized_training_enabled', value = False

    def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:
        def remove_from(*dicts_or_sets):
            for d in dicts_or_sets:
                if name in d:
                    if isinstance(d, dict):
                        del d[name]
                    else:
                        d.discard(name)
    
        params = self.__dict__.get('_parameters')
        if isinstance(value, Parameter):
            if params is None:
                raise AttributeError(
                    "cannot assign parameters before Module.__init__() call")
            remove_from(self.__dict__, self._buffers, self._modules, self._non_persistent_buffers_set)
            self.register_parameter(name, value)
        elif params is not None and name in params:
            if value is not None:
                raise TypeError(f"cannot assign '{torch.typename(value)}' as parameter '{name}' "
                                "(torch.nn.Parameter or None expected)"
                                )
            self.register_parameter(name, value)
        else:
            modules = self.__dict__.get('_modules')
            if isinstance(value, Module):
                if modules is None:
                    raise AttributeError(
                        "cannot assign module before Module.__init__() call")
                remove_from(self.__dict__, self._parameters, self._buffers, self._non_persistent_buffers_set)
                for hook in _global_module_registration_hooks.values():
                    output = hook(self, name, value)
                    if output is not None:
                        value = output
                modules[name] = value
            elif modules is not None and name in modules:
                if value is not None:
                    raise TypeError(f"cannot assign '{torch.typename(value)}' as child module '{name}' "
                                    "(torch.nn.Module or None expected)"
                                    )
                for hook in _global_module_registration_hooks.values():
                    output = hook(self, name, value)
                    if output is not None:
                        value = output
                modules[name] = value
            else:
                buffers = self.__dict__.get('_buffers')
                if buffers is not None and name in buffers:
                    if value is not None and not isinstance(value, torch.Tensor):
                        raise TypeError(f"cannot assign '{torch.typename(value)}' as buffer '{name}' "
                                        "(torch.Tensor or None expected)"
                                        )
                    for hook in _global_buffer_registration_hooks.values():
                        output = hook(self, name, value)
                        if output is not None:
                            value = output
                    buffers[name] = value
                else:
>                   super().__setattr__(name, value)
E                   AttributeError: can't set attribute '_is_quantized_training_enabled'

../../../anaconda3/envs/peft/lib/python3.10/site-packages/torch/nn/modules/module.py:1747: AttributeError
```

Not sure if that's the one that would be fixed by the transformers PR or if it's a different issue.

younesbelkada (Contributor), Feb 21, 2024:

For that you indeed need to check out that transformers PR. Maybe we can do a version check of transformers on the PEFT side, what do you think? @BenjaminBossan @BlackSamorez

Member:

If we know what version this will be contained in, this would be a possibility. It would mean that we don't have a test at all until it's released, though.

Contributor:

Yes! It should be included in 4.38.0.

BlackSamorez (Contributor Author), Feb 21, 2024:

@BenjaminBossan @BlackSamorez that's not the error I usually get when using main-branch transformers. That would be `ValueError: The model you are trying to fine-tune is quantized with aqlm but that quantization method do not support training. Please open an issue on GitHub: https://github.com/huggingface/transformers to request the support for training support for aqlm`, which is consistent with that PR's logic, which adds the possibility of returning a positive `is_trainable` when aqlm's version is right.
Your transformers main is out of date and didn't catch this PR.

Contributor:

@BenjaminBossan note that in our daily CI we build transformers from main, so IMO once the transformers PR is merged we can merge this PR! 🙏

BlackSamorez (Contributor Author), Feb 21, 2024:

Looks like it has been merged, meaning that transformers main should fully support this PR's tests (at least that's the case on my machine).

Member:

Okay, so this test should run successfully when we test against transformers main. Still, let's add logic to skip the test if the transformers version is too old, so that CI stays green even when testing against the transformers release version.

BlackSamorez (Contributor Author):

@BenjaminBossan added:

```python
# Imports needed by this decorator in the test module (added here for completeness).
import importlib.metadata
import unittest

from packaging import version

@unittest.skipUnless(
    version.parse(importlib.metadata.version("transformers")) >= version.parse("4.38.0"),
    "test requires `transformers>=4.38.0`",
)
```

BenjaminBossan (Member) left a comment:

Thanks so much for addressing the last concern, this LGTM now.

We should not forget to move a copy of the model to our internal testing repo, but that can be done in a follow-up PR.

I'll leave the merging to @younesbelkada in case he wants to double check the last few changes.

younesbelkada (Contributor) left a comment:

Great work! Thanks so much for your great work @BlackSamorez!

younesbelkada merged commit 23213ca into huggingface:main on Feb 22, 2024. 14 checks passed.
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024
* aqlm

* Style and copied tests

* aqlm import guard

* docs

* correct model in tests

* Update docs/source/developer_guides/quantization.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update docs/source/developer_guides/quantization.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* moved aqlm install and added >=

* Removed `quant_linear_module`

* AqlmLoraLinear

* docs update

* transformers version check

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>