
How to train with a 2-bit quantization model? #10

Open
duany049 opened this issue Dec 15, 2023 · 6 comments

Comments

@duany049

I found the implementation of 4-bit quantization, but I couldn't find a 2-bit one. Can you tell me how to fine-tune a 2-bit quantization model?

@yxli2123
Owner

Hi @duany049, we have moved our quantization framework into PEFT.

You can use the command here to obtain 2-bit weights: https://github.com/yxli2123/LoftQ/tree/main#apply-loftq-and-save. Just change --bits to 2.

Keep in mind that we only provide 2-bit-equivalent fp16 weights, because a 2-bit backend is not supported by bitsandbytes. If you have limited resources, we suggest loading the 2-bit-equivalent fp16 weights in 4-bit with bitsandbytes, which saves 75% of GPU memory compared to fp16.
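
For example, loading that checkpoint in NF4 with bitsandbytes looks roughly like this; it is only a sketch, and the checkpoint path is a placeholder for whatever you saved with the command above:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Placeholder path to the 2-bit-equivalent fp16 checkpoint produced by the command above
    model_path = "path/to/loftq-2bit-equivalent-fp16"

    # bitsandbytes has no 2-bit backend, so the weights are stored as NF4 (~4 bits/weight),
    # which is where the ~75% memory saving over fp16 comes from
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        quantization_config=bnb_config,
        device_map="auto",
    )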

@duany049
Author

Thanks for your reply.
I changed --bits from 4 to 2 as you suggested, but the following exception was thrown:

  File "/data2/duan/miniconda3/envs/loftq/lib/python3.11/site-packages/peft/utils/loftq_utils.py", line 215, in loftq_init
    quantized_weight, max_abs, shape = quantizer.quantize_block(res)
                                       ^^^^^^^^^
UnboundLocalError: cannot access local variable 'quantizer' where it is not associated with a value
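
If it helps to reproduce, I believe the same 2-bit path can also be triggered directly through PEFT's LoftQ initialization, roughly like this (the model name and LoRA settings are just placeholders):

    from transformers import AutoModelForCausalLM
    from peft import LoftQConfig, LoraConfig, get_peft_model

    # Load the base model in full precision; LoftQ quantizes the weights itself during init
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype="auto")

    loftq_config = LoftQConfig(loftq_bits=2, loftq_iter=1)  # the 2-bit setting that raises the error
    lora_config = LoraConfig(
        init_lora_weights="loftq",
        loftq_config=loftq_config,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )

    model = get_peft_model(base, lora_config)  # calls loftq_init under the hood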

I fixed the problem by adding a new condition, num_bits == 2, at line 201; below is the code:

    if not is_bnb_4bit_available() or num_bits == 2:
        quantizer = NFQuantizer(num_bits=num_bits, device=device, method="normal", block_size=64)

Is my modification correct? Do I need to submit the code?

@duany049
Author

I have fine-tuned a 2-bit llama2-7b with fake quantization. Could I merge the adapter and the 2-bit model into a 2-bit merged model?
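
Concretely, the merge I have in mind is roughly the following (paths are placeholders; since the backbone is only fake-quantized, I expect the merged result to still be fp16 weights):

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Placeholder paths for the fake-quantized (fp16) backbone and the trained LoRA adapter
    base = AutoModelForCausalLM.from_pretrained("path/to/2bit-equivalent-fp16-backbone")
    model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

    # merge_and_unload folds the LoRA weights into the backbone and returns a plain model;
    # the merged weights generally no longer lie exactly on the 2-bit grid
    merged = model.merge_and_unload()
    merged.save_pretrained("path/to/merged-model")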

@yxli2123
Owner

Hi @duany049, please install the up-to-date peft with pip install git+https://github.com/huggingface/peft.git. This issue has been resolved in the latest version: https://github.com/huggingface/peft/blob/main/src/peft/utils/loftq_utils.py#L201

@duany049
Author

duany049 commented Dec 25, 2023

Thank you for your reply. I have two more questions:

  1. Could I load the 2-bit-equivalent fp16 weights in 2-bit with AutoGPTQ?
  2. Would that save 87.5% of GPU memory compared to fp16?

@yxli2123
Owner

yxli2123 commented Jan 7, 2024

  1. No, because I don't think AutoGPTQ and NF2 (a variant of NF4) use the same quantization function.
  2. No, since it uses NF4 on the GPU. It can only save up to 75% of GPU memory compared to fp16, even though the stored values are mathematically equivalent to 2-bit values.
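
As a rough back-of-envelope for a 7B model (illustrative numbers only, ignoring quantization constants and activation memory):

    # Approximate weight-memory footprint of a 7B-parameter model
    params = 7e9
    fp16_gb = params * 2 / 1e9        # 16 bits/weight -> ~14 GB
    nf4_gb = params * 0.5 / 1e9       # 4 bits/weight  -> ~3.5 GB, i.e. 75% less than fp16
    two_bit_gb = params * 0.25 / 1e9  # 2 bits/weight  -> ~1.75 GB, i.e. 87.5% less, but needs a real 2-bit backend
    print(fp16_gb, nf4_gb, two_bit_gb)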
