[Doc]: BNB 8 bit quantization is undocumented #10723
Comments
Indeed, please feel free to contribute this. Thank you very much!
@jeejeelee I am actually unsure about the usage myself. I was hoping someone could help me out with that. I've seen the PR where 8-bit was introduced, but wasn't able to figure out which arguments I must change when calling LLM().
I did ask the author of the PR for clarification: #7445 (comment)
IIUC, you don't need to set a specific argument (see: https://github.com/vllm-project/vllm/blob/main/tests/quantization/test_bitsandbytes.py#L24), like:

```python
llm = LLM(
    model=model_name,
    trust_remote_code=True,
    load_format="bitsandbytes",
    quantization="bitsandbytes",
)
```
@jeejeelee the code you shared works to give an 8-bit quantized BNB model when the model ID already points to a pre-quantized 8-bit BNB checkpoint. But, as described in the docs, vLLM also supports in-flight quantization, which takes the base full-precision model ID and returns a 4-bit BNB quantized model. To achieve this you run the same code from your comment but give a full-precision model path; though you never mention the precision in this call, it always returns a 4-bit quantized version. In-flight quantization is also supported in HuggingFace, which, on the other hand, lets you choose between 4-bit and 8-bit. vLLM does the in-flight BNB quantization using its own loader, whose definition does not seem to expose a way to request 8-bit.
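To make the distinction concrete, here is a minimal sketch of the two call patterns discussed above. The model IDs are placeholders, and it assumes a vLLM version where `load_format="bitsandbytes"` is still passed alongside `quantization="bitsandbytes"`, as in the snippet above:

```python
from vllm import LLM

# Case 1: the checkpoint is ALREADY quantized with BNB in 8-bit.
# vLLM picks the precision up from the checkpoint's quantization config,
# so this call yields an 8-bit model.
llm_8bit = LLM(
    model="some-org/llama-3-8b-bnb-8bit",  # hypothetical pre-quantized repo
    load_format="bitsandbytes",
    quantization="bitsandbytes",
    trust_remote_code=True,
)

# Case 2: the checkpoint is full precision, so vLLM quantizes in-flight.
# The call is identical, but the result is always 4-bit; there is no
# argument to request 8-bit here.
llm_4bit = LLM(
    model="meta-llama/Meta-Llama-3-8B",  # full-precision checkpoint
    load_format="bitsandbytes",
    quantization="bitsandbytes",
    trust_remote_code=True,
)
```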
Currently, vLLM only supports 4-bit for in-flight quantization, see: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/model_loader/loader.py#L997.
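For contrast, a sketch of the HuggingFace side mentioned above, where the in-flight precision is an explicit choice. This uses the standard `transformers` `BitsAndBytesConfig` API; the model ID is a placeholder:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# In Transformers, 8-bit vs 4-bit in-flight quantization is selected
# explicitly on the quantization config.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
# bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # 4-bit alternative

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # full-precision checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```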
Should I close this then?
Could you please submit a PR to clarify in the documentation that in-flight quantization only supports 4-bit? Thanks very much!
The documentation does say that: "There is currently no support for Inflight 8bit quantization."
📚 The doc issue
BNB 8-bit quantization is apparently supported as of #7445, but there is no detail on how to load in 8-bit on the BNB documentation page.
Suggest a potential alternative/fix
Give an example of using `load_in_4bit`/`load_in_8bit` on the documentation page.