
[Doc]: BNB 8 bit quantization is undocumented #10723

Closed

molereddy opened this issue Nov 27, 2024 · 10 comments
Labels
documentation Improvements or additions to documentation

Comments

@molereddy

📚 The doc issue

BNB 8-bit quantization is apparently supported as of #7445, but there is no detail on how to load a model in 8-bit on the BNB documentation page.

Suggest a potential alternative/fix

Give an example of using load_in_4bit/load_in_8bit on the documentation page

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@molereddy molereddy added the documentation Improvements or additions to documentation label Nov 27, 2024
@jeejeelee
Collaborator

Indeed, please feel free to contribute this. Thank you very much!

@molereddy
Author

@jeejeelee I actually am unsure about the usage myself; I was hoping someone could help me out with that. I've seen the PR where 8-bit was introduced, but wasn't able to figure out which arguments I must change when calling LLM().

@molereddy
Author

I did ask the author of the PR for clarification: #7445 (comment)

@jeejeelee
Collaborator

IIUC, you don't need to set a precision-specific argument (see: https://github.com/vllm-project/vllm/blob/main/tests/quantization/test_bitsandbytes.py#L24), like:

from vllm import LLM

# model_name is the HF model ID or local path of the checkpoint to load
llm = LLM(
    model=model_name,
    trust_remote_code=True,
    load_format="bitsandbytes",
    quantization="bitsandbytes",
)

@molereddy
Author

molereddy commented Nov 28, 2024

@jeejeelee the code you shared works to produce an 8-bit quantized BNB model when model_name (a model ID or path) corresponds to a checkpoint that has already been BNB-quantized to 8-bit.

But, as described in the docs, vLLM also supports in-flight quantization, which takes the base full-precision model ID and returns a 4-bit BNB-quantized model. To achieve this, you run the same code from your comment but pass a full-precision model path. Even though you never specify a precision in this call, it always returns a 4-bit quantized version.
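
For concreteness, a minimal sketch of what I mean (the model ID below is only an illustrative full-precision checkpoint, not one taken from this thread):

from vllm import LLM

# Pass a full-precision checkpoint; vLLM quantizes it on load (in-flight).
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model ID
    load_format="bitsandbytes",
    quantization="bitsandbytes",
)
# No precision is specified anywhere, yet the loaded weights are 4-bit.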

In-flight quantization is also supported in Hugging Face, which, on the other hand, does the in-flight BNB quantization using the load_in_4bit/load_in_8bit arguments (see here) to customize the precision while creating a quantized model from the base full-precision checkpoint.
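
For comparison, the standard transformers usage looks roughly like this (the model ID is again illustrative):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# The precision is chosen explicitly: load_in_8bit=True or load_in_4bit=True.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # full-precision base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)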

vLLM's own BitsAndBytesConfig class has load_in_4bit/load_in_8bit flags, but it is unclear how to pass these in when calling LLM().

The definition of the LLM class has no such information; it takes only one related argument (quantization).

@jeejeelee
Collaborator

Currently, vLLM only supports 4-bit for in-flight quantization, see: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/model_loader/loader.py#L997.
vLLM's load_in_4bit/load_in_8bit args are used for pre-quantized checkpoints (at least for now). In general, they are read from the model's configuration file, see: https://huggingface.co/openbmb/MiniCPM-V-2_6-int4/blob/main/config.json#L28
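
For reference, the quantization_config section of such a pre-quantized checkpoint typically looks something like this (field names are from memory of HF BNB checkpoints, not copied from that exact file):

"quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true,
    "load_in_8bit": false,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "bfloat16"
}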

@molereddy
Author

Should I close this then?

@jeejeelee
Collaborator

> Should I close this then?

Could you please submit a PR to clarify in the documentation that in-flight quantization only supports 4-bit quantization? Thanks very much!

@molereddy
Author

The documentation does say that.

@ShelterWFF

There is currently no support for in-flight 8-bit quantization.
