Enable vllm load gptq model #12083

hzjane · 2024-09-14T05:42:32Z

Description

Enable vLLM to load GPTQ models.
Tested Llama-2-13B-chat-GPTQ and Llama-2-7B-Chat-GPTQ with vLLM.

1. Why the change?

2. User API changes

Only asym_int4 is supported as the low-bit format for GPTQ models.
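For readers unfamiliar with the term, the following is a generic sketch of asymmetric 4-bit quantization: each value is mapped to an integer code in 0..15 using a scale and a minimum offset. It is not taken from this PR and does not claim to reproduce ipex-llm's kernels or GPTQ's group-wise layout; the function names are hypothetical and the min/max calibration is an assumption made for illustration.

```python
# Generic illustration of asymmetric int4 quantization (NOT ipex-llm's code).
# Assumption: scale and offset are derived from the min/max of the values.

def quantize_asym_int4(values):
    """Map floats to integer codes in [0, 15] plus (scale, offset)."""
    lo, hi = min(values), max(values)
    # 4 bits give 16 levels, so 15 intervals span the range [lo, hi].
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_asym_int4(codes, scale, offset):
    """Recover approximate floats from the integer codes."""
    return [c * scale + offset for c in codes]


weights = [-1.0, 0.0, 0.5, 2.0]
codes, scale, offset = quantize_asym_int4(weights)
restored = dequantize_asym_int4(codes, scale, offset)
```

Because the offset is stored alongside the scale, the representable range does not have to be centered on zero, which is the difference from a symmetric scheme.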

llm = LLM(model="/llm/models/Llama-2-13B-chat-GPTQ",
          quantization="GPTQ",
          load_in_low_bit="asym_int4",
          ....
)
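Since only asym_int4 is supported, a caller may want to fail early when another format is requested. A minimal sketch of such a guard follows; the helper name gptq_engine_kwargs and the constant are hypothetical and not part of ipex-llm, and only the three keyword arguments shown in the description above are taken from the PR.

```python
# Hypothetical helper (not part of ipex-llm): builds the keyword arguments
# shown in the PR description and rejects unsupported low-bit formats.

SUPPORTED_GPTQ_LOW_BIT = ("asym_int4",)


def gptq_engine_kwargs(model_path, load_in_low_bit="asym_int4", **extra):
    """Return kwargs for the LLM constructor when loading a GPTQ model."""
    if load_in_low_bit not in SUPPORTED_GPTQ_LOW_BIT:
        raise ValueError(
            f"GPTQ models only support {SUPPORTED_GPTQ_LOW_BIT}, "
            f"got {load_in_low_bit!r}"
        )
    kwargs = {
        "model": model_path,
        "quantization": "GPTQ",
        "load_in_low_bit": load_in_low_bit,
    }
    # Any further engine options are passed through unchanged.
    kwargs.update(extra)
    return kwargs


kwargs = gptq_engine_kwargs("/llm/models/Llama-2-13B-chat-GPTQ")
```

The result would then be passed on as LLM(**kwargs), with LLM being whichever engine class the surrounding script already imports.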

or

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --quantization gptq \
  --load-in-low-bit asym_int4 \
  ...
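When the server is launched from a script, the same flags can be assembled as an argument list so that quoting and line continuations are not an issue. This is only a sketch: the helper name gptq_server_argv is hypothetical, and the remaining server options that the command above elides are left to the caller via extra_args.

```python
import shlex

# Hypothetical helper (not part of ipex-llm): assembles the command line
# from the PR description. Options elided there are NOT filled in here;
# the caller supplies them through extra_args.


def gptq_server_argv(extra_args=()):
    """Return the argv list for starting the API server with a GPTQ model."""
    argv = [
        "python", "-m", "ipex_llm.vllm.xpu.entrypoints.openai.api_server",
        "--quantization", "gptq",
        "--load-in-low-bit", "asym_int4",
    ]
    argv.extend(extra_args)
    return argv


command_line = shlex.join(gptq_server_argv())
print(command_line)
```

The list form can be handed directly to subprocess.run, while the joined string is convenient for logging.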

3. Summary of the change

4. How to test?

N/A

5. New dependencies


jason-dai · 2024-09-18T02:07:29Z

python/llm/src/ipex_llm/transformers/convert.py

@@ -291,16 +296,23 @@ def convert_vllm(module, qtype, in_features, out_features, mp_group, cur_qtype,
 return new_linear


-def convert_vllm_awq(module):
+def convert_vllm_awq(module, gptq=False, act_order=False):


convert_vllm_awq_or_gptq

glorysdj

LGTM

hzjane added 3 commits September 14, 2024 13:23

enable vllm load gptq model

d023d17

update

b8fa924

update

6940534

hzjane marked this pull request as ready for review September 14, 2024 06:18

hzjane requested review from gc-fu and glorysdj September 18, 2024 01:24

jason-dai reviewed Sep 18, 2024


hzjane added 2 commits September 18, 2024 10:28

update

19f7535

update style

a22cc84

glorysdj approved these changes Sep 18, 2024


hzjane merged commit 40e463c into intel-analytics:main Sep 18, 2024
1 check passed
