
support inference AWQ INT4 model of Yi-34B from QLoRA #11946

Open · Fred-cell opened this issue Aug 28, 2024 · 1 comment

Fred-cell commented Aug 28, 2024

I will provide an AWQ model from the customer, and the customer will evaluate FP8 and INT4 performance.

gc-fu (Contributor) commented Aug 28, 2024

Hi, I have verified that AWQ models can be supported (loaded in vLLM and converted to LowBitLinear in ipex-llm), but only the asym_int4 quantization format is supported.

This feature will need some adaptation on both the vLLM side and the ipex-llm side. I will update this thread once the supporting PRs are merged.
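
For readers following this thread, a minimal sketch of the target flow, assuming stock vLLM's `quantization="awq"` option; the model path and prompt are placeholders, and the ipex-llm conversion step is described only as noted in the comment above, not as the merged implementation:

```python
# A minimal sketch, not the merged implementation: load an AWQ INT4
# checkpoint through vLLM's built-in AWQ support. The model path and
# prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/Yi-34B-AWQ",  # placeholder path to the AWQ INT4 checkpoint
    quantization="awq",          # vLLM option for AWQ-quantized weights
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize AWQ INT4 quantization."], params)
print(outputs[0].outputs[0].text)

# With the ipex-llm integration described above, the loaded AWQ weights
# would additionally be converted to LowBitLinear in the asym_int4 format;
# the exact entry point depends on the PRs referenced in this thread.
```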
