
support inference AWQ INT4 model of Yi-34B from QLoRA #11946

Open · Fred-cell opened this issue Aug 28, 2024 · 1 comment

Fred-cell commented Aug 28, 2024

I will provide an AWQ model from the customer, and the customer will evaluate FP8 and INT4 performance.

gc-fu (Contributor) commented Aug 28, 2024

Hi, I have verified that AWQ models can be supported (loaded in vLLM and converted to LowBitLinear in ipex-llm), but only the asym_int4 quantization format is supported.

This feature will need some adaptation on both the vLLM side and the ipex-llm side. I will update this thread once the supporting PRs are merged.
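
For readers following this thread, a minimal sketch of the target flow, assuming stock vLLM's `quantization="awq"` option; the model path and prompt are placeholders, and the ipex-llm conversion step is described only as noted in the comment above, not as the merged implementation:

```python
# A minimal sketch, not the merged implementation: load an AWQ INT4
# checkpoint through vLLM's built-in AWQ support. The model path and
# prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/Yi-34B-AWQ",  # placeholder path to the AWQ INT4 checkpoint
    quantization="awq",          # vLLM option for AWQ-quantized weights
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["Summarize AWQ INT4 quantization."], params)
print(outputs[0].outputs[0].text)

# With the ipex-llm integration described above, the loaded AWQ weights
# would additionally be converted to LowBitLinear in the asym_int4 format;
# the exact entry point depends on the PRs referenced in this thread.
```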
