[Feature] Are there plans to support AWQ and torch compile? #1991

Open

2 tasks done

sitabulaixizawaluduo opened this issue Nov 11, 2024 · 3 comments

sitabulaixizawaluduo commented Nov 11, 2024

Checklist

1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
2. Please use English, otherwise it will be closed.

Motivation

Are there plans to support AWQ and torch compile?

Related resources

No response

Contributor

merrymercy commented Nov 14, 2024

Yes. I think we should be able to do this with torchao.
cc @jerryzh168 @msaroufim

merrymercy mentioned this issue

[Bug] Unable to use gptq or awq with torch.compile (8*A40) #1522

Closed

5 tasks

Contributor

jerryzh168 commented Nov 14, 2024

Yes, we have AWQ (https://github.com/pytorch/ao/tree/main/torchao/prototype/awq) and GPTQ (https://github.com/pytorch/ao/blob/main/torchao/quantization/GPTQ_MT.py) implementations. I think both are compatible with torch.compile.

Code example for AWQ (it uses uintx, but I think only int4wo gets a speedup): https://github.com/pytorch/ao/blob/06ad55acb0d034a4e98e82a9eeddbd41d4d94b31/torchao/_models/llama/generate.py#L258-L284

Code example for GPTQ (int4wo): https://github.com/pytorch/ao/blob/06ad55acb0d034a4e98e82a9eeddbd41d4d94b31/torchao/_models/llama/eval.py#L105-L127

Contributor

jerryzh168 commented Nov 14, 2024

I think this requires full-model quantization though, so we would have to change our current way of integrating torchao (which is applied per layer).
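
To make the per-layer versus full-model distinction concrete, here is a toy pure-Python sketch. It is not torchao or SGLang code: every function name is hypothetical, the layers are plain lists of lists, and the scale rule (mean absolute activation raised to a fixed power of 0.5) is an assumed simplification of AWQ, which as far as I know searches for its scales rather than fixing them. The only point is that plain weight-only rounding needs just one layer's weights, while an activation-aware scheme needs calibration activations that come from running the earlier layers.

```python
import random


def quantize_rtn(row, bits=4):
    """Symmetric round-to-nearest quantization of one weight row (returned dequantized)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    if max_abs == 0:
        return list(row)
    step = max_abs / qmax
    return [round(w / step) * step for w in row]


def linear(weight, x):
    """Apply a bias-free linear layer stored as a list of output rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def quantize_per_layer(weight):
    """Weight-only quantization: needs nothing beyond this layer's weights."""
    return [quantize_rtn(row) for row in weight]


def channel_scales(calib_inputs, alpha=0.5):
    """Per-input-channel scales from mean absolute activation (assumed, simplified rule)."""
    n_channels = len(calib_inputs[0])
    means = [
        sum(abs(x[j]) for x in calib_inputs) / len(calib_inputs)
        for j in range(n_channels)
    ]
    return [max(m, 1e-8) ** alpha for m in means]


def quantize_activation_aware(weight, calib_inputs):
    """Scale input channels by activation statistics, quantize, then undo the scale."""
    scales = channel_scales(calib_inputs)
    scaled = [[w * s for w, s in zip(row, scales)] for row in weight]
    quantized = [quantize_rtn(row) for row in scaled]
    return [[w / s for w, s in zip(row, scales)] for row in quantized]


def quantize_model_activation_aware(layers, calib_batch):
    """Walk the whole model: each layer's calibration inputs are the previous layers' outputs."""
    quantized_layers = []
    acts = calib_batch
    for weight in layers:
        quantized_layers.append(quantize_activation_aware(weight, acts))
        # Float outputs of this layer become calibration inputs for the next one.
        acts = [linear(weight, x) for x in acts]
    return quantized_layers


random.seed(0)
layers = [
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
    for _ in range(2)
]
calib_batch = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]

# Per-layer path: each layer is handled in isolation.
per_layer = [quantize_per_layer(w) for w in layers]
# Full-model path: calibration data has to flow through the layers in order.
full_model = quantize_model_activation_aware(layers, calib_batch)
```

In this sketch `quantize_per_layer` can be called from inside any single layer's setup code, whereas `quantize_model_activation_aware` needs the ordered list of layers plus calibration inputs, which is the kind of integration change described above.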
