Add integration with gemlite weight only quant #2528
Summary: gemlite is only available with nightly torchao right now (or when torchao is installed from source).

Test Plan:
```
python3 -m sglang.bench_one_batch --model meta-llama/Llama-3.1-8B-Instruct --batch-size 1 --input 1024 --output 512 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' --enable-torch-compile --torchao-config gemlite-4-64 --tp-size 1
```
Hi @jerryzh168, what is the release cycle of torchao? I'm fine with using the torchao nightly version; maybe you can try enabling it in https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock
We have roughly monthly releases. Yes, depending on the nightly version would be better for now, and we can update to a stable version a bit later, I think.
I tried to install the nightly version. Do we just want to add a version check here?
@zhyncs I think we can land this. It's fine to have it as an experimental feature for now; I added a print asking people to use torchao nightly.
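For reference, a minimal sketch of what a version check plus warning could look like. This is not the code from this PR: the helper names (`installed_version`, `looks_like_nightly`, `gemlite_config_warning`) are hypothetical, and it assumes nightly builds carry a PEP 440 `.dev` segment in their version string, which a source install may not have.

```python
import re
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def looks_like_nightly(version: Optional[str]) -> bool:
    """Assumption: nightly builds have a dev segment such as '.dev20241210'."""
    return version is not None and re.search(r"\.dev\d+", version) is not None


def gemlite_config_warning(config: str, version: Optional[str]) -> Optional[str]:
    """Return a warning when a gemlite config is requested without a nightly build.

    Hypothetical helper: returns None when no warning is needed.
    """
    if not config.startswith("gemlite"):
        return None
    if looks_like_nightly(version):
        return None
    found = version if version is not None else "not installed"
    return (
        f"torchao config '{config}' is experimental and expects a torchao "
        f"nightly build (found: {found}). A source install may also work."
    )


if __name__ == "__main__":
    # Check the local environment for the config used in the test plan.
    message = gemlite_config_warning("gemlite-4-64", installed_version("torchao"))
    if message is not None:
        print(message)
```

Returning the message instead of raising keeps the feature usable from a source build, which matches treating it as experimental for now.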