Add integration with gemlite weight only quant #2528

jerryzh168 · 2024-12-19T18:35:33Z

Summary:
gemlite is only available with nightly torchao right now (or when torchao is installed from source).

Test Plan:

python3 -m sglang.bench_one_batch --model meta-llama/Llama-3.1-8B-Instruct --batch-size 1 --input 1024 --output 512 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' --enable-torch-compile --torchao-config gemlite-4-64 --tp-size 1

Reviewers:

Subscribers:

Tasks:

Tags:

Motivation

Modifications

Checklist

Format your code according to the Contributor Guide.
Add unit tests as outlined in the Contributor Guide.
Update documentation as needed, including docstrings or example tutorials.

zhyncs · 2024-12-19T18:42:08Z

Hi @jerryzh168, what is the release cycle of torchao? I can accept using the torchao nightly version; maybe you can try enabling it in https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock

python/sglang/srt/layers/torchao_utils.py

jerryzh168 · 2024-12-19T18:44:25Z

> Hi @jerryzh168 What is the release cycle of torchao? I can accept using the torchao nightly version, maybe you can try enabling it in the main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock

We have roughly monthly releases. Yes, depending on the nightly version would be better for now, and we can update to a stable version a bit later, I think.

jerryzh168 · 2024-12-19T18:53:32Z

I tried pip install torchao>=0.8.0.dev20241219, but it doesn't work. We probably need to use

pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu124

to install the nightly version. Do we just want to add a version check here?

jerryzh168 · 2024-12-19T23:02:20Z

@zhyncs I think we can land this. It's fine to have it as an experimental feature for now. I added a print asking people to use torchao nightly.

jerryzh168 requested review from merrymercy, Ying1123, zhyncs and ispobock as code owners December 19, 2024 18:35

formatting

6aabf04

jerryzh168 added 3 commits December 19, 2024 10:53

format

c448c77

add gemlite dep

21cd459

add error checks

bb9f133

jerryzh168 and others added 2 commits December 19, 2024 15:07

format

88f4ece

Merge branch 'main' into add-gemlite

111e0fe

zhyncs approved these changes Dec 20, 2024

zhyncs merged commit feb2b76 into sgl-project:main Dec 20, 2024
15 checks passed

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528

b749db9

zhyncs added a commit that referenced this pull request Dec 21, 2024

Revert "fix #2528"

0d076ad

This reverts commit b749db9.

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528

a089ba3

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528

06f3818

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528

78246d9

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528 (#2541)

4e1e3cf

chosen-ox pushed a commit to chosen-ox/sglang that referenced this pull request Dec 22, 2024

Add integration with gemlite weight only quant (sgl-project#2528)

5773c63

chosen-ox pushed a commit to chosen-ox/sglang that referenced this pull request Dec 22, 2024

fix sgl-project#2528 (sgl-project#2541)

54a9d69

merrymercy mentioned this pull request Dec 26, 2024

torcho gemlite integration #2498

Closed

3 tasks
