Make name of `compressed-tensors` quant method consistent across vLLM #17255

hmellor · 2025-04-27T10:28:35Z

The compressed tensors quantization method was referred to as both `compressed-tensors` and `compressed_tensors` in different places. These have now been standardised to `compressed-tensors`.

Issue was discovered in #17130.

Splitting the fix into this PR to reduce the review load on the original.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

github-actions · 2025-04-27T10:28:43Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mgoin · 2025-04-27T11:31:01Z

This was done for backcompat, since some model configs use a dash and others use an underscore, so I'd want to make sure both still match.

hmellor · 2025-04-27T12:45:20Z

Ok, I'll add some post-processing to coerce any hyphens to underscores. After that point we can standardise on compressed_tensors inside vLLM.

Since we already call lower() to standardise the quant method string, it shouldn't be controversial to call replace("-", "_") for the same reason.
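A minimal sketch of the normalisation proposed in this comment, assuming the quant method arrives as a plain string read from a model config. The helper name `normalize_quant_method` is hypothetical and is not taken from vLLM.

```python
def normalize_quant_method(name: str) -> str:
    """Lower-case the quant method name and coerce hyphens to underscores.

    Hypothetical helper for illustration only; not vLLM's actual code.
    """
    return name.lower().replace("-", "_")


# Both spellings (in any letter case) collapse to a single form.
for raw in ("compressed-tensors", "compressed_tensors", "Compressed-Tensors"):
    print(raw, "->", normalize_quant_method(raw))
```

Because the replacement is applied to every name, any other quant method containing a hyphen would be rewritten as well, which is worth keeping in mind when choosing between a blanket `replace` and a targeted mapping.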


hmellor · 2025-04-27T18:30:27Z

The error in the fast check seems unrelated:

[2025-04-27T14:09:03Z] ERROR 04-27 07:09:03 [serving_completion.py:116] ValueError: This model's maximum context length is 8192 tokens. However, you requested 10010 tokens (10000 in the messages, 10 in the completion). Please reduce the length of the messages or completion.

mgoin · 2025-04-28T01:42:19Z

FWIW, the standard is compressed-tensors, in case we want to convert to the dash form instead:

compressed-tensors: 1,209 models https://huggingface.co/models?other=compressed-tensors
compressed_tensors: 1 model https://huggingface.co/models?other=compressed_tensors

tests/compile/test_full_graph.py

vllm/transformers_utils/config.py


hmellor · 2025-04-28T09:06:07Z

Since compressed-tensors is more correct, I've switched to using that instead.
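A sketch of normalising in the direction settled on here: the underscore spelling is mapped onto `compressed-tensors`, so configs written either way still match. The function name, the constants, and the dict-shaped config are assumptions made for illustration, not vLLM's API.

```python
CANONICAL = "compressed-tensors"
LEGACY = "compressed_tensors"  # underscore spelling found in some model configs


def canonical_quant_method(quant_cfg: dict) -> str:
    """Return the quant method name with the legacy spelling remapped.

    Hypothetical helper for illustration only; not vLLM's actual code.
    """
    method = str(quant_cfg.get("quant_method", "")).lower()
    # Only this one name is rewritten; other names keep their underscores.
    if method == LEGACY:
        method = CANONICAL
    return method


print(canonical_quant_method({"quant_method": "compressed_tensors"}))
print(canonical_quant_method({"quant_method": "compressed-tensors"}))
```

Mapping a single alias, rather than replacing every underscore, leaves names that legitimately contain underscores (for example `gptq_marlin`) unchanged.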


mgoin

LGTM, thanks for getting this in a good state


Make name of compressed_tensors quant method consistent across vLLM

ad7749c

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

hmellor requested review from mgoin, robertgshaw2-redhat and tlrmchlsmth as code owners April 27, 2025 10:28

hmellor mentioned this pull request Apr 27, 2025

Improve configs - ModelConfig #17130

Merged

mergify bot added the tpu label (Related to Google TPUs) Apr 27, 2025

hmellor added the quantization label Apr 27, 2025

Ensure no hyphens in quant_method

d7f4616

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

mgoin reviewed Apr 28, 2025

tests/compile/test_full_graph.py (outdated)

vllm/transformers_utils/config.py (outdated)

hmellor changed the title ~~Make name of compressed_tensors quant method consistent across vLLM~~ Make name of compressed-tensors quant method consistent across vLLM Apr 28, 2025

compressed-tensors instead of compressed_tensors

b8ab6ee

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Depete quant specifiers in test

7228c3a

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

hmellor requested a review from mgoin April 28, 2025 09:10

mgoin approved these changes Apr 28, 2025


mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 28, 2025

hmellor enabled auto-merge (squash) April 28, 2025 15:57

hmellor merged commit b6dd32a into vllm-project:main Apr 28, 2025
65 checks passed

hmellor deleted the consolidate-conpressed-tensors branch April 28, 2025 16:28

jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025

Make name of compressed-tensors quant method consistent across vLLM (…

f01ec77

…vllm-project#17255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

Make name of compressed-tensors quant method consistent across vLLM (…

eabf411

…vllm-project#17255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

mgoin mentioned this pull request May 13, 2025

[Feature]Add support for models quantized with AutoRound #17850

Merged

ckhordiasma mentioned this pull request May 14, 2025

nm vllm ent 0.8.5 sync red-hat-data-services/vllm#139

Merged
