initial commit on compressed-tensors quantization support for fp8 #1011
Signed-off-by: Han Qi <hanq@google.com>
Files under review:
tpu_inference/layers/vllm/quantization/compressed_tensors/compressed_tensors.py
tpu_inference/layers/vllm/quantization/compressed_tensors/compressed_tensors_moe.py
kyuyeunk left a comment:
I've noticed that the code doesn't have any unit tests. Can you add one?
Added a unit test, PTAL. Thanks!
Description
This PR enables models with a FusedMoE module (such as Qwen3-Coder) to run with the compressed-tensors quantization scheme.
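As background for what a compressed-tensors fp8 scheme does to the MoE weights, here is a minimal, hedged sketch of per-tensor symmetric fp8 quantization. It uses numpy as a dependency-free stand-in (a real TPU path would cast to a float8 dtype such as `float8_e4m3fn` and run in a Pallas kernel); the function names are illustrative, not from this PR.

```python
import numpy as np

# Largest finite value of float8_e4m3fn, the fp8 format commonly used
# for compressed-tensors weight quantization.
FP8_E4M3_MAX = 448.0

def quantize_fp8(w):
    """Per-tensor symmetric quantization: map max|w| onto the fp8 range."""
    scale = np.max(np.abs(w)) / FP8_E4M3_MAX
    # A real implementation would cast to a float8 dtype here; this sketch
    # only clips to the representable range to stay dependency-free.
    w_q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return w_q.astype(np.float32), np.float32(scale)

def dequantize(w_q, scale):
    """Recover an approximation of the original weights."""
    return w_q * scale
```

At inference time the matmul runs on the quantized weights and the per-tensor scale is applied to the output, which is why each quantized tensor in a compressed-tensors checkpoint ships with its scale.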
NOTE:
Currently, because of the old JAX version, we need to change line 44 of
/home/hanq_google_com/uv_venv/lib/python3.12/site-packages/jax/experimental/pallas/ops/tpu/megablox/common.py
which is:
To:
Tests
Runs and produces good results.
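The reviewer asked for a unit test, and one was added in a later revision. A sketch in the same spirit (hypothetical test name and tolerances, not the PR's actual test) checks that an fp8-style scaled matmul stays close to the fp32 reference:

```python
import numpy as np

def test_fp8_style_matmul_matches_reference():
    # Hypothetical check: quantize weights per-tensor, run the matmul on
    # the quantized weights, rescale the output, and compare to fp32.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8)).astype(np.float32)
    w = rng.standard_normal((8, 16)).astype(np.float32)

    fp8_max = 448.0  # largest finite float8_e4m3fn value
    scale = np.max(np.abs(w)) / fp8_max
    w_q = np.clip(w / scale, -fp8_max, fp8_max).astype(np.float32)

    out = (x @ w_q) * scale   # quantized path
    ref = x @ w               # fp32 reference
    np.testing.assert_allclose(out, ref, rtol=1e-4, atol=1e-4)
```

A real test for this PR would instead instantiate the FusedMoE layer with the compressed-tensors config and compare against the unquantized layer's output.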