@qihqi qihqi commented Nov 4, 2025

Description

This PR enables models that use the FusedMoE module (such as Qwen3-Coder) to run with the compressed-tensors quantization scheme.

NOTE:
Because of the old JAX version currently in use, we need to patch line 44 of
/home/hanq_google_com/uv_venv/lib/python3.12/site-packages/jax/experimental/pallas/ops/tpu/megablox/common.py,
which is:

_TPU_KIND_PATTERN = re.compile(r"TPU v(\d+)")

To:

_TPU_KIND_PATTERN = re.compile(r"TPU.*(\d+)")
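To illustrate what the pattern change does, here is a minimal sketch that applies both regexes to a few device-kind strings. The sample strings and the `generation` helper are assumptions for illustration only (the strings a given TPU actually reports may differ), and the use of `re.match` is a guess at how the library consumes the pattern. The `STRICTER_PATTERN` variant is not part of this PR; it is shown only as one possible alternative.

```python
import re

OLD_PATTERN = re.compile(r"TPU v(\d+)")   # pattern currently on line 44
NEW_PATTERN = re.compile(r"TPU.*(\d+)")   # pattern proposed in the note above
# Hypothetical alternative (not in this PR): skip non-digits, then capture all digits.
STRICTER_PATTERN = re.compile(r"TPU\D*(\d+)")


def generation(pattern, device_kind):
    """Return the captured TPU generation as an int, or None if there is no match."""
    m = pattern.match(device_kind)
    return int(m.group(1)) if m else None


# Assumed device-kind strings, chosen only to exercise the patterns.
samples = ["TPU v4", "TPU v5 lite", "TPU7x", "TPU v10"]

results = {s: (generation(OLD_PATTERN, s), generation(NEW_PATTERN, s)) for s in samples}

for kind, (old, new) in results.items():
    print(f"{kind!r}: old={old}, new={new}")
# 'TPU v4': old=4, new=4
# 'TPU v5 lite': old=5, new=5
# 'TPU7x': old=None, new=7
# 'TPU v10': old=10, new=0
```

The relaxed pattern accepts names without the literal " v" prefix, which the old one rejects. One caveat: because `.*` is greedy, the capture group keeps only the last digit of a multi-digit run, so a string like "TPU v10" would yield 0. The `\D*` variant avoids that (`generation(STRICTER_PATTERN, "TPU v10")` returns 10), should multi-digit generations ever matter.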

Tests

MODEL_IMPL_TYPE=vllm VLLM_DISABLE_SHARED_EXPERTS_STREAM=1 python examples/offline_inference.py --model=BCCard/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic --tensor_parallel_size=8 --task=generate --max_model_len=128 --max_num_seqs=1

The command above runs and produces good results.

github-actions bot commented Nov 4, 2025

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@qihqi qihqi force-pushed the qihqi-qwen3-fp8 branch 2 times, most recently from a0462b4 to 72a458f Compare November 6, 2025 04:12
@qihqi qihqi marked this pull request as ready for review November 6, 2025 04:19
@qihqi qihqi requested review from hfan and kyuyeunk November 6, 2025 04:19
Signed-off-by: Han Qi <hanq@google.com>
@qihqi qihqi requested a review from kyuyeunk November 7, 2025 04:24
@kyuyeunk kyuyeunk left a comment

I've noticed that the code doesn't have any unit test? Can you add one?

qihqi commented Nov 10, 2025

> I've noticed that the code doesn't have any unit test? Can you add one?

Added a unit test, PTAL. Thanks!

@qihqi qihqi requested a review from kyuyeunk November 10, 2025 19:37
Signed-off-by: Han Qi <hanq@google.com>
Signed-off-by: Han Qi <hanq@google.com>
@qihqi qihqi merged commit 5c04ffa into main Nov 10, 2025
3 checks passed