
Conversation

@IwakuraRein (Collaborator) commented Aug 25, 2025

📌 Description

  • Remove the .to(torch.bfloat16) cast, which causes an illegal memory access when using DeepSeek V3 routing.
  • Handle the case where routing_logits is None, to support trtllm_fp4_block_scale_routed_moe (a sketch follows this list).
  • Fix the invalid-argument error in the Llama and Renormalize routing paths.
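A minimal sketch of how the None case can be dispatched (hypothetical wrapper name dispatch_fp4_moe and a simplified argument list; only the two entry-point names above come from this PR):

import torch
from typing import Optional

def dispatch_fp4_moe(moe_op, routing_logits: Optional[torch.Tensor], *args):
    # Hypothetical dispatch sketch, not the actual FlashInfer code.
    if routing_logits is None:
        # The caller has already routed the tokens (e.g. precomputed top-k
        # ids/weights arrive via *args), so use the pre-routed entry point.
        return moe_op.trtllm_fp4_block_scale_routed_moe(*args)
    # Pass the logits through in their original dtype; the removed
    # .to(torch.bfloat16) cast caused an illegal memory access with
    # DeepSeek V3 routing.
    return moe_op.trtllm_fp4_block_scale_moe(routing_logits, *args)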

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

The hunk under discussion (context lines unchanged, - removed, + added):

    # TODO(siyuan): support fp8
    moe_op.trtllm_fp4_block_scale_moe(
-       routing_logits.to(torch.bfloat16),
+       routing_logits,
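As a hedged illustration only (an assumption, not confirmed in this thread): if the DeepSeek V3 routing kernel expects the logits buffer in its original float32 form, an implicit bfloat16 copy hands it a mismatched buffer. A caller-side guard with an invented name, check_routing_logits, would make that expectation explicit:

import torch

def check_routing_logits(routing_logits: torch.Tensor) -> torch.Tensor:
    # Hypothetical guard, not FlashInfer API: fail fast instead of letting
    # a mis-typed buffer reach the kernel and trigger an illegal access.
    if routing_logits.dtype != torch.float32:
        raise ValueError(
            f"expected float32 routing logits, got {routing_logits.dtype}")
    return routing_logits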
A reviewer (Collaborator) commented on the hunk above, quoting the description:
> remove .to(torch.bfloat16) which causes illegal memory access when using deepseek v3 routing.

could you explain why?


@IwakuraRein changed the title from "update trtllm-gen fp4 autotuner" to "update trtllm-gen fp4 autotuner and routing" on Aug 26, 2025.
@cyx-6 merged commit 8ce1b08 into flashinfer-ai:main on Aug 27, 2025 (2 checks passed).
nvpohanh added a commit to nvpohanh/vllm that referenced this pull request on Sep 4, 2025:
Mainly to get the GPT-OSS MXFP4 trtllm-gen MoE autotuning and the bug
fix in: flashinfer-ai/flashinfer#1573

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>