[Bug] Fix Cutlass Scaled MM Compilation Error #24887
Conversation
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request effectively resolves a compilation error in the CUTLASS scaled matrix multiplication kernels. The primary fix involves replacing a problematic aggregate initialization of MainloopArguments with explicit member assignments, which enhances code clarity and robustness. Additionally, the changes correctly enforce const correctness for input tensor data pointers. The fix is well-implemented and consistently applied across different SM architectures. I have one suggestion to further improve code maintainability by reducing redundancy.
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Seems reasonable to me. Do you know what caused the issue locally for you while it works in CI? Are you using CUDA 13 or something?
csrc/quantization/cutlass_w8a8/c3x/scaled_mm_blockwise_sm100_fp8_dispatch.cuh
This can be reproduced in beaker (H100, CUDA Version: 12.8).

@yewentao256 we don't have Hopper runners in CI, but we do build for the arch obviously and use cu128 by default. I would expect this to be an nvcc version issue, so that is strange.

Let's get this landed since all CI tests pass, and avoid others hitting the same issue.
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (vllm-project#24561) Signed-off-by: Andrew Xia <axia@meta.com>
[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (vllm-project#24759) Signed-off-by: Andrew Xia <axia@meta.com>
[Performance] Remove redundant clone() calls in cutlass_mla (vllm-project#24891)
[Bug] Fix Cutlass Scaled MM Compilation Error (vllm-project#24887) Signed-off-by: yewentao256 <zhyanwentao@126.com>
[ci] fix wheel names for arm wheels (vllm-project#24898) Signed-off-by: simon-mo <simon.mo@hey.com>
[Tests] fix initialization of kv hash in tests (vllm-project#24273) Signed-off-by: Mickael Seznec <mickael@mistral.ai>
[Compile] Fix noop_elimination pass and add tests for noop_elimination (vllm-project#24880) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Propagate entire tokens to connector for resumed preemptions Signed-off-by: Qier Li <kevin44036@gmail.com>
Fix pre-commit Signed-off-by: Qier Li <kevin44036@gmail.com>
Rename field and nullify empty lists Signed-off-by: Qier Li <kevin44036@gmail.com>
Update vllm/v1/core/sched/scheduler.py Co-authored-by: Nick Hill <nhill@redhat.com> Signed-off-by: Qier Li <kevin44036@gmail.com>
Add unit test for preemption resumption Signed-off-by: Qier Li <kevin44036@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Test
Now: