[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses #22537
Code Review
This pull request introduces a significant refactoring of the Mixture of Experts (MoE) quantization configuration by introducing a new `FusedMoEQuantConfig` structure. This is a positive change towards a more structured and extensible configuration. However, the refactoring appears to be incomplete: the new code paths contain several critical issues, including `assert False` statements, `NotImplementedError`s, and uses of undefined variables. These issues will cause runtime failures and need to be addressed before this PR can be considered for merging. My review focuses on these critical issues.
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request has merge conflicts that must be resolved before it can be
27a4513 to 688374b
a6b4b30 to d5b12e8
328fc4c to ad0e7ff
d1f132f to 417e037
Signed-off-by: Bill Nell <bnell@redhat.com>
5f60537 to 8d94b93
…odBase subclasses (vllm-project#22537) Signed-off-by: Bill Nell <bnell@redhat.com>
Is this tested with mxfp4? @bnellnm The test result section is still TBD.
…2907)
### What this PR does / why we need it?
1. This pr bump vllm commit to vllm-project/vllm@6d8246a
2. fix upstream changes vllm-project/vllm#24548 abort multi-modal kwargs, make vllm main and `v0.10.2` both adaptable
3. fix metadata_builder changes introduced by vllm-project/vllm#23693
4. fix `structured_outputs_config` changes introduced by vllm-project/vllm#22772
5. fix `moe_config` changes introduced by vllm-project/vllm#22537

Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@c60e613

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
…llm-project#2907) ### What this PR does / why we need it? 1. This pr bump vllm commit to vllm-project/vllm@6d8246a 2. fix upstream changes vllm-project/vllm#24548 abort multi-modal kwargs, make vllm main and `v0.10.2` both adaptable 3. fix metadata_builder changes introduced by vllm-project/vllm#23693 4. fix `structured_outputs_config` changes introduced by vllm-project/vllm#22772 5. fix `moe_config` changes introduced by vllm-project/vllm#22537 Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> - vLLM version: v0.10.2 - vLLM main: vllm-project/vllm@c60e613 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: MengqingCao <cmq0113@163.com> Signed-off-by: Che Ruan <cr623@ic.ac.uk>
@minosfuture I think there was one issue with mxfp4 which has been fixed. I've not tested every possible combination but afaik everything should work.
…odBase subclasses (vllm-project#22537) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: charlifu <charlifu@amd.com>
…odBase subclasses (vllm-project#22537) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
- Delegate construction of `FusedMoEQuantConfig` objects to the subclass of `FusedMoEMethodBase` that will use that info.
- … `FusedMoEQuantConfig` and make it more uniform.
- … `fused_experts` with a `FusedMoEQuantConfig`. This eliminates the various `use_` bool flags and quantization parameters `_scales`, `_zp`, `_bias`, `_gscale`, etc.

Test Plan
Test Result
(Optional) Documentation Update
cc @varun-sundar-rabindranath, @LucasWilkinson, @jeejeelee, @wenscarl, @nvpohanh, @mgoin
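
For readers new to the pattern described under Purpose, here is a minimal, self-contained sketch of the general idea: each method subclass builds its own quantization config, and the expert entry point accepts that single object instead of a set of boolean flags and scale arguments. Every name below (`QuantConfigSketch`, `MoEMethodSketch`, `Int8MoEMethodSketch`, `make_quant_config`, `fused_experts_sketch`) is hypothetical and chosen for illustration only. None of them is vLLM's actual API, and the fields shown are assumptions rather than the real contents of `FusedMoEQuantConfig`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuantConfigSketch:
    """Hypothetical stand-in for a quant config: one object, no loose flags."""
    quant_dtype: Optional[str] = None           # None means "unquantized"
    w1_scale: Optional[Sequence[float]] = None  # assumed field, for illustration
    w2_scale: Optional[Sequence[float]] = None  # assumed field, for illustration

    @property
    def is_quantized(self) -> bool:
        return self.quant_dtype is not None


class MoEMethodSketch:
    """Hypothetical stand-in for a MoE method base class."""

    def make_quant_config(self) -> QuantConfigSketch:
        # Default is unquantized; subclasses that own quantization
        # state override this and describe it themselves.
        return QuantConfigSketch()


class Int8MoEMethodSketch(MoEMethodSketch):
    """Hypothetical subclass that owns int8 scales and reports them."""

    def __init__(self, w1_scale: Sequence[float], w2_scale: Sequence[float]):
        self.w1_scale = tuple(w1_scale)
        self.w2_scale = tuple(w2_scale)

    def make_quant_config(self) -> QuantConfigSketch:
        return QuantConfigSketch("int8", self.w1_scale, self.w2_scale)


def fused_experts_sketch(x: float, quant_config: QuantConfigSketch) -> str:
    """Toy entry point: takes one config instead of many flags and scales."""
    if not quant_config.is_quantized:
        return f"unquantized path, x={x}"
    scales = list(quant_config.w1_scale or [])
    return f"{quant_config.quant_dtype} path, x={x}, w1_scale={scales}"


if __name__ == "__main__":
    method = Int8MoEMethodSketch(w1_scale=[0.5], w2_scale=[0.25])
    print(fused_experts_sketch(1.0, method.make_quant_config()))
```

In this sketch the caller never inspects quantization details: it asks the method object for a config and forwards it, so adding a new quantization scheme only requires a new subclass.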