[Quantization] Enable BNB support for more MoE models #21370
Conversation
Code Review
This pull request refactors the weight loading mechanism for dots1 and glm4_moe models to enable BNB quantization support. The changes primarily involve:
- `dots1` model: Migrating to `AutoWeightsLoader` for weight loading, which simplifies the code and aligns it with other models. This also includes adding `SupportsLoRA` and the necessary `packed_modules_mapping`.
- Both `dots1` and `glm4_moe` models: Refactoring the expert parameter mapping logic into a `get_expert_mapping` method within the respective `Model` classes. This improves code clarity and reusability.
The changes are well-structured and consistent across both models. I did not find any issues of high or critical severity in the provided diffs. The refactoring appears to be a solid step towards better maintainability and feature support.
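For context, a rough sketch of what the two pieces named above typically look like in a vLLM model class follows; apart from `SupportsLoRA`, `packed_modules_mapping`, and `get_expert_mapping` (which the review names), the class, attribute values, and expert count below are illustrative and may not match the merged diff exactly.

```python
# Illustrative sketch only; not the exact code merged in this PR.
from torch import nn

from vllm.model_executor.layers.fused_moe import FusedMoE
from vllm.model_executor.models.interfaces import SupportsLoRA


class ExampleMoEForCausalLM(nn.Module, SupportsLoRA):
    # Tells the BNB/LoRA machinery which checkpoint weights are packed
    # into a single fused vLLM parameter, so they can be split correctly.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }

    def __init__(self, num_routed_experts: int = 64) -> None:
        super().__init__()
        # Placeholder; in a real model this comes from the HF config.
        self.num_routed_experts = num_routed_experts

    def get_expert_mapping(self) -> list[tuple[str, str, int, str]]:
        # Maps per-expert checkpoint weights (gate/up/down projections)
        # onto the fused FusedMoE parameters for each expert index.
        return FusedMoE.make_expert_params_mapping(
            ckpt_gate_proj_name="gate_proj",
            ckpt_down_proj_name="down_proj",
            ckpt_up_proj_name="up_proj",
            num_experts=self.num_routed_experts,
        )
```

Exposing the mapping through a method on the model class (rather than computing it inline in the weight loader) is what lets the BNB loader ask each MoE model how its expert weights are laid out.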
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Isotr0py left a comment:
LGTM
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.

Purpose
Following #21100, this PR enables bitsandbytes (BNB) quantization support for more MoE models (`dots1` and `glm4_moe`).
Test Plan
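The test script from the original description is not preserved in this capture. The following is a minimal sketch of the kind of offline-inference check such a plan typically uses, assuming vLLM's `LLM` entrypoint with in-flight bitsandbytes quantization; the model ID is a placeholder, and the exact flags (for example, whether `load_format` must also be set) can vary across vLLM versions.

```python
# Hypothetical reproduction script; model name and flags are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/dots1-moe-instruct",  # placeholder BNB-quantizable MoE checkpoint
    quantization="bitsandbytes",          # in-flight 4-bit BNB quantization
    trust_remote_code=True,
)

prompts = ["The capital of France is"]
outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```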
Test Result
I have tested with the above script, and the generated results look reasonable.
(Optional) Documentation Update