[xpu]support moe models on XPU platform #21643
Conversation
Code Review
This pull request introduces support for Falcon3-MoE models on the XPU platform by adding XPU-specific logic to the FusedMoE layer. The changes leverage intel-extension-for-pytorch for the MoE implementation on XPU.
I've identified a couple of issues that should be addressed to ensure correctness and maintainability:
- A configuration parameter is hardcoded, which should be made configurable for consistency and performance tuning.
- The new `forward_xpu` method has an incomplete implementation that silently ignores crucial parameters. This is a critical issue that could lead to incorrect model behavior for other MoE models.
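For reference, below is a minimal sketch of the pattern this review asks for: an XPU forward path that rejects routing options it cannot honor instead of silently dropping them. The parameter list, the `ipex_fused_moe` attribute, and the error messages are illustrative assumptions, not vLLM's actual API or the code in this PR.

```python
# Illustrative sketch only (not code from this PR). An XPU MoE forward that
# fails loudly on routing features it does not implement, rather than silently
# ignoring them and producing wrong outputs for models that need them
# (e.g. grouped top-k routing in DeepSeek-style MoE).
from typing import Callable, Optional

import torch


def forward_xpu(
    layer: torch.nn.Module,
    x: torch.Tensor,
    router_logits: torch.Tensor,
    top_k: int,
    renormalize: bool,
    use_grouped_topk: bool = False,
    custom_routing_function: Optional[Callable] = None,
    scoring_func: str = "softmax",
    e_score_correction_bias: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    # Reject options the XPU backend does not support yet instead of
    # ignoring them.
    if use_grouped_topk or custom_routing_function is not None:
        raise NotImplementedError(
            "grouped top-k / custom routing is not yet supported on XPU")
    if scoring_func != "softmax" or e_score_correction_bias is not None:
        raise NotImplementedError(
            "only plain softmax routing is currently supported on XPU")
    # Hypothetical handle to an IPEX-backed fused MoE op created at init time.
    return layer.ipex_fused_moe(x, router_logits, top_k, renormalize)
```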
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Can you rename the PR title? I think this change is not for Falcon only. XPU did not support any MoE model before this, right?
Updated. Without this change, no MoE models can be supported. For models like DeepSeek, we need more kernels, which are available in ipex 2.8.
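One way to make that requirement explicit (illustrative only, not part of this PR) is to gate the feature on the installed intel-extension-for-pytorch version; the snippet assumes only that the package exposes a standard `__version__` string.

```python
# Illustrative sketch: gate DeepSeek-style MoE support on ipex >= 2.8, since
# the extra kernels those models need are only available from that release.
from packaging import version

try:
    import intel_extension_for_pytorch as ipex
    HAS_IPEX_28_MOE_KERNELS = (
        version.parse(ipex.__version__) >= version.parse("2.8"))
except ImportError:
    HAS_IPEX_28_MOE_KERNELS = False

# A caller could then raise a clear error instead of failing inside a kernel:
# if not HAS_IPEX_28_MOE_KERNELS:
#     raise RuntimeError("This model needs MoE kernels from ipex >= 2.8")
```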
@yma11, with this PR, we will continue the optimizations on ...
Force-pushed from 0597ffc to 753b8cd
Can you merge from main to fix the CI failures?
Signed-off-by: yan <yan.ma@intel.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Noam Gat <noamgat@gmail.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
Essential Elements of an Effective PR Description Checklist
(Optional) Update supported_models.md and examples for a new model.
Purpose
Support models like ehristoforu/Falcon3-MoE-2x7B-Insruct and mistralai/Mixtral-8x7B-Instruct-v0.1 on the XPU platform. For more complicated models that require additional MoE-related kernels, support will be upstreamed later, as the dependencies are not yet published.
Test Plan
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_WORKER_MULTIPROC_METHOD=spawn python3 examples/offline_inference/basic/generate.py --model ehristoforu/Falcon3-MoE-2x7B-Insruct --enforce-eager --dtype=float16 --trust_remote_code
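For readers without the example script handy, the command above corresponds roughly to the following offline-inference sketch; the LLM/SamplingParams usage is standard vLLM API, while the prompt and sampling settings are arbitrary illustrative choices (the environment variables from the command would still be set in the shell).

```python
# Rough equivalent of the test command, using the vLLM Python API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ehristoforu/Falcon3-MoE-2x7B-Insruct",
    dtype="float16",
    enforce_eager=True,
    trust_remote_code=True,
)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
for out in outputs:
    print(out.outputs[0].text)
```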
Test Result
(Optional) Documentation Update