-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[Refactor][Kernel] support loading kernel from other place #25823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Refactor][Kernel] support loading kernel from other place #25823
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors kernel loading to be platform-specific by introducing import_general_kernels and import_moe_kernels methods in the Platform interface. The changes in _custom_ops.py now delegate kernel imports to the current platform. The default implementation is provided in interface.py, and TpuPlatform and XPUPlatform provide overrides.
My review identifies a couple of areas for improvement. The error logging in _custom_ops.py should be generalized to avoid confusion. Additionally, for consistency and correctness, the TpuPlatform and XPUPlatform should also override the import_moe_kernels method, similar to how import_general_kernels is handled.
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clean change thank you @ILikeIneine !
Left a minor comment agreeing with @ProExpertProg .
I am not too enthusiastic with the import_general_kernels function name, but I am also not great with names. What do you think of:
-import_core_kernels
-import_base_kernels
-import_common_kernels
Other than that this is LGTM.
|
@NickLucche @ProExpertProg Changes are updated! Still I'm using |
I agree in principle, I think the try-except guard we have right now it's just for cases where vllm is compiled without a device (eg you just run benchmarks). |
ProExpertProg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much cleaner, thanks! Name seems fine to me now
|
This pull request has merge conflicts that must be resolved before it can be |
Head branch was pushed to by a user without write access
|
Hi, seems the pr tests are not blocked by this pr, could you take a look and re-trigger the CI? or merge directly? @NickLucche @ProExpertProg |
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
dee9eb8 to
0358999
Compare
Signed-off-by: Hank <hcc.mayday@gmail.com>
| import vllm._moe_C # noqa: F401 | ||
| supports_moe_ops = True | ||
| current_platform.import_core_kernels() | ||
| supports_moe_ops = current_platform.try_import_moe_kernels() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this attribute is only used here:
Line 1539 in 17edd8a
| if supports_moe_ops and hasattr(torch.ops._moe_C, "marlin_gemm_moe"): |
I wonder if we can remove supports_moe_ops and just do:
if hasattr(torch.ops, "_moe_C") and hasattr(torch.ops._moe_C, "marlin_gemm_moe"):then having a simple import_kernels interface for the platform class would sounds better.
cc @NickLucche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ProExpertProg do you know why we don't use if hasattr(torch.ops, "_moe_C") and hasattr(torch.ops._moe_C, "marlin_gemm_moe"): in the first place? or maybe @tlrmchlsmth @bnellnm have better ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hasattr seems reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ++ that seems simpler
Signed-off-by: Hank <hcc.mayday@gmail.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Hank <hcc.mayday@gmail.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Purpose
add platform interface to support loading kernel
relate to: #25822
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.