[Optimization] Add custom NCHW to NHWC kernel for implicit GEMM #2530
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2530 +/- ##
==========================================
- Coverage 82.59% 82.53% -0.07%
==========================================
Files 827 828 +1
Lines 106712 106897 +185
==========================================
+ Hits 88143 88231 +88
- Misses 18569 18666 +97
let source_template = self.kernel_source.source();
let source = source_template.complete();

CompiledKernel {
    name: Some(core::any::type_name::<K>()),
    entrypoint_name: "kernel".to_string(),
Why did you change the entry point name?
It's an update to cubecl: the name field is now debug_name, and there's a new entrypoint_name field. However, it should be "main" and not "kernel"; I'm fixing that in the PR I'm currently opening.
Pull Request Template
Checklist
The run-checks all script has been executed.
Changes
Adds a custom NCHW to NHWC transpose kernel for use in implicit_gemm. This is faster than the normal into_contiguous because it specializes on this specific transposition.
Testing
All tests compatible with implicit_gemm pass with the new kernel. Added a new test to ensure the kernel output is the same as that of into_contiguous.
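
For readers unfamiliar with the layout change, the index mapping behind an NCHW to NHWC transpose, and the kind of equivalence check the Testing section describes, can be sketched on the CPU. This is not the GPU kernel from this PR: `nchw_to_nhwc` and `gather_contiguous` are hypothetical names, and `gather_contiguous` only stands in for a general-purpose strided copy such as into_contiguous.

```rust
/// CPU reference sketch: reorder a dense NCHW buffer into NHWC layout.
/// (Hypothetical helper, not the kernel added in this PR.)
fn nchw_to_nhwc(input: &[f32], n: usize, c: usize, h: usize, w: usize) -> Vec<f32> {
    assert_eq!(input.len(), n * c * h * w);
    let mut out = vec![0.0; input.len()];
    for b in 0..n {
        for ch in 0..c {
            for y in 0..h {
                for x in 0..w {
                    // Source offset in NCHW: channels are the second-slowest axis.
                    let src = ((b * c + ch) * h + y) * w + x;
                    // Destination offset in NHWC: channels are the fastest axis.
                    let dst = ((b * h + y) * w + x) * c + ch;
                    out[dst] = input[src];
                }
            }
        }
    }
    out
}

/// Generic strided gather over a 4D view, standing in for a
/// general-purpose "make contiguous" routine (an assumption for this sketch).
fn gather_contiguous(input: &[f32], shape: [usize; 4], strides: [usize; 4]) -> Vec<f32> {
    let len: usize = shape.iter().product();
    let mut out = Vec::with_capacity(len);
    for i0 in 0..shape[0] {
        for i1 in 0..shape[1] {
            for i2 in 0..shape[2] {
                for i3 in 0..shape[3] {
                    let offset = i0 * strides[0]
                        + i1 * strides[1]
                        + i2 * strides[2]
                        + i3 * strides[3];
                    out.push(input[offset]);
                }
            }
        }
    }
    out
}
```

Viewing the NCHW buffer as an NHWC tensor means using shape `[n, h, w, c]` with strides `[c * h * w, w, 1, h * w]`; passing those to `gather_contiguous` should produce the same buffer as `nchw_to_nhwc`, which mirrors the comparison the new test performs. The specialized version can exploit the fixed axis order, whereas the generic one must handle arbitrary strides.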