cutlass integration + segment_matmul implementation #51
Conversation
Codecov Report
@@ Coverage Diff @@
## master #51 +/- ##
==========================================
- Coverage 94.42% 90.68% -3.74%
==========================================
Files 12 13 +1
Lines 233 247 +14
==========================================
+ Hits 220 224 +4
- Misses 13 23 +10
Continue to review full report at Codecov.
@pyg-team/nvidia-team This PR is now ready to review.
This is really a great example showing cutlass integration. Nice job @rusty1s! Do you have an example where pyg_lib.segment.grouped_matmul is actually getting called in a training script?
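For illustration, a minimal sketch of what such a training-script call could look like. The entry point pyg_lib.segment.grouped_matmul, its list-of-tensors signature, and the toy training loop are assumptions based on this thread rather than a confirmed API; the backward pass it relies on is still to be added, as discussed later in the conversation.

```python
# Hypothetical sketch: the import path and the
# (List[Tensor], List[Tensor]) -> List[Tensor] signature of grouped_matmul
# are assumptions based on this thread, not a confirmed pyg_lib API.
import torch
from pyg_lib.segment import grouped_matmul  # assumed entry point

num_types, in_dim, out_dim = 3, 16, 32

# One learnable weight matrix per node/edge type:
weights = [torch.randn(in_dim, out_dim, device='cuda', requires_grad=True)
           for _ in range(num_types)]
optimizer = torch.optim.Adam(weights, lr=0.01)

# Node features grouped by type (group sizes may differ):
xs = [torch.randn(n, in_dim, device='cuda') for n in (100, 200, 50)]

for step in range(10):
    optimizer.zero_grad()
    # One fused CUTLASS grouped GEMM instead of `num_types` small matmuls:
    outs = grouped_matmul(xs, weights)
    loss = sum(out.pow(2).mean() for out in outs)
    loss.backward()  # relies on the backward pass discussed below
    optimizer.step()
```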
LGTM!
pyg_lib/csrc/segment/matmul.cpp (Outdated)
@@ -0,0 +1,41 @@
#include "matmul.h"

#include <ATen/core/dispatch/Dispatcher.h>
ditto.
Thanks @teju85. I will work on the backward implementation and PyG integration next. I can share an example by then!
Haicheng from NVIDIA CUTLASS here. LGTM. Thank you. BTW, we are improving group GEMM now.
@hwu36 Thanks! Please ping me if you make any improvements :)
@jackkosaian just fixed the occupancy calculation in NVIDIA/cutlass#532. This number is used to calculate the number of threadblocks to launch for group GEMM. I know you hard-coded this number for now, so you are not affected. @jackkosaian is going to further improve group GEMM in the summer.