-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing and improving indexing type handling #2522
Conversation
436a745
to
311563a
Compare
311563a
to
5a7cfb4
Compare
In the case of matmuls, this happens to fix the cases where:
|
f47a0cc
to
87c71a4
Compare
87c71a4
to
c4796f8
Compare
c4796f8
to
b09834c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally this makes sense to me, but I'm concerned about recompilation of the kernel. It doesn't seem good to retrigger non-cached recompilation. With thread recompilation it seemed okay to me since we would just retrigger for high water mark, but if we're going to enable an option to go back from int64 indexing to compile int32 indexing, we should cache both options somehow.
I'm not really sure what we want to do with the caching here. I wonder if it even makes sense to do this on the register side. CCing @naoyam and @jjsjann123 for opinions.
As I mentioned to @mmigdal-nv, I think the fix of this PR is sufficient. As long as a fusion is executed through FusionExecutorCache, we should not see back-and-forth recompilations due to index mode changes. The only request I have for @mmigdal-nv is to add a simple C++ test that verifies this behavior. #2522 (comment) |
7438f68
to
8fd6b0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the fix and improving the PR.
abcd5d0
to
b7124e5
Compare
approved by naoya, and caching is not a problem
Fixed issues:
KernelArgumentHolder
's indexing mode changes.kernelName()
so we can useKernelDb
with the keykernel_code_
. CurrentlyKernelDb
ignores the wrapped code (#defines, runtime library, ...) and relies only on the kernel. Without changing the kernel name we would be getting back the wrong cubins.Improvements:
KernelArgumentHolder
retroactively.-1
incollectIndexMode
is misleading. In the case of a 1D tensor, having a type that holds the tensor's index is not enough - we need to be able to hold the bound itself (so we can compare index to the bound without overflows).Changes:
cparams.index_type
is not set toDataType::Index
so the kernel can be lowered once and we update/setnvfuser_index_t
after, as required.