
Fixing and improving indexing type handling #2522

Merged: 3 commits into csarofeen:devel on Mar 8, 2023

Conversation

@mmigdal-nv (Collaborator) commented on Feb 26, 2023

Fixed issues:

  • Recompile the kernel when the KernelArgumentHolder's indexing mode changes.
  • Take the output tensors into account when determining the indexing mode.
  • Append the current indexing type to kernelName() so we can use KernelDb with the key kernel_code_. Currently KernelDb ignores the wrapping code (#defines, runtime library, ...) and relies only on the kernel body; without changing the kernel name we would get back the wrong cubins (see the sketch below).
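
A minimal sketch of the naming idea (the function and type names here are illustrative, not the actual nvFuser code): embedding the indexing mode in the kernel name keeps a code-keyed cache like KernelDb from returning a cubin compiled for the other index width.

```cpp
#include <string>

// Illustrative stand-in for nvFuser's index types.
enum class IndexType { Int32, Int64 };

// Sketch: two kernels that differ only in indexing mode get distinct
// names, so a cache keyed on the kernel code cannot confuse them.
std::string kernelName(int fusion_id, IndexType index_type) {
  std::string name = "kernel" + std::to_string(fusion_id);
  name += (index_type == IndexType::Int32) ? "_i32" : "_i64";
  return name;
}
```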

Improvements:

  • Allow the tensor indexing mode in KernelArgumentHolder to be changed retroactively.
  • The -1 in collectIndexMode was misleading. For a 1D tensor, a type that can hold the tensor's largest index is not enough: it must be able to hold the bound itself, so an index can be compared against the bound without overflow (see the sketch below).
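
A sketch of that bound check (the helper name is hypothetical, not the actual collectIndexMode code):

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: choose 32-bit indexing only if the extent itself
// (the loop bound), not extent - 1, fits in int32. Otherwise a comparison
// like `index < extent` could overflow a 32-bit nvfuser_index_t.
bool extentFitsInInt32(int64_t extent) {
  return extent <= static_cast<int64_t>(std::numeric_limits<int32_t>::max());
}
```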

Changes:

  • cparams.index_type is now set to DataType::Index so the kernel can be lowered once; nvfuser_index_t is then updated/set afterwards as required (see the sketch below).
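
A minimal sketch of that idea (assumed structure, not the actual executor code): the lowered kernel refers only to the abstract nvfuser_index_t, and the concrete width is injected when the kernel is compiled.

```cpp
#include <cstdint>
#include <string>

// Sketch: prepend the concrete definition of nvfuser_index_t to the
// already-lowered kernel source, chosen per launch.
std::string withConcreteIndexType(const std::string& kernel_code,
                                  bool use_int32_indexing) {
  const char* def = use_int32_indexing
      ? "typedef int nvfuser_index_t;\n"
      : "typedef int64_t nvfuser_index_t;\n";
  return std::string(def) + kernel_code;
}
```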

@mmigdal-nv mmigdal-nv force-pushed the rebuild_index_change branch 2 times, most recently from 436a745 to 311563a on February 27, 2023 01:37
@mmigdal-nv mmigdal-nv marked this pull request as ready for review February 27, 2023 01:52
@mmigdal-nv mmigdal-nv changed the title Recompiling kernel when nvfuser_index_t changes Fixing and improving indexing type handling Feb 27, 2023
@mmigdal-nv (Collaborator, Author) commented:

For matmuls, this happens to fix the following cases:

  • MNK = 65536, 65536, 128: the output shape was never taken into account, so nvfuser_index_t overflowed (worked through below).
  • A crash when a problem is launched with small input tensors followed by large input tensors: nvfuser_index_t overflows because we do not recompile, even though we compute the right size in that case.
  • A performance hit when a large problem (one that required 64-bit indexing) is followed by small ones, since we would keep running 64-bit kernels for the small problems.
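
A quick worked check of the first case, using the sizes quoted above:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // For MNK = 65536, 65536, 128 the inputs (65536 x 128) fit in int32,
  // but the output has 65536 * 65536 = 2^32 elements, which exceeds the
  // 32-bit limit of 2^31 - 1. Hence the output shape must participate in
  // the index-mode decision.
  int64_t out_elems = int64_t{65536} * 65536;
  std::cout << out_elems << " > " << INT32_MAX << '\n';  // 4294967296 > 2147483647
  return 0;
}
```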

@csarofeen (Owner) left a comment:


Generally this makes sense to me, but I'm concerned about recompilation of the kernel. It doesn't seem good to retrigger non-cached recompilation. With thread recompilation that seemed okay, since we would just retrigger for the high-water mark; but if we're going to enable an option to go back from int64 indexing to int32 indexing, we should cache both variants somehow.

I'm not really sure what we want to do with the caching here. I wonder if it even makes sense to do this on the register side. CCing @naoyam and @jjsjann123 for opinions.

Resolved review threads on:

  • third_party/nvfuser/csrc/executor_kernel_arg.cpp (outdated)
  • third_party/nvfuser/csrc/executor_kernel_arg.h (outdated)
  • third_party/nvfuser/csrc/executor.cpp
  • third_party/nvfuser/csrc/executor.h
@naoyam (Collaborator) commented on Mar 1, 2023:

As I mentioned to @mmigdal-nv, I think the fix of this PR is sufficient. As long as a fusion is executed through FusionExecutorCache, we should not see back-and-forth recompilations due to index mode changes. The only request I have for @mmigdal-nv is to add a simple C++ test that verifies this behavior. #2522 (comment)
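A hedged sketch of what such a test might look like (the fixture and helpers follow this repo's C++ test conventions loosely; the compilation-count assertion is hypothetical and left commented out, since the exact API for it is not confirmed here):

```cpp
// Sketch only: assumes nvFuser's C++ test fixtures (NVFuserTest, Fusion,
// FusionExecutorCache) roughly as they exist in this repo.
TEST_F(NVFuserTest, FusionIndexModeNoRecompile_CUDA) {
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  auto tv0 = makeSymbolicTensor(1);
  fusion->addInput(tv0);
  fusion->addOutput(add(tv0, IrBuilder::create<Double>(1.0)));

  FusionExecutorCache fec(std::move(fusion));
  auto opts = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);

  auto small = at::randn({1024}, opts);  // int32 indexing suffices
  // Note: a tensor with > 2^31 elements needs substantial memory; a real
  // test would pick the smallest shape that forces int64 indexing.
  auto large = at::randn({(int64_t{1} << 31) + 8}, opts);

  fec.runFusionWithInputs({small});  // compiles an int32 kernel
  fec.runFusionWithInputs({large});  // compiles an int64 kernel
  fec.runFusionWithInputs({small});  // should reuse the cached int32 kernel
  // Hypothetical assertion: exactly two compilations in total.
  // EXPECT_EQ(numCompiledKernels(fec), 2);
}
```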

@mmigdal-nv mmigdal-nv requested review from naoyam, Michoumichmich and csarofeen and removed request for Michoumichmich March 1, 2023 18:18
@naoyam (Collaborator) left a comment:


LGTM. Thanks for the fix and improving the PR.

@zasdfgbnm zasdfgbnm dismissed csarofeen’s stale review March 8, 2023 16:29

approved by naoya, and caching is not a problem

@mmigdal-nv mmigdal-nv merged commit 3b85308 into csarofeen:devel Mar 8, 2023