Clean up index type handling #2570

naoyam · 2023-03-10T03:58:13Z

This is a cleanup PR of the code around the kernel index mode/type.

FusionExecutor::compileFusion now respects the optional index type parameter if given. Previously, it wasn't used at all. The index type is now used as long as it doesn't conflict with the index type required by the given kernel inputs. More specifically, if the index type is int32 but the kernel inputs need int64, it throws an error. On the other hand, an int64 index type is allowed even when the kernel inputs are small enough to use int32.
FusionExecutor::runFusion also has an option, CompileParams, which includes an index type. It was simply unused before. It's now an error if a different index type is passed, since the index type of a Kernel is immutable.
The index type of a Kernel must not be DataType::Index and must be either Int or Int32. Currently we assume the index type is resolved when a Fusion is lowered to a Kernel, and it's immutable from then on. We could relax this restriction, but that's not part of this PR.
Previously, both a SchedulerEntry and its HeuristicParams had index type info. It seems the one in HeuristicParams wasn't used. Removed the one from SchedulerEntry, and made sure the one in HeuristicParams is used.
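For readers skimming the rules above, here is a minimal standalone sketch of the compile-time and run-time checks as described. It is not the nvfuser implementation: `IndexType`, `resolveIndexType`, and `checkRunTimeIndexType` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for the two concrete index types discussed above.
enum class IndexType { Int32, Int64 };

// Compile-time rule: honor the requested type unless it is too small.
// `required` is the smallest type able to index the kernel inputs;
// `requested` is the optional user-provided type.
inline IndexType resolveIndexType(
    IndexType required,
    std::optional<IndexType> requested) {
  if (!requested.has_value()) {
    return required;
  }
  if (*requested == IndexType::Int32 && required == IndexType::Int64) {
    throw std::invalid_argument(
        "int32 index type requested, but kernel inputs require int64");
  }
  // int64 is always acceptable, even when int32 would have sufficed.
  return *requested;
}

// Run-time rule: the index type of a compiled kernel is immutable, so a
// different type passed at run time is an error.
inline void checkRunTimeIndexType(
    IndexType compiled,
    std::optional<IndexType> requested) {
  if (requested.has_value() && *requested != compiled) {
    throw std::invalid_argument(
        "index type differs from the one the kernel was compiled with");
  }
}

// Small usage example: requesting int64 for small inputs is allowed.
inline void exampleUsage() {
  assert(
      resolveIndexType(IndexType::Int32, IndexType::Int64) ==
      IndexType::Int64);
  checkRunTimeIndexType(IndexType::Int64, std::nullopt); // no request: fine
}
```

The asymmetry is deliberate in this sketch: widening to int64 is always safe, while narrowing to int32 is rejected whenever the inputs need int64.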

I'm sure we could do more, but I'll stop here for now.

Note that this does not address the issues @mmigdal-nv attempted to fix (#2522). Notably, these remain:

1. We only look at kernel inputs to determine the index type. Intermediate and output tensors are not considered, which can result in an underestimate.
2. Once a Fusion is lowered to a Kernel, its index type cannot be changed. I believe we could relax this constraint, but it's unclear how important it would be. We don't want to compile back and forth between int32 and int64, so we would need to keep two compiled kernel images of a single Kernel. This would certainly reduce the overhead of lowering a Fusion to a Kernel, as it would need to be done just once for both int32 and int64, but the nvrtc compilation would still need to be done twice. And it only matters when the (rest of the) scheduler heuristics are the same for problem sizes that range from small enough to use int32 to large enough to use int64.

I think issue 1 is important and should be fixed, but the second one doesn't seem that urgent.
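To make the second issue more concrete, the "two compiled kernel images" idea could look roughly like the sketch below. This is purely illustrative and not something this PR implements: `KernelImageCache` is a hypothetical class, and a string stands in for a compiled nvrtc image.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the two concrete index types.
enum class IndexType { Int32, Int64 };

// Hypothetical cache: one lowered kernel, up to two compiled images,
// each compiled lazily the first time its index type is needed.
class KernelImageCache {
 public:
  const std::string& get(IndexType index_type) {
    auto it = images_.find(index_type);
    if (it == images_.end()) {
      // Cache miss: this is where the (expensive) compilation would happen.
      ++compile_count_;
      it = images_.emplace(index_type, compile(index_type)).first;
      assert(images_.size() <= 2); // at most one image per index type
    }
    return it->second;
  }

  int compileCount() const {
    return compile_count_;
  }

 private:
  // Placeholder for the real compilation step; returns a label only.
  static std::string compile(IndexType index_type) {
    return index_type == IndexType::Int32 ? "image<int32>" : "image<int64>";
  }

  std::map<IndexType, std::string> images_;
  int compile_count_ = 0;
};

// Usage: alternating between index types compiles each image only once.
inline void exampleUsage() {
  KernelImageCache cache;
  cache.get(IndexType::Int32);
  cache.get(IndexType::Int64);
  cache.get(IndexType::Int32); // cache hit, no recompilation
  assert(cache.compileCount() == 2);
}
```

In this sketch, lowering would happen once outside the cache, while the per-type compilation cost is still paid twice, matching the trade-off described above.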

naoyam · 2023-03-10T19:10:05Z

All tests are green.

I was concerned about whether the benchmarks would change, but I confirmed that all of the generated CUDA kernels (i.e., __tmp_kernel*.cu) are exactly the same as before, so I'm pretty confident nothing changes.

zasdfgbnm

Thank you for the cleanup! Left some minor comments.

third_party/nvfuser/csrc/executor_params.h

third_party/nvfuser/test/test_gpu3.cpp

zasdfgbnm · 2023-03-10T19:30:08Z

third_party/nvfuser/test/test_gpu3.cpp

+  at::Tensor t0 = at::randn({999}, options);
+  std::vector<c10::IValue> small_inputs = {t0, t0};
+
+  at::Tensor t0_large = at::randn({std::numeric_limits<int>::max()}, options);


Would it be safer to use (int64_t)std::numeric_limits<int>::max() + 1? In theory, a tensor of std::numeric_limits<int>::max() elements is still indexable with 32-bit indexing.

This relies on how we determine the index mode, i.e., we only allow up to half of the int32 max. If this assumption breaks, the TORCH_ERROR about large_inputs should break, so this should be fine.
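The arithmetic behind this exchange can be checked with a small sketch. It assumes, based only on the reply above, that 32-bit indexing is used when the element count is at most half of the int32 maximum; the exact rule in nvfuser may differ, and `kMaxInt32Extent` and `needsInt64Index` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>

// Assumption taken from the comment above: 32-bit indexing is used only up
// to half of the int32 maximum.
constexpr int64_t kMaxInt32Extent = std::numeric_limits<int32_t>::max() / 2;

// Hypothetical helper: does a tensor with this many elements need int64?
constexpr bool needsInt64Index(int64_t num_elements) {
  return num_elements > kMaxInt32Extent;
}

// The small test tensor (999 elements) fits in int32 indexing.
static_assert(!needsInt64Index(999));

// The large test tensor (INT_MAX elements) already exceeds the threshold
// under this assumption, so it requires int64 without needing the "+ 1".
static_assert(needsInt64Index(std::numeric_limits<int>::max()));

// If the "+ 1" variant is used, the cast must come before the addition;
// adding 1 to INT_MAX in plain int would be signed overflow.
constexpr int64_t kJustPastInt32 =
    static_cast<int64_t>(std::numeric_limits<int>::max()) + 1;
static_assert(kJustPastInt32 == 2147483648LL);
```

Under this assumption both sizes force int64; the "+ 1" variant only matters if the threshold were ever raised to the full int32 range.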

third_party/nvfuser/csrc/executor.cpp

zasdfgbnm · 2023-03-10T19:48:05Z

third_party/nvfuser/csrc/type.h

@@ -135,7 +135,8 @@ bool PointerOf::operator==(const PointerOf& other) const {

 enum class KernelIndexMode { INT32, INT64 };


Not blocking this PR, because it already makes the logic much clearer, but I think we should consider completely removing KernelIndexMode and just using PrimDataType wherever KernelIndexMode was used.

Generally agreed, and I did consider it as well, but technically speaking, KernelIndexMode would allow something more than just always int32 or int64, although I don't have any specific idea.

naoyam added 8 commits March 9, 2023 15:50

Clean up compile-time and run-time index options

3d04fdd

More cleanup

1666a9e

cleanup

fae4e59

fix

6a175f7

format

ea8d818

fix

76fe752

format

bff328f

benchmark cleanup

0977e83

naoyam marked this pull request as ready for review March 10, 2023 19:08

naoyam requested a review from zasdfgbnm March 10, 2023 19:10

naoyam changed the title ~~[WIP] Clean up index type handling~~ Clean up index type handling Mar 10, 2023

zasdfgbnm approved these changes Mar 10, 2023


naoyam added 2 commits March 10, 2023 12:16

PR feedback

578a7b6

compileRtc and runRtc require index type

933d5aa

naoyam merged commit 3c4b3da into devel Mar 10, 2023

zasdfgbnm deleted the index_type_validation branch March 10, 2023 21:16
