Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic #1690

shmsong · 2022-05-10T17:32:14Z

Original message :
This PR is an initial step towards IterdomainGraph based indexing and predicate optimizations.

In this PR, all index variables of the realized loop structure, including those of parallel loops, are allocated before any actual for-loops are created.
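As a rough illustration of the idea (not the code in this PR), here is a minimal two-pass sketch. All names in it (`LoopDesc`, `LoopIndexMap`, `allocateLoopIndices`, `emitLoopNest`) are hypothetical and do not come from the nvfuser code base. The first pass assigns an index variable to every loop, parallel ones included, and the second pass builds the loop nest by only looking those variables up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one loop of the realized loop structure.
struct LoopDesc {
  int id;                     // illustrative loop identifier
  bool is_parallel;           // true if bound to a parallel dimension
  std::string parallel_name;  // e.g. "threadIdx.x"; unused for serial loops
};

// Maps each loop id to the name of its index variable.
using LoopIndexMap = std::map<int, std::string>;

// Pass 1: allocate an index variable for every loop before any
// for-loop is created. Parallel loops reuse their parallel index name.
LoopIndexMap allocateLoopIndices(const std::vector<LoopDesc>& loops) {
  LoopIndexMap index_map;
  int serial_count = 0;
  for (const auto& loop : loops) {
    if (loop.is_parallel) {
      index_map[loop.id] = loop.parallel_name;
    } else {
      index_map[loop.id] = "i" + std::to_string(serial_count++);
    }
  }
  return index_map;
}

// Pass 2: build a textual loop nest. No index is created here; each
// loop must already have an entry in the map.
std::string emitLoopNest(
    const std::vector<LoopDesc>& loops,
    const LoopIndexMap& index_map) {
  std::string out;
  std::string args;
  for (const auto& loop : loops) {
    auto it = index_map.find(loop.id);
    assert(it != index_map.end() && "loop has no pre-allocated index");
    if (!loop.is_parallel) {
      out += "for " + it->second + ": ";  // parallel loops emit no for-loop
    }
    if (!args.empty()) {
      args += ", ";
    }
    args += it->second;
  }
  return out + "body(" + args + ")";
}
```

With a serial loop, a loop bound to `threadIdx.x`, and another serial loop, the sketch yields the indices `i0`, `threadIdx.x`, and `i1`, and the nest `for i0: for i1: body(i0, threadIdx.x, i1)`. Because the indices exist before the nest is built, later passes can refer to them without depending on the order in which loops are generated.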

Current message:
This PR is now a combination of 5 PRs:
#1750
#1737
#1735
#1734
and this one (#1690).

Each PR has been reviewed separately; they are being merged as one so that reverting is easy if we need to.

After this PR is merged, no tensor indexing logic should be creating reference tensors.

csarofeen

LGTM, minor nits.

torch/csrc/jit/codegen/cuda/compute_at_map.h

torch/csrc/jit/codegen/cuda/lower2device.cpp

torch/csrc/jit/codegen/cuda/compute_at_map.cpp

torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp

torch/csrc/jit/codegen/cuda/test/test_gpu.cpp


shmsong · 2022-06-25T04:35:54Z

Renaming the new files to lower_index_compute.{h,cpp}. In follow-ups I will try to unify the new file and index_compute.cpp into one, ideally, or maybe split them into lower_index_compute and lower_index_compute_impl, depending on the complexity we end up with after the refactor.

Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration of all possible paths to reduce compilation time on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  3. no-op binary removal within fused graph
  4. expand supported in fusion

Squashed commits to WAR github API

Commits that are actually in this PR from the devel branch:

```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (csarofeen#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
1b65299 Issue 1770 (csarofeen#1774)
35b0427 Avoid compilation errors like below: (csarofeen#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (csarofeen#1771)
31d6c56 TransformPropagator refactor (csarofeen#1769)
570c5a8 Merge pull request csarofeen#1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (csarofeen#1763)
6c19520 no-op binary removal (csarofeen#1764)
ec7fa41 Proper propagation of IterType (csarofeen#1762)
b263562 Fix dimensionality check (csarofeen#1759)
2d6343f More generic grouped grid reduction kernel (csarofeen#1740)
64e2b56 [nvfuser] prevent spamming warning message (pytorch#77777) (csarofeen#1758)
0c43162 [nvFuser] Improving bitwise ops support (pytorch#77158) (csarofeen#1757)
b93a147 Parser expand (csarofeen#1754)
```

RUN_TORCHBENCH: nvfuser

Pull Request resolved: pytorch#80355

Approved by: https://github.com/davidberard98

shmsong added 7 commits May 9, 2022 13:35

pre-allocate loop index variables

0666bb9

Merge remote-tracking branch 'origin/devel' into alloc_idx_val

ff76b8f

Merge remote-tracking branch 'origin/devel' into alloc_idx_val

08a0672

add allocated index in debug print

374082f

assertion on loop map entry

0d4da1d

comment

9be2874

use original index in double buffer cloned loop

627af9d

csarofeen approved these changes Jun 11, 2022


naoyam reviewed Jun 14, 2022


torch/csrc/jit/codegen/cuda/compute_at_map.cpp (outdated)

torch/csrc/jit/codegen/cuda/lower_double_buffer.cpp (outdated)

torch/csrc/jit/codegen/cuda/test/test_gpu.cpp

shmsong added 3 commits June 23, 2022 15:21

Merge remote-tracking branch 'origin/devel' into alloc_idx_val

95b67eb

allocate different index variable for different double buffer stages

0b666d6

comments

b18ca04

shmsong changed the title ~~WIP: Allocate loop index variable before lowering~~ Allocate loop index variable before lowering Jun 24, 2022

shmsong and others added 4 commits June 24, 2022 16:20

Use IterDomain Graph to generate gmem consumer indexing (#1734)

456bdcb

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>

Clean up reference domain dependency in gmem consumer indexing (#1735)

494294f

Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com> Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>

Use iterdomain graph to index non-global consumers (#1737)

e89071a

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>

Remove reference tensor creation in producer tensor indexing path (#1750)

0de6736

Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>

shmsong changed the title ~~Allocate loop index variable before lowering~~ Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic Jun 25, 2022

rename the new files

7205f3a

shmsong merged commit d67e1cd into devel Jun 25, 2022

shmsong deleted the alloc_idx_val branch June 25, 2022 07:03

shmsong mentioned this pull request Jun 29, 2022

Indexing refactor stage 2 : Remove reference tensor in predicate indexing logic #1784

Merged
