Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic #1690

Merged
merged 15 commits on Jun 25, 2022


@shmsong shmsong commented May 10, 2022

Original message:
This PR is an initial step towards IterdomainGraph-based indexing and predicate optimizations.

In this PR, all index variables of the realized loop structure, including those of parallel loops, are allocated before any actual for-loops are created.
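
As a rough illustration of the "allocate indices first, create loops second" ordering described above, here is a minimal Python sketch. It is not nvfuser code: `LoopDomain`, `allocate_loop_indices`, `build_loop_nest`, and the `PARALLEL_INDEX` table are hypothetical names, and the mapping of parallel loops to `threadIdx.x` / `blockIdx.x` is an assumption made for the example.

```python
from dataclasses import dataclass

# Assumed mapping from a parallel type to the index it is bound to (illustrative).
PARALLEL_INDEX = {"TIDx": "threadIdx.x", "BIDx": "blockIdx.x"}


@dataclass
class LoopDomain:
    """Hypothetical stand-in for one loop of the realized loop structure."""
    name: str
    parallel_type: str = "Serial"  # "Serial", "TIDx" or "BIDx" in this sketch


def allocate_loop_indices(domains):
    """Pass 1: give every loop domain an index variable before any loop exists."""
    index_of = {}
    serial_count = 0
    for d in domains:
        if d.parallel_type in PARALLEL_INDEX:
            # Parallel loops reuse the hardware index instead of a fresh variable.
            index_of[d.name] = PARALLEL_INDEX[d.parallel_type]
        else:
            index_of[d.name] = f"i{serial_count}"
            serial_count += 1
    return index_of


def build_loop_nest(domains, index_of):
    """Pass 2: create the loops; indices are only looked up, never allocated."""
    lines = []
    depth = 0
    for d in domains:
        idx = index_of[d.name]
        indent = "  " * depth
        if d.parallel_type == "Serial":
            lines.append(f"{indent}for {idx} in range({d.name}.extent):")
            depth += 1
        else:
            lines.append(f"{indent}# {d.name} is bound to {idx}")
    return lines


domains = [
    LoopDomain("I0", "BIDx"),
    LoopDomain("I1"),
    LoopDomain("I2", "TIDx"),
    LoopDomain("I3"),
]
index_of = allocate_loop_indices(domains)
loop_nest = build_loop_nest(domains, index_of)
```

Because `index_of` is complete before `build_loop_nest` runs, any later pass (indexing, predicates) could consult it without depending on the order in which loops are generated.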

Current message:
This PR is now a combination of 5 PRs:
#1750
#1737
#1735
#1734
and this one (#1690).

Each PR has been reviewed separately; they are being merged as one so that the change is easy to revert if needed.

After this PR is merged, no tensor indexing logic should be creating reference tensors.

@csarofeen csarofeen left a comment

LGTM, minor nits.

torch/csrc/jit/codegen/cuda/compute_at_map.h (resolved)
torch/csrc/jit/codegen/cuda/compute_at_map.h (outdated, resolved)
torch/csrc/jit/codegen/cuda/compute_at_map.h (resolved)
torch/csrc/jit/codegen/cuda/lower2device.cpp (resolved)
@shmsong shmsong changed the title from "WIP: Allocate loop index variable before lowering" to "Allocate loop index variable before lowering" Jun 24, 2022
shmsong and others added 4 commits June 24, 2022 16:20
Co-authored-by: Christian Sarofeen <csarofeen@nvidia.com>
Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
@shmsong shmsong changed the title from "Allocate loop index variable before lowering" to "Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic" Jun 25, 2022
@shmsong shmsong commented Jun 25, 2022

Renaming the new files to lower_index_compute.{h,cpp}. In follow-ups I will try to unify the new file and index_compute.cpp into one, or maybe split them into lower_index_compute and lower_index_compute_impl, depending on how much complexity we end up with after the refactor.

@shmsong shmsong merged commit d67e1cd into devel Jun 25, 2022
@shmsong shmsong deleted the alloc_idx_val branch June 25, 2022 07:03
shmsong pushed a commit to shmsong/pytorch that referenced this pull request Jul 24, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- TransformPropagator refactor: switched to Dijkstra instead of exhaustive enumeration of all possible paths, to reduce compilation time spent on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (csarofeen#1690);
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  2. no-op binary removal within fused graph
  3. expand supported in fusion
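
The TransformPropagator item above replaces exhaustive path enumeration with Dijkstra's algorithm. The following is a generic sketch of that technique only, not the nvfuser implementation: the graph, the node names (`tv0` to `tv3`), and the edge weights are made up, and what a "cost" means for transform propagation is not stated here.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: minimum total cost from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, non-negative cost) pairs.
    """
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best


# Hypothetical propagation graph with illustrative edge costs.
graph = {
    "tv0": [("tv1", 1), ("tv2", 4)],
    "tv1": [("tv2", 1), ("tv3", 5)],
    "tv2": [("tv3", 1)],
    "tv3": [],
}
costs = shortest_costs(graph, "tv0")
```

Each node is finalized once, with roughly O(E log V) heap work, whereas the number of distinct paths between two nodes can grow exponentially with graph size; that is the general reason a shortest-path search is cheaper than enumerating every path.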

Squashed commits to work around the GitHub API.
Commits that are actually in this PR from the devel branch:

```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (csarofeen#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
1b65299 Issue 1770 (csarofeen#1774)
35b0427 Avoid compilation errors like below: (csarofeen#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (csarofeen#1771)
31d6c56 TransformPropagator refactor (csarofeen#1769)
570c5a8 Merge pull request csarofeen#1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (csarofeen#1763)
6c19520 no-op binary removal (csarofeen#1764)
ec7fa41 Proper propagation of IterType (csarofeen#1762)
b263562 Fix dimensionality check (csarofeen#1759)
2d6343f More generic grouped grid reduction kernel (csarofeen#1740)
64e2b56 [nvfuser] prevent spamming warning message (pytorch#77777) (csarofeen#1758)
0c43162 [nvFuser] Improving bitwise ops support (pytorch#77158) (csarofeen#1757)
b93a147 Parser expand (csarofeen#1754)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#80355
Approved by: https://github.com/davidberard98