
New TransformPropagator algorithm #1763

Merged: zasdfgbnm merged 25 commits into devel from 1760 on Jun 21, 2022
Conversation

zasdfgbnm (Collaborator) commented Jun 16, 2022

Fixes #1760, but goes far beyond that.

Per offline discussion with @csarofeen and @naoyam, I completely rewrote the TransformPropagator. The new TransformPropagator explicitly keeps track of which root IDs of the starting tensor are preserved, with RootIDInfo storing that information for each root ID. view is not treated differently from other ops. During propagation, I run Dijkstra's algorithm to find, for each tensor in the graph, the path that preserves the most information. Each tensor is replayed only once.
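For intuition, here is a minimal, self-contained sketch of the propagation idea (not the actual nvfuser implementation; `Tensor`, `NextHop`, `replay`, and the integer score standing in for RootIDInfo are all simplified stand-ins):

```cpp
#include <algorithm>
#include <climits>
#include <queue>
#include <unordered_set>
#include <vector>

struct Tensor;  // stand-in for TensorView

struct NextHop {
  Tensor* to;
  int preserved_info;  // stand-in for RootIDInfo: how much of the starting
                       // tensor's root domain survives this hop
};

struct Tensor {
  std::vector<NextHop> edges;  // producers and consumers in the fusion graph
};

// Stand-in for transform replay.
void replay(Tensor* /*from*/, Tensor* /*to*/) {}

struct Path {
  int info;      // information preserved along this candidate path
  Tensor* from;  // predecessor to replay from (nullptr for the start)
  Tensor* to;
};
bool operator<(const Path& a, const Path& b) {
  return a.info < b.info;  // max-heap: pop the most informative path first
}

void propagate(Tensor* starting_tv) {
  std::priority_queue<Path> queue;
  std::unordered_set<Tensor*> replayed;
  queue.push({INT_MAX, nullptr, starting_tv});  // start with full information
  while (!queue.empty()) {
    Path p = queue.top();
    queue.pop();
    if (!replayed.insert(p.to).second) {
      continue;  // already reached along a path preserving at least as much
    }
    if (p.from != nullptr) {
      replay(p.from, p.to);  // each tensor is replayed exactly once
    }
    for (const NextHop& hop : p.to->edges) {
      if (!replayed.count(hop.to)) {
        // A path is only as informative as its weakest hop.
        queue.push({std::min(p.info, hop.preserved_info), p.to, hop.to});
      }
    }
  }
}
```

The real implementation compares sets of preserved root IDs rather than a scalar score, but the one-replay-per-tensor Dijkstra structure mirrors the description above.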

zasdfgbnm (Collaborator, Author):

Marking as ready for review. Tests are all green. 😎 Feel free to play around, and I will do some cleanup and add more comments.

@zasdfgbnm zasdfgbnm marked this pull request as ready for review June 16, 2022 20:57
@zasdfgbnm zasdfgbnm changed the title from "WIP: New TransformPropagator algorithm" to "New TransformPropagator algorithm" Jun 16, 2022
zasdfgbnm (Collaborator, Author):

I am done adding docs.

naoyam (Collaborator) left a comment:

Just some superficial suggestions for now

(Several resolved review suggestions on torch/csrc/jit/codegen/cuda/transform_replay.h and torch/csrc/jit/codegen/cuda/transform_replay.cpp)
```cpp
// nullptr used to start from starting_tv
return next_hop.to->nDims();
}
// TODO: why does TransformReplay require specifying a position in the
```
Collaborator:

I think that's because TransformReplay was designed for computeAt, which takes a position in the leaf domain.

TransformPropagator does not take such a position parameter, but that may be something we would want eventually?
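For context, a hedged sketch of how computeAt consumes a leaf position, written in the style of the repo's C++ tests (helpers like makeSymbolicTensor and IrBuilder follow the test utilities; exact signatures may differ across versions):

```cpp
Fusion fusion;
FusionGuard fg(&fusion);

TensorView* tv0 = makeSymbolicTensor(2);  // root domain: [I0, I1]
fusion.addInput(tv0);
TensorView* tv1 = add(tv0, IrBuilder::create<Double>(1.0));
fusion.addOutput(tv1);

tv1->split(0, 4);  // leaf domain of tv1 becomes [I0/4, 4, I1]

// The position indexes tv1's *leaf* domain, not its root domain:
// computeAt(tv1, 2) places tv0's computation inside the two split axes.
tv0->computeAt(tv1, 2);
```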

```diff
@@ -4,7 +4,6 @@

 #include <torch/csrc/jit/codegen/cuda/fusion.h>
 #include <torch/csrc/jit/codegen/cuda/ir_base_nodes.h>
-#include <torch/csrc/jit/codegen/cuda/ir_interface_nodes.h>
```
Collaborator:

Just curious, was this causing any problem?

Collaborator (Author):

No, it worked fine, though I don't know why. It looks weird to me that two headers include each other.

Comment on lines +798 to +800:

```cpp
// I think I need to modify TransformReplay to add a new interface to specify
// the root domains, instead of a position in the leaf domain. With the new
// interface, this function will not be needed.
```
Collaborator:

This is due to a mismatch between TransformReplay and TransformPropagator. The former replays the first N leaf IDs, whereas the latter replays everything from the starting reference tensor. I suspect we would want the behavior of TransformReplay, but I'm not sure.
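To make the mismatch concrete, a hedged fragment (replayPasC is from transform_replay.h of this era; treat the exact signature and the position semantics shown as assumptions):

```cpp
// TransformReplay: replay only the transforms producing the first `pos`
// leaf IDs of `consumer` onto `producer`.
auto partial = TransformReplay::replayPasC(producer, consumer, /*pos=*/2);

// TransformPropagator effectively replays the full leaf domain of the
// reference, i.e. roughly the equivalent of:
auto full = TransformReplay::replayPasC(producer, consumer, consumer->nDims());
```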

```cpp
// Find the pos where all leaf IDs at <= pos contains
// information about the starting root domain
//
// TODO: should I change to the following behavior?
```
Collaborator:

Can you be more specific?

Collaborator (Author):

I added a few more lines in the comment below.

Collaborator:

Can you create a test exhibiting this behavior?

Collaborator (Author):

Looks like this will never happen? I added a new test at 5cc10c8, but it looks like this case is not triggered. I think this is because, if `from` is the reference tensor, then we always have full information. If not, then `from` must come from a replay, and the axes containing reference tensor information will always be put at the front. So this case will never happen?

I will merge this PR for now and write a follow-up PR for TransformReplay starting from a specified root domain. Once that is done, this issue will no longer be relevant. Feel free to leave more comments, and I will resolve them in my follow-up PR.

Collaborator:

I'm not sure we can rewrite TransformReplay to use root domain positions rather than leaf domain positions. In other words, computeAt currently is specified with leaf positions. Can we change that to use root positions?

Collaborator:

In the meantime, we may want to assert that no remaining domains are included in relevant_leaves.

Collaborator (Author):

> I'm not sure we can rewrite TransformReplay to use root domain positions rather than leaf domain positions. In other words, computeAt currently is specified with leaf positions. Can we change that to use root positions?

I think the interface and outcome of computeAt should still use leaf positions. But during propagation, since we are saving information about root/rfactor domains, I think it makes sense to change the interface (or at least add an additional interface) to specify root/rfactor domains. I am not sure how doable this is; I need to dig into the code to see.

> In the meantime, we may want to assert that no remaining domains are included in relevant_leaves.

I think it makes sense to add an assert. If we decide that we should not change TransformReplay to specify the root domain, then I will add the assert in a separate PR.

Owner:

We should talk more about this. Root domains are a good mechanism for understanding "how replayed" one replay is compared to another, but I'm skeptical that propagating based on root domains is a good idea.

(More resolved review suggestions on torch/csrc/jit/codegen/cuda/transform_replay.cpp)
zasdfgbnm and others added 2 commits June 17, 2022 12:28
Co-authored-by: Naoya Maruyama <naoyam@users.noreply.github.com>
naoyam (Collaborator) left a comment:

LGTM!

csarofeen (Owner):

Is this a replacement for #1743? Can we close #1743?

zasdfgbnm (Collaborator, Author):

This does not contain the functionality that #1743 provides, but it does make the code in #1743 obsolete. The discussions in #1743 are still valuable, and I have not looked at them yet. I would prefer to keep #1743 open and rebase it on this PR after it is merged.

@zasdfgbnm zasdfgbnm merged commit 0ed815f into devel Jun 21, 2022
@zasdfgbnm zasdfgbnm deleted the 1760 branch June 21, 2022 19:30
```cpp
continue;
}
for (auto root_id : root_ids) {
  if (id == root_id || DependencyCheck::isDependencyOf(root_id, id)) {
```
Owner:

So this is giving full credit to a domain in rfactor that could potentially only have a partial domain from the root. I wonder how safe this is through complex uses of view. Are there instances where we would have to accurately track "partial" ownership of an rfactor domain with view?

Collaborator:

That's an interesting question. I don't know if there's an actually adverse case. It also seems to be a question of how propagation should be done: should transformations be propagated through partial ownership? I'm not sure which should be preferred.
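As a concrete (hypothetical) shape of the concern: a view that merges two root IDs yields an rfactor ID depending on both, so a pure dependency check grants it full credit even when only one root carries reference information. A sketch in the repo's test style; the view helper and shapes are illustrative assumptions:

```cpp
// tv0 has root IDs [I0, I1]; viewing [4, 6] as [24] merges them into a
// single rfactor ID R0 on tv1.
TensorView* tv0 = makeConcreteTensor({4, 6});
TensorView* tv1 = view(tv0, {4, 6}, {24});

// Suppose only I0 carries information from the reference tensor. Then
// DependencyCheck::isDependencyOf(I0, R0) is still true, so R0 is counted
// as fully owning reference information even though half of its extent
// derives from I1.
```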

```cpp
continue;
}
for (auto rfactor_id : rfactor_ids) {
  if (DependencyCheck::isDependencyOf(id, rfactor_id)) {
```
Owner:

Same as above.

csarofeen (Owner):

Really cool algorithm. The only comments I really have:

- I don't know why we'd want to propagate based on the root instead of the leaf position.
- I'm still struggling to understand what can happen in complex view patterns.

I think it makes sense to update the compute-at PR with the new algorithm, fix up our current scheduling, then come back and revisit view, as we want view support after the next PyTorch release.

naoyam (Collaborator) commented Jun 22, 2022

Agree with Christian.

shmsong pushed a commit to shmsong/pytorch that referenced this pull request Jul 24, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Code changes include:

- TransformPropagator refactor: switched to Dijkstra's algorithm instead of exhaustive enumeration of all possible paths, to reduce compilation time spent on transform propagation;
- Indexing refactor: remove reference tensor creation in all tensor indexing logic (csarofeen#1690);
- (more) generic grouped grid reduction kernel;
- Minor parser/fuser patches:
  1. zero-dim tensor reduction support
  2. no-op binary removal within fused graph
  3. expand supported in fusion

Squashed commits to WAR github API
Commits that are actually in this PR from the devel branch:

```
a054b3e Refactor TransormPropagator to allow specifying a position and propagating to part of the DAG (csarofeen#1775)
d67e1cd Indexing refactor stage 1: remove reference tensor creation in all tensor indexing logic (csarofeen#1690)
1b65299 Issue 1770 (csarofeen#1774)
35b0427 Avoid compilation errors like below: (csarofeen#1773)
452c773 Ignore reductions of zero-dim tensors per PyTorch conventions (csarofeen#1771)
31d6c56 TransformPropagator refactor (csarofeen#1769)
570c5a8 Merge pull request csarofeen#1767 from csarofeen/upstream_merge_0621
9d6c3d8 merging upstream 61305cd
0ed815f New TransformPropagator algorithm (csarofeen#1763)
6c19520 no-op binary removal (csarofeen#1764)
ec7fa41 Proper propagation of IterType (csarofeen#1762)
b263562 Fix dimensionality check (csarofeen#1759)
2d6343f More generic grouped grid reduction kernel (csarofeen#1740)
64e2b56 [nvfuser] prevent spamming warning message (pytorch#77777) (csarofeen#1758)
0c43162 [nvFuser] Improving bitwise ops support (pytorch#77158) (csarofeen#1757)
b93a147 Parser expand (csarofeen#1754)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#80355
Approved by: https://github.com/davidberard98
Successfully merging this pull request may close these issues:

- TransformPropagator is generating inconsistency between the siblings (#1760)