AOTDispatch: allow subclasses to correct when we guess metadata of tangents incorrectly #118670
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118670
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6d43803 with merge base 9347a79.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
```python
if not isinstance(x, Tensor):
    return x
out = x.detach().contiguous()
# Note [Tangents must be contiguous, Part 2]
```
cc @ezyang / @zou3519, I have CI mostly passing on my stack now, so this PR is ready for another look. (I couldn't quite remember the final names we came up with from the API bikeshedding, Ed, but I tried to update them here.)
My read of the situation of this PR is something like:
(1) These two new APIs are not very ideal (two new subclass APIs specific to tangents), but at the very least they are purely optional: you get a loud error in the rare situation where your subclass needed them but didn't provide them, and in the long-term state we can effectively forget about them.
(2) The "right" solution would be to retrace the backward. There is still some design necessary; we should probably sit down with Horace and get in agreement on a long-term design. It's probably not worth blocking internal models on this though, hence the PR.
This is nice and simple, thanks.
FWIW - after looking at the performance profile of the model that needs this fix, this PR is even more of a reason to eventually do the "retrace the backward" work. This PR fixes the problem by essentially forcing DTensor to perform extra collectives at runtime, and each of these collectives can potentially be extremely bad for performance (but for other subclasses, maybe this "coercing" won't have as much of a runtime cost).
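To illustrate the runtime cost being described, here is a hedged, single-process sketch (gloo backend, world size 1, illustrative address/port) of what the coercion boils down to for DTensor: converting a `Shard(0)` tensor back to `Replicate()` via `redistribute`, which at real world sizes is an all-gather collective.

```python
import os
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Replicate, Shard, distribute_tensor

# Single-process setup so the sketch is runnable without a launcher.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = DeviceMesh("cpu", [0])
sharded = distribute_tensor(torch.randn(4, 4), mesh, [Shard(0)])

# A __force_same_metadata__-style coercion for DTensor would boil down to this
# redistribute, which is a collective (all-gather) at real world sizes.
replicated = sharded.redistribute(mesh, [Replicate()])
print(replicated.placements)  # (Replicate(),)

dist.destroy_process_group()
```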
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
…adata of tangents incorrectly" This PR is enough to fix #118600. More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like: "We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents" Here, the problem is similar: "We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass". This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial). One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by: (1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error (2) In the error message, provide the name of an optional method that the subclass must implement to handle this case: `def __force_same_metadata__(self, metadata_tensor):`: If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement. `__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace-time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace-time. I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change. cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
AOTDispatch: allow subclasses to correct when we guess metadata of tangents incorrectly (#118670)
Pull Request resolved: #118670
Approved by: https://github.com/ezyang
This PR is enough to fix #118600.

More description of the problem is in the issue, but the high-level problem is similar to the "tangents might be non-contiguous" problem that we handle today, via forcing all tangents to be contiguous. There, the problem was something like:

"We guessed the tangent strides incorrectly, because strides on the runtime tangents were different from strides on the forward outputs, which we used to generate tangents."

Here, the problem is similar:

"We guessed the tangent tensor subclass's metadata incorrectly, because the runtime tangent was a subclass with different metadata than the forward output subclass."

This happened in an internal DTensor issue, where the metadata in question was the `placements` (shard vs. replicate vs. Partial).

One option is to solve this problem via backward guards. This is needed to unblock internal though, so I figured handling this similarly to how we handle non-contiguous tangents would be reasonable. I did this by:

(1) Assert that the metadata on subclass tangents is the same as what we guessed, and if not raise a loud error.

(2) In the error message, provide the name of an optional method that the subclass must implement to handle this case:

`def __force_same_metadata__(self, metadata_tensor):` If the forward output had a `Replicate()` placement, but the runtime tangent had a `Shard(1)` placement, this method allows a subclass to take the tangent and "convert" it to one with a `Replicate()` placement.

`__force_standard_metadata__(self)`: One issue is that there is another placement called `_Partial`, and its semantics are such that DTensor is **unable** to convert a DTensor with some placement type into another DTensor with a `_Partial` placement. `__force_standard_metadata__` is now called on all (fake) subclass forward outs at trace time to generate tangents, and gives subclasses a chance to "fix" any outputs with metadata that they cannot convert to later. Morally, this is similar to the fact that we force a `contiguous()` call on all tangents at trace time.

I'm interested in thoughts/feedback! Two new dunder methods on traceable subclasses is definitely a contentious change.
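As a concrete (if toy) picture of what implementing these hooks could look like, here is a minimal sketch. `TaggedTensor` and its `layout_tag` string are made-up stand-ins for DTensor and its placements, not anything in this PR; the two dunder names are the ones proposed above; a real traceable subclass would also implement `__tensor_flatten__`/`__tensor_unflatten__` and a working `__torch_dispatch__`, which are omitted to keep the hook semantics in focus.

```python
import torch

class TaggedTensor(torch.Tensor):
    """Toy wrapper subclass whose only extra metadata is a `layout_tag` string."""

    @staticmethod
    def __new__(cls, data, layout_tag):
        # Wrapper-subclass boilerplate: mirror the inner tensor's sizes/strides/dtype/device.
        t = torch.Tensor._make_wrapper_subclass(
            cls, data.size(), strides=data.stride(), dtype=data.dtype, device=data.device
        )
        t._data = data
        t.layout_tag = layout_tag  # e.g. "replicate", "shard", "partial"
        return t

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        raise NotImplementedError("sketch only; op dispatch is out of scope here")

    def __force_standard_metadata__(self):
        # Called at trace time on (fake) forward outputs before tangents are generated:
        # coerce metadata we could never convert *to* later (here: "partial") into a
        # canonical form, analogous to DTensor not being able to target _Partial.
        if self.layout_tag == "partial":
            return TaggedTensor(self._data.clone(), "replicate")
        return self

    def __force_same_metadata__(self, metadata_tensor):
        # Called at runtime when a tangent's metadata differs from the trace-time guess:
        # return a tensor whose metadata matches `metadata_tensor`. For DTensor this is
        # where a redistribute (i.e. an extra collective) would happen.
        if self.layout_tag == metadata_tensor.layout_tag:
            return self
        return TaggedTensor(self._data.clone(), metadata_tensor.layout_tag)

# Usage: a "shard"-tagged runtime tangent gets coerced to match the "replicate"-tagged
# tangent the graph was traced with.
traced = TaggedTensor(torch.zeros(2, 2), "replicate")
runtime = TaggedTensor(torch.ones(2, 2), "shard")
print(runtime.__force_same_metadata__(traced).layout_tag)  # "replicate"
```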
Stack from ghstack (oldest at bottom):
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang