Follow up on freezing #34786
Conversation
1) Rename 'HashIValue' to 'HashAliasedIValue'
2) Added Object case in getSubValues function
3) Hashes tensors to their storage
4) Added Dict case in overrideGradient
5) nit clean up

[ghstack-poisoned]
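A rough illustration of item 3, hashing a tensor by the identity of its storage so that views over the same buffer hash alike; the helper name and exact mechanics here are assumptions for illustration, not code taken from this PR:

#include <ATen/ATen.h>
#include <functional>

// Hypothetical sketch: two tensors that alias the same storage (e.g. a tensor
// and a view of it) should produce the same hash value.
size_t hashTensorByStorage(const at::Tensor& t) {
  return std::hash<const void*>()(t.storage().unsafeGetStorageImpl());
}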
💊 CircleCI build failures summary and remediations
As of commit 7163bf3 (more details on the Dr. CI page): 💚 Looks good so far! There are no CircleCI failures yet. 💚
This comment was automatically generated by Dr. CI and has been revised 34 times.
Changes here look good. Could you address a few of the other comments from the PR? I added each as a comment to make it easier to track.
@@ -227,6 +227,13 @@ class AttributePropagator {
        elems.set(i, overrideGradient(elems.extract(i)));
      }
      attr = std::move(elems);
    } else if (attr.isGenericDict()) {
For the override gradient implementation here, can you call getSubValues and then remove the gradient for all tensor subvalues?
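A rough sketch of that suggestion; the set typedef and call signatures are assumed rather than checked against this revision:

#include <ATen/core/ivalue.h>

// Collect every aliased sub-value of the attribute, then strip the autograd
// graph from each tensor found. Note that detach_() mutates the shared
// TensorImpl, so this variant changes the tensors in place.
void detachTensorSubvalues(const c10::IValue& attr) {
  c10::IValue::HashAliasedIValues subValues;
  attr.getSubValues(subValues);
  for (const auto& v : subValues) {
    if (v.isTensor()) {
      v.toTensor().detach_();
    }
  }
}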
The gradient is not overridden in place; it is detached so that the original module has no side effects, so I can't really use getSubValues.
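A minimal sketch of that distinction, using a hypothetical helper rather than the PR's overrideGradient: the frozen copy gets tensor.detach(), a new alias cut off from autograd, while the source module's tensor is left untouched.

#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>

c10::IValue detachedCopy(c10::IValue v) {
  if (v.isTensor()) {
    // detach() returns a new tensor sharing storage but with no grad history;
    // unlike detach_(), it does not modify the original tensor.
    v = c10::IValue(v.toTensor().detach());
  }
  return v;
}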
Okay, that makes sense. If that's the case, don't you need to return the recursively constructed new attribute and reset it as the attribute?
Here, for example, we're not returning a new tuple with the detached gradients, so it's either in place, which we're not trying to do, or we're not changing anything.
elem = overrideGradient(elem);
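A hypothetical illustration of the point being made here, assuming attr is the tuple-valued attribute and overrideGradient returns a detached copy of its argument: rebuild the tuple from the returned copies and assign it back instead of dropping them.

std::vector<c10::IValue> newElems;
for (const auto& elem : attr.toTuple()->elements()) {
  newElems.push_back(overrideGradient(elem));  // keep the detached copy
}
attr = c10::ivalue::Tuple::create(std::move(newElems));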
cc @Krovatkin @smessmer @zdevito: how do we correctly detach gradients in these cases?
case Tag::Object: {
  subValues.insert(*this);
  auto obj_type = type()->expect<ClassType>();
  auto obj_value = toObject();
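One possible continuation of this case, assumed for illustration rather than copied from the PR: recurse into each attribute slot of the object so tensors held by nested objects are collected as well.

// Hypothetical continuation of the Tag::Object branch shown above.
for (size_t i = 0, n = obj_type->numAttributes(); i < n; ++i) {
  obj_value->getSlot(i).getSubValues(subValues);
}
break;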
See https://github.com/pytorch/pytorch/pull/32178/files#r385953414 for this comment; could you address it or clarify in the code?
LGTM, one comment about the recursive gradient detaching and it should be good to go
Okay, that makes sense. If that's the case, don't you need to return the recursively constructed new attribute and reset it as the attribute?
Here, for example, we're not returning a new tuple with the detached gradients, so it's either in place, which we're not trying to do, or we're not changing anything.
elem = overrideGradient(elem);
meant to accept
1) Rename 'HashIValue' to 'HashAliasedIValue'
2) Added Object case in getSubValues function
3) Hashes tensors to their storage
4) Added Dict case in overrideGradient
5) nit clean up

Differential Revision: [D20585270](https://our.internmc.facebook.com/intern/diff/D20585270)

[ghstack-poisoned]
Stack from ghstack:
Differential Revision: D20585270