Add warning to a known autograd issue on XLA backend. #35449
Conversation
💊 CircleCI build failures summary and remediations
As of commit f9c1117 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following build failures do not appear to be due to upstream breakages:
pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)
Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>
torch/csrc/autograd/variable.cpp
Outdated
@@ -384,6 +384,13 @@ unsigned VariableHooks::_register_hook(const Tensor& self, std::function<Tensor(
}

void handle_view_on_rebase(DifferentiableViewMeta* diff_view_meta, bool indirect) {
  // TODO: Remove this warning once we allow XLA to workaround CopySlices.
  if (diff_view_meta->base_.device().type() == c10::DeviceType::XLA) {
    TORCH_WARN("Backward through inplace update on view tensors is WIP for XLA backwend. "
I would maybe formulate this slightly differently and adapt it to the indirect flag:
- When indirect=False, this is called because you are modifying inplace (right now) a view that requires gradients.
- When indirect=True, this is called because another view of the same base was modified inplace (in the past) and this view requires gradients.
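Roughly, those two cases correspond to user code like the following sketch (illustrative only; the torch_xla import, device handling, and shapes are assumptions, not taken from this PR):

import torch
import torch_xla.core.xla_model as xm  # assumes torch_xla is installed

device = xm.xla_device()
# Non-leaf tensor that requires grad (in-place ops on views of a leaf are disallowed anyway).
base = torch.ones(4, device=device, requires_grad=True).clone()
view_a = base[:2]
view_b = base[2:]

# indirect=False: view_a itself is being modified in place right now.
view_a.mul_(2)

# indirect=True: view_b requires gradients and another view of the same base
# (view_a) was modified in place in the past; using view_b afterwards hits the
# same code path with indirect=True.
view_b.sum().backward()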
Force-pushed from 94444e8 to be31227 (Compare)
Looks much better, thanks for the changes!
torch/csrc/autograd/variable.cpp
Outdated
  if (diff_view_meta->base_.device().type() == c10::DeviceType::XLA) {
    std::string msg;
    if (indirect) {
      msg = "This view requires gradients but its base or another view of the same base is modified inplace. ";
nit: Not sure here, isn't it "has been modified inplace"?
Force-pushed from be31227 to f9c1117 (Compare)
@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
    } else {
      msg = "This view requires gradients and it's being modified inplace. ";
    }
    msg = c10::str(msg, "Backward through inplace update on view tensors is WIP for XLA backwend. "
backwend -> backend.
Aha nice catch! Will send a fix!
msg = "This view requires gradients and it's being modified inplace. "; | ||
} | ||
msg = c10::str(msg, "Backward through inplace update on view tensors is WIP for XLA backwend. " | ||
"Gradient might be wrong in certain cases. Running forward alone is fine. " |
This prints in the forward? It's not clear what "alone is fine" means.
Is there any way to 'tag' this and only print the warning in the backward?
Putting an example of this triggering would be helpful to see the effect.
Yes, this prints in the forward when the inplace is done, and only if the inputs require gradients.
I think we don't want to delay the error, as that would make it much harder for the user to find the faulty inplace.
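A minimal sketch of the triggering behavior, assuming torch_xla is installed and an XLA device is available (names and shapes here are illustrative, not from the PR):

import torch
import torch_xla.core.xla_model as xm  # assumed available

x = torch.ones(4, device=xm.xla_device(), requires_grad=True).clone()
v = x[:2]
v.add_(1)        # the TORCH_WARN above fires here, during the forward
loss = x.sum()
loss.backward()  # no extra warning here; the gradient may be wrong on XLA

# Without requires_grad nothing is printed, since this path is only reached
# for views that require gradients.
y = torch.ones(4, device=xm.xla_device())
y[:2].add_(1)    # silent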
We generally don't warn in the forward if it will cause a problem in the backward, right?
This seems like a good place for a mechanism you can turn on that warns/errors in the forward, and the error in the backward tells you to turn it on.
Suggested rewording:
"Running a backward pass through an inplace update on view tensors is a WIP for the XLA backend and may result in incorrect gradient computation in certain cases. Note this warning is being triggered on the inplace update (not the corresponding backward pass), and this update is safe if a backward pass is not run. To work around this limitation and to silence this warning, please replace the inplace operation by the corresponding out-of-place operation."
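For example, the out-of-place rewrite could look something like this (a sketch; the cat-based version is just one possible equivalent, not something prescribed by this PR):

import torch

x = torch.ones(4, requires_grad=True).clone()

# In-place update of a view (warns on XLA):
# x[:2].mul_(2)

# Out-of-place equivalent: build a new tensor instead of mutating the view.
x = torch.cat([x[:2] * 2, x[2:]])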
For these, we actually warn/error in the forward because we know that the backward will be wrong. Note that if your Tensor does not require gradients, then this won't be called, so people who do not use autograd won't see it.