
Add warning to a known autograd issue on XLA backend. #35449

Closed
wants to merge 1 commit

Conversation

ailzhang (Contributor)

No description provided.

@ailzhang ailzhang requested review from albanD and apaszke as code owners March 26, 2020 03:42
dr-ci bot commented Mar 26, 2020

💊 CircleCI build failures summary and remediations

As of commit f9c1117 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 26 20:59:54 [E request_callback_impl.cpp:94] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save() instead.
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(86): serialize 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(135): serialize 
Mar 26 20:59:54  
Mar 26 20:59:54 [E request_callback_impl.cpp:94] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead. 
Mar 26 20:59:54  
Mar 26 20:59:54 At: 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py(1779): __getstate__ 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(86): serialize 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(135): serialize 
Mar 26 20:59:54  
Mar 26 20:59:54 [E request_callback_impl.cpp:94] Received error while processing request type 2: PickleError: ScriptModules cannot be deepcopied using copy.deepcopy or saved using torch.save. Mixed serialization of script and non-script modules is not supported. For purely script modules use my_script_module.save(<filename>) instead. 
Mar 26 20:59:54  
Mar 26 20:59:54 At: 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py(1779): __getstate__ 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(86): serialize 
Mar 26 20:59:54   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(135): serialize 
Mar 26 20:59:54  
Mar 26 20:59:54 ok (1.118s) 
Mar 26 20:59:55   test_unexepected_kwarg_is_specified (__main__.JitRpcTestWithSpawn) ... ok (1.118s) 
Mar 26 20:59:56   test_user_rrefs_confirmed (__main__.JitRpcTestWithSpawn) ... ok (1.117s) 
Mar 26 20:59:57   test_user_rrefs_confirmed_remote (__main__.JitRpcTestWithSpawn) ... ok (1.117s) 

This comment was automatically generated by Dr. CI.

@@ -384,6 +384,13 @@ unsigned VariableHooks::_register_hook(const Tensor& self, std::function<Tensor(
}

void handle_view_on_rebase(DifferentiableViewMeta* diff_view_meta, bool indirect) {
// TODO: Remove this warning once we allow XLA to workaround CopySlices.
if (diff_view_meta->base_.device().type() == c10::DeviceType::XLA) {
TORCH_WARN("Backward through inplace update on view tensors is WIP for XLA backwend. "
Collaborator
I would maybe formulate this slightly differently and adapt it to the indirect flag:

  • When indirect=False, this is called because you are modifying inplace (right now) a view that requires gradients.
  • When indirect=True, this is called because another view of the same base was modified inplace (in the past) and this view requires gradients.
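
For illustration, a minimal sketch of the two cases in plain PyTorch (CPU tensors here; the tensor names and values are hypothetical, but these are the patterns the warning targets on XLA):

```python
import torch

# indirect=False: a view that requires gradients is itself being
# modified in place, right now.
base = torch.zeros(4, requires_grad=True).clone()  # non-leaf, so inplace is allowed
view = base[:2]                                    # differentiable view
view.add_(1)                                       # direct inplace update on the view

# indirect=True: another view of the same base was modified in place
# (in the past), and this view, which requires gradients, is used later.
base2 = torch.zeros(4, requires_grad=True).clone()
view_a = base2[:2]
view_b = base2[2:]
view_a.add_(1)           # inplace update on a sibling view of the same base
view_b.sum().backward()  # using view_b afterwards hits the indirect case
```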

@ailzhang ailzhang force-pushed the xla_warning_copyslice branch from 94444e8 to be31227 on March 26, 2020 18:13
@ailzhang ailzhang requested a review from albanD March 26, 2020 18:18
albanD (Collaborator) left a comment:

Looks much better, thanks for the changes!

if (diff_view_meta->base_.device().type() == c10::DeviceType::XLA) {
std::string msg;
if (indirect) {
msg = "This view requires gradients but its base or another view of the same base is modified inplace. ";
Collaborator
nit: Not sure here, isn't it "has been modified inplace"?

@ailzhang ailzhang force-pushed the xla_warning_copyslice branch from be31227 to f9c1117 on March 26, 2020 18:30
facebook-github-bot (Contributor) left a comment:

@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)
@ailzhang merged this pull request in 2f6f178.

} else {
msg = "This view requires gradients and it's being modified inplace. ";
}
msg = c10::str(msg, "Backward through inplace update on view tensors is WIP for XLA backwend. "
Contributor

backwend -> backend.

Contributor Author

Aha, nice catch! Will send a fix!

msg = "This view requires gradients and it's being modified inplace. ";
}
msg = c10::str(msg, "Backward through inplace update on view tensors is WIP for XLA backwend. "
"Gradient might be wrong in certain cases. Running forward alone is fine. "
Contributor

This prints in the forward? It's not clear what "alone is fine" means.

Is there any way to 'tag' this and only print the warning in the backward?

Putting an example of this triggering would be helpful to see the effect.

Collaborator

Yes, this prints in the forward when the inplace is done, and only if inputs require gradients.
I think we don't want to delay the error, as that would make it much harder for the user to find the faulty inplace.
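
For concreteness, a hypothetical snippet of the kind requested above that would trigger the warning at the inplace op itself (assumes torch_xla is installed and an XLA device is available):

```python
import torch
import torch_xla.core.xla_model as xm  # assumes torch_xla is installed

device = xm.xla_device()

x = torch.ones(4, device=device, requires_grad=True)
y = x * 2           # non-leaf tensor on the XLA device
v = y[:2]           # differentiable view of y
v.mul_(3)           # the warning fires here, in the forward pass,
                    # not later when .backward() is called
y.sum().backward()  # per the warning text, gradients may be wrong here
```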

Contributor

We generally don't warn in the forward if it will cause a problem in the backward, right?

This seems like a good place for a mechanism you can turn on that warns/errors in the forward, with the error in the backward telling you to turn it on.

Contributor

Suggested rewording:

Running a backward pass through an inplace update on view tensors is a WIP for the XLA backend and may result in incorrect gradient computation in certain cases. Note this warning is being triggered on the inplace update (not the corresponding backward pass), and this update is safe if a backward pass is not run. To work around this limitation and to silence this warning, please replace the inplace operation by the corresponding out-of-place operation.
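
As a sketch of that workaround (a hypothetical example, not taken from the PR), the inplace update on a view can be rebuilt out of place:

```python
import torch

y = torch.ones(4, requires_grad=True) * 2

# Inplace version (the pattern that warns on XLA):
#   v = y[:2]
#   v.mul_(3)

# Out-of-place equivalent: build a new tensor instead of mutating y.
y_fixed = torch.cat([y[:2] * 3, y[2:]])
y_fixed.sum().backward()  # gradients flow through ordinary out-of-place ops
```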

Contributor Author

Thanks @gchanan! It has been updated in #35543.

Collaborator

For these, we actually warn/error in the forward because we know that the backward will be wrong. Note that if your Tensor does not require gradients, then this won't be called, so people who do not use autograd won't see it.
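
A small sketch of that last point (again assuming a torch_xla setup): when nothing requires gradients, the code path that emits the warning is never reached:

```python
import torch
import torch_xla.core.xla_model as xm  # assumes torch_xla is installed

device = xm.xla_device()

x = torch.ones(4, device=device)  # requires_grad defaults to False
v = x[:2]
v.add_(1)  # forward-only inplace update: no differentiable view is involved,
           # so handle_view_on_rebase (and therefore the warning) never runs
```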
