-
Notifications
You must be signed in to change notification settings - Fork 22.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash on unload torch cpu dll #67632
Conversation
CI Flow Status⚛️ CI FlowRuleset - Version:
You can add a comment to the PR and tag @pytorchbot with the following commands: # ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow For more information, please take a look at the CI Flow Wiki. |
Hi @DBraun! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
🔗 Helpful links
✅ No Failures (0 Pending)As of commit 61f3d60 (more details on the Dr. CI page): Expand to see more💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
Hi @ezyang, could you please help me get the Windows libtorch binaries? It looks like that kind of job wasn't triggered https://app.circleci.com/pipelines/github/pytorch/pytorch?branch=pull%2F67632 |
Hi @jamesr66a, since you reviewed a related PR, can you take a look at this too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Custom class related changes look fine to me. Adding @ilia-cher and @ZolotukhinM as reviewers for the profiler and tensorexpr changes respectively
Hi @ZolotukhinM and @gdankel , would it be possible to get the libtorch binaries for this PR, the same ones that I would have otherwise gotten from pytorch.org and selecting libtorch? For me, it's more urgent to try out the binaries than for this PR to be merged. Thank you. |
Fuser and tensorexpr changes look good to me as well!
You should be able to build pytorch libraries from sources: https://github.com/pytorch/pytorch#from-source |
Thanks for reviewing and your suggestion. I was hoping to rely on cloud infrastructure to get the binaries but I'll weigh this against compiling it locally. |
Ah, I see what you meant. For each PR we run lots of CI jobs and build pytorch in different configurations in the process. Maybe we can download build artifacts somehow, but I'm not sure. @malfet do you know if it's possible? |
@DBraun binary artifacts should be available for this PR from hud.pytorch.org/pr/67632 |
@@ -402,6 +402,23 @@ class class_ : public ::torch::detail::class_base { | |||
registerCustomClassMethod(std::move(method)); | |||
return method_val; | |||
} | |||
|
|||
// Wrapper function to force method deregistration on shutdown. | |||
static void registerCustomClassMethod(std::unique_ptr<jit::Function> method) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why couldn't we reuse RAII mechanism, already implemented in https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/dispatch/RegistrationHandleRAII.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just letting you know, I'm not the original author of this code, so feel free to edit as necessary.
Can you please add a unit-test that would crash before this change is introduced, but work fine afterwards? pytorch/torch/csrc/autograd/engine.cpp Lines 260 to 262 in 96d116f
|
It's pretty far out of my expertise, so I'm afraid I can't help much at this point. Update: The binaries that I downloaded fixed the crash I was experiencing. Thanks again! I hope it can be merged into the main branch sometime. |
I'm seeing a large number of test binaries crashing after this change. They seem to be primarily those using ASAN/TSAN, but also a handful that don't use such sanitizers:
Can somebody take a closer look? Meta link here: D38306879. cc @malfet @ezyang @jamesr66a |
@pytorchbot revert -c ghfirst "crashing in fbcode" |
❌ 🤖 pytorchbot command failed:
Try |
@pytorchbot revert -c ghfirst -m "crashing in fbcode" |
@pytorchbot successfully started a revert job. Check the current status here |
@DBraun your PR has been successfully reverted. |
This reverts commit a54c9a4. Reverted #67632 on behalf of https://github.com/ezyang due to crashing in fbcode
I propose we revert and investigate in the followup PR... Also few suggestion to followup PR:
|
I thought the original PR's description was pretty good |
@ezyang true, but original PR were never landed, was it? |
Trying to rebase pytorch/pytorch#61290 into latest pytorch:master Pull Request resolved: pytorch/pytorch#67632 Approved by: https://github.com/ezyang
This reverts commit 984e771. Reverted pytorch/pytorch#67632 on behalf of https://github.com/ezyang due to crashing in fbcode
Trying to rebase pytorch/pytorch#61290 into latest pytorch:master Pull Request resolved: pytorch/pytorch#67632 Approved by: https://github.com/ezyang
This reverts commit a036f0e. Reverted pytorch/pytorch#67632 on behalf of https://github.com/ezyang due to crashing in fbcode
This issue still persists in the latest release.
|
|
update: I built pytorch by applying these changes locally and the issue was not yet resolved. |
rebase the PR onto master and open a new PR |
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Trying to rebase #61290 into latest pytorch:master
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel