-
Notifications
You must be signed in to change notification settings - Fork 24.6k
inductor: fix for functional_collectives.wait() followed by view() #118802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118802
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures, 3 Unrelated FailuresAs of commit 25e9a62 with merge base a631461 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
It looks good in general, would be curious if there is any failed tests from CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a problem with the fix. But I don't have confidence either bc I don't understand the IR very well or the changes that were made the FC inductor implementation as it evolved... Maybe @yifuwang is a good person to review.
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
@@ -469,7 +480,7 @@ def forward(self, input): | |||
FileCheck().check( | |||
"buf0 = torch.ops._c10d_functional.all_gather_into_tensor.default(primal" | |||
).check("buf1 = torch.ops._c10d_functional.wait_tensor.default(buf0").check( | |||
"extern_kernels.mm(buf0," | |||
"extern_kernels.mm(buf1," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this test broken? Curious since I don't think your change affects this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I don't think it was. wait()
was just broken in some situations, like when there is a view right after it, but this PR technically changed the name of the buffer of all downstream users of wait_tensor
(so use the output of wait_tensor
instead of the input)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads up - we've switch to native funcol which doesn't use the Inductor IR you modified. The test you introduced is still legit though.
We are planning to remove the old funcol ops and IRs soon. In case you want the test to also cover the old funcol in the meanwhile, you can decorate it with @run_with_both_funcol_impls
.
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
@pytorchbot merge |
Merge failedReason: This PR needs a If not, please add the To add a label, you can comment to pytorchbot, for example For more information, see Details for Dev Infra teamRaised by workflow job |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 5 jobs have failed, first few of them are: inductor, trunk, linux-binary-manywheel, linux-binary-libtorch-cxx11-abi, linux-binary-libtorch-pre-cxx11 Details for Dev Infra teamRaised by workflow job |
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire [ghstack-poisoned]
…y view()" Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome! cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire [ghstack-poisoned]
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome!
Stack from ghstack (oldest at bottom):
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @rohan-varma @aakhundov