inductor: fix for functional_collectives.wait() followed by view() #118802

Closed
wants to merge 15 commits into from

pytorch-bot bot commented Jan 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118802

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 3 Unrelated Failures

As of commit 25e9a62 with merge base a631461 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

yf225 commented Feb 1, 2024

It looks good in general. I'd be curious whether there are any failed tests from CI.

Contributor

@wconstab wconstab left a comment

I don't see a problem with the fix. But I don't have confidence either, because I don't understand the IR very well, or the changes that were made to the FC inductor implementation as it evolved... Maybe @yifuwang is a good person to review.

…y view()"

Potential fix for #118759. See the issue linked for more diagnosis / explanation of this (tentative) fix. Feedback welcome!




cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 aakhundov ColinPeppler

[ghstack-poisoned]
@wanchaol wanchaol requested a review from yifuwang February 29, 2024 06:19
bdhirsh added a commit that referenced this pull request Mar 20, 2024
@@ -469,7 +480,7 @@ def forward(self, input):
 FileCheck().check(
 "buf0 = torch.ops._c10d_functional.all_gather_into_tensor.default(primal"
 ).check("buf1 = torch.ops._c10d_functional.wait_tensor.default(buf0").check(
-"extern_kernels.mm(buf0,"
+"extern_kernels.mm(buf1,"
Collaborator

@yifuwang yifuwang Mar 21, 2024

Was this test broken? Curious since I don't think your change affects this test.

Contributor Author

Hmm, I don't think it was. wait() was just broken in some situations, such as when a view immediately follows it. However, this PR technically changed the buffer name seen by all downstream users of wait_tensor: they now use the output of wait_tensor instead of its input.
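
The renaming described in this comment can be illustrated with a toy model. This is a minimal sketch in plain Python, not Inductor's actual IR or codegen: `emit_lines`, `contains_in_order`, and the `use_wait_output` flag are hypothetical names introduced only for illustration, and the call arguments in the emitted strings are placeholders. It produces the three lines the test inspects and checks them with an ordered-substring match that loosely imitates the chained `.check()` calls in the diff above.

```python
from typing import List


def emit_lines(use_wait_output: bool) -> List[str]:
    """Hypothetical stand-in for generated wrapper code (not real Inductor output)."""
    # Before the change, the matmul consumed the collective's own buffer (buf0);
    # after it, downstream users consume the output of wait_tensor (buf1).
    consumer = "buf1" if use_wait_output else "buf0"
    return [
        # Arguments below are placeholders chosen for illustration only.
        "buf0 = torch.ops._c10d_functional.all_gather_into_tensor.default(primals_1, 2, '0')",
        "buf1 = torch.ops._c10d_functional.wait_tensor.default(buf0)",
        "extern_kernels.mm({}, primals_2, out=buf2)".format(consumer),
    ]


def contains_in_order(text: str, patterns: List[str]) -> bool:
    """Return True if every pattern occurs in text, each after the previous match."""
    pos = 0
    for pattern in patterns:
        idx = text.find(pattern, pos)
        if idx < 0:
            return False
        pos = idx + len(pattern)
    return True


EXPECTED_AFTER_FIX = [
    "buf0 = torch.ops._c10d_functional.all_gather_into_tensor.default(primal",
    "buf1 = torch.ops._c10d_functional.wait_tensor.default(buf0",
    "extern_kernels.mm(buf1,",
]

old_code = "\n".join(emit_lines(use_wait_output=False))
new_code = "\n".join(emit_lines(use_wait_output=True))

print(contains_in_order(new_code, EXPECTED_AFTER_FIX))  # True
print(contains_in_order(old_code, EXPECTED_AFTER_FIX))  # False
```

The only difference between the two variants is which buffer name the matmul reads, which is why the existing test's expected string changed from `buf0` to `buf1` even though the test itself was not broken.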

Collaborator

@yifuwang yifuwang left a comment

Just a heads up: we've switched to native funcol, which doesn't use the Inductor IR you modified. The test you introduced is still legit though.

We are planning to remove the old funcol ops and IRs soon. In case you want the test to also cover the old funcol in the meantime, you can decorate it with @run_with_both_funcol_impls.

Contributor Author

bdhirsh commented Mar 22, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 22, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

Contributor Author

bdhirsh commented Mar 22, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 5 jobs have failed, first few of them are: inductor, trunk, linux-binary-manywheel, linux-binary-libtorch-cxx11-abi, linux-binary-libtorch-pre-cxx11

Details for Dev Infra team Raised by workflow job

bdhirsh added 2 commits April 4, 2024 08:45
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jun 10, 2024
@github-actions github-actions bot closed this Jul 10, 2024
@github-actions github-actions bot deleted the gh/bdhirsh/528/head branch August 10, 2024 01:57
Labels
ciflow/inductor, ciflow/trunk, module: inductor, oncall: distributed, release notes: inductor, Stale

5 participants