-
Notifications
You must be signed in to change notification settings - Fork 22.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JIT] Optimize before inlining #35562
Conversation
💊 CircleCI build failures summary and remediationsAs of commit c51bfc1 (more details on the Dr. CI page): ✅ None of the build failures appear to be your fault 💚
🚧 4 upstream failures:These were probably caused by upstream breakages:
This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 88 times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Edit:
|
@BowenBao how do you recommend I proceed ? as far as I can tell the test is not failing on master. |
@eellison yea I checked a few other PRs and they don't seem to fail on this one, so I edited my previous comment. One thing bothers me is that you mentioned in the other PR that you could not repro this error locally. Let me change the |
@BowenBao could you check out the PR ? |
af6ceaf
to
c7aabfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@sanekmelnikov @J0Nreynolds I'm getting a tensorboard test failure here. i'm having trouble running the test for unrelated reasons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Resubmit of pytorch#35424, only this time I run optimizations in the right order so the PR description is actually true. This speeds up the inlining pass of FairSeq model from 180s -> 13s, and MaskRCNN model from 5s -> 1.5s. Pull Request resolved: pytorch#35562 Differential Revision: D20738922 Pulled By: eellison fbshipit-source-id: 1439cf9d1f0bc780e2d64a744694f8b3b7ba4b70
Summary: With #35562, we are running peephole optimization on inlining to reduce the number of nodes that are copied. The tracer encodes the sizes in the graph like: ``` graph(%0 : Double(7)): %1 : Function = prim::Constant[name="tensor_size"]() %2 : Tensor = prim::CallFunction(%1, %0) return (%2) ``` however people would like to reuse the graph with different shapes so running size invalidations would invalidate that. long term it might be better for the tracer to not include shape information but there are downstream users of that. Separates out FuseAddMM from peephole so that now there is a single `disable_size_optimizations` parameter, and onnx explicitly invokes fuseaddmm. Pull Request resolved: #36404 Differential Revision: D20968974 Pulled By: eellison fbshipit-source-id: 56f8f1699e3b0adeeccdfd5a67bb975fd41a2913
Resubmit of #35424, only this time I run optimizations in the right order so the PR description is actually true.
This speeds up the inlining pass of FairSeq model from 180s -> 13s, and MaskRCNN model from 5s -> 1.5s.