[TOPI][x86] Injective schedule improvement #4786
Please do not merge yet. I am running a few more performance tests.
Update: interesting observation. Even though the single pad operator sees a large speedup with this PR, the operators that follow pad see a consistent slowdown in the original graph. I think the reason is that h and w are spread across cores, causing data transfer issues for the second operator. I will try a few more options. If nothing works, I will close the PR.
One test fails with this change.
Is this expected?
Yizhi helped. Vectorize works only with constant extents, so I added a split to make it work.
Looks good to me. Thanks @anijain2305
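For readers following along, here is a rough sketch of why a split helps. It is plain Python, not the code from this PR: the helper names `split_extent` and `copy_with_split` and the factor of 16 are made up for illustration. The idea is that a loop whose extent is only known at run time is rewritten as an outer loop plus an inner loop of fixed width, and the fixed-width inner loop is the part that can be vectorized. In a TVM schedule the corresponding primitives would be `split` followed by `vectorize` on the inner axis.

```python
def split_extent(extent, factor):
    """Split a loop of `extent` iterations into (outer, inner) extents.

    `factor` is a compile-time constant, so the inner extent is constant
    even when `extent` is not. Hypothetical helper for illustration only.
    """
    outer = (extent + factor - 1) // factor  # ceiling division
    return outer, factor


def copy_with_split(src, factor=16):
    """Element-wise copy written as an outer loop over fixed-width chunks."""
    n = len(src)
    dst = [None] * n
    outer, inner = split_extent(n, factor)
    for o in range(outer):
        # The inner loop has a constant extent (`factor`); this is the loop
        # a compiler could map onto SIMD lanes.
        for i in range(inner):
            idx = o * inner + i
            if idx < n:  # guard the tail when n is not a multiple of factor
                dst[idx] = src[idx]
    return dst
```

The tail guard is what keeps the result correct when the original extent is not a multiple of the split factor.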
* [TOPI][x86] Injective Schedule Improvement.
* Add tiling.
* Vectorize when there is an axis.
While working on quantized MobileNet V2, I saw that the pad operator was taking around 25% of the total time on a Cascade Lake machine. This PR optimizes the injective schedule by performing vectorization.
For the following test
Before PR - 80 us
After PR - 5 us
@yzhliu @vinx13 @shoubhik @yidawang please review