another round of perf improvements for equalize #6776
Conversation
Thanks for the great improvements @lezcano and @pmeier. How thoroughly have we tested the new implementation to confirm it returns the same results as before? The code changed quite substantially from the original approach. If you are confident, I'm happy to merge. Just being mindful that our tests might have minor gaps at the moment, so we should be extra careful.
I'm currently doing a deep dive into the implementation, adding comprehensive comments and the like. I'll ping here once I'm done and convinced everything is correct.
I didn't run comprehensive tests, but the results do agree on matrices and batches of matrices. @vfdev-5 already reviewed it, but it's always good to have a second pair of eyes on it :)
@pmeier Sounds great. If possible, please compare the outputs of the two implementations for a few thousand random inputs. I'll give it another look on my side after you are done too, just to be safe. @lezcano Understood, we should run a few more checks on our side, but I don't have any concerns at the moment. Thanks a lot for your work on this one.
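For concreteness, a comparison along the lines requested above could look like the sketch below. `equalize_old` and `equalize_new` are hypothetical stand-ins for the previous and new kernels, and the input shapes are illustrative only:

```python
import torch


def compare_implementations(equalize_old, equalize_new, num_samples=5000):
    """Check that two equalize implementations agree on random uint8 inputs."""
    torch.manual_seed(0)
    for _ in range(num_samples):
        # Random batch size and spatial size for each sample.
        batch = int(torch.randint(1, 5, ()))
        height = int(torch.randint(8, 65, ()))
        width = int(torch.randint(8, 65, ()))
        image = torch.randint(0, 256, (batch, 3, height, width), dtype=torch.uint8)
        torch.testing.assert_close(equalize_old(image), equalize_new(image))
```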
@datumbox I've improved the reference test quite a bit to carry more information. The new kernel passes all tests. Plus, I've also run @vfdev-5's benchmark with it a couple of times and nothing triggered there, so I think we are good. Re-running the benchmark after the recent changes gives:
I left a few comments in case they are helpful, but overall it LGTM.
LGTM. Let's make the changes suggested in the comments; otherwise we are good to go.
Summary:
* perf improvements for equalize
* improve reference tests
* add extensive comments and minor fixes to the kernel
* improve comments

Reviewed By: YosuaMichael
Differential Revision: D40588160
fbshipit-source-id: ffe05fa6aa188a3d2dfe98f4367cb1d81abe1e47
Co-authored-by: lezcano <lezcano-93@hotmail.com>
This patch was mainly authored by @lezcano, with me only making minor adjustments for JIT.
With this change, the kernel "natively" supports arbitrary batch sizes. Furthermore, CUDA execution is quite a bit faster, while CPU execution stays the same within measuring tolerance.
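As a usage sketch of the batched call pattern, here shown against the stable functional API (`torchvision.transforms.functional.equalize`, which accepts uint8 tensors shaped `(..., 1 or 3, H, W)`); the exact module path of the new kernel may differ:

```python
import torch
import torchvision.transforms.functional as F

# Batch of 16 random uint8 RGB images.
images = torch.randint(0, 256, (16, 3, 224, 224), dtype=torch.uint8)

equalized = F.equalize(images)  # CPU execution

if torch.cuda.is_available():
    equalized_cuda = F.equalize(images.cuda())  # CUDA execution
```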
benchmark script
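The collapsed benchmark script is not reproduced here; below is a minimal sketch of how such a CPU/CUDA timing comparison could look using `torch.utils.benchmark`. The tensor shapes are assumptions for illustration, not the settings used in the reported runs:

```python
import torch
import torch.utils.benchmark as benchmark
import torchvision.transforms.functional as F

# Illustrative input: a small batch of uint8 RGB images.
image = torch.randint(0, 256, (4, 3, 512, 512), dtype=torch.uint8)

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    timer = benchmark.Timer(
        stmt="F.equalize(image)",
        globals={"F": F, "image": image.to(device)},
        label="equalize",
        sub_label=device,
    )
    # blocked_autorange picks the number of runs automatically and reports timing stats.
    print(timer.blocked_autorange(min_run_time=1))
```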