refactor(rust): Use defunctionalization in polars-core scalar.rs in order to reduce code duplication #20377
The use of capturing closures when calling bitonic_mask leads to unnecessary code duplication, because Rust will create one new type per closure, and then make one copy of bitonic_mask per call site. With this change, these copies are avoided, which should improve build speed noticeably. cargo llvm-lines shows that the number of copies is reduced from 144 to 12.
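To make the duplication concrete, here is a minimal sketch of the effect described above. It is not the polars code: `first_true`, `first_true_op`, `type_id_of`, and `CmpOp` are hypothetical names chosen for illustration, and `first_true` only stands in for a generic helper such as bitonic_mask.

```rust
use std::any::TypeId;

// Hypothetical stand-in for a generic helper such as bitonic_mask.
// The compiler emits one copy of this function per distinct `F`.
fn first_true<F: Fn(i32) -> bool>(values: &[i32], pred: F) -> Option<usize> {
    values.iter().position(|&v| pred(v))
}

// Returns the TypeId of the concrete type of the referenced value.
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

// Defunctionalized form: the captured value and the operation become plain data.
#[derive(Clone, Copy)]
enum CmpOp {
    Lt(i32),
    Ge(i32),
}

impl CmpOp {
    #[inline]
    fn apply(self, v: i32) -> bool {
        match self {
            CmpOp::Lt(rhs) => v < rhs,
            CmpOp::Ge(rhs) => v >= rhs,
        }
    }
}

// Not generic over a closure type, so a single copy serves every call site.
fn first_true_op(values: &[i32], op: CmpOp) -> Option<usize> {
    values.iter().position(|&v| op.apply(v))
}

fn main() {
    let rhs = 3;
    // Two textually identical closures still have two distinct types,
    // so `first_true` would be instantiated once for each of them.
    let a = move |v: i32| v >= rhs;
    let b = move |v: i32| v >= rhs;
    assert_ne!(type_id_of(&a), type_id_of(&b));

    let values = [1, 2, 3, 4];
    assert_eq!(first_true(&values, a), Some(2));
    assert_eq!(first_true(&values, b), Some(2));

    // Both operations share the one `first_true_op` body.
    assert_eq!(first_true_op(&values, CmpOp::Ge(rhs)), Some(2));
    assert_eq!(first_true_op(&values, CmpOp::Lt(rhs)), Some(0));
}
```

The trade-off is that the enum version decides which comparison to run with a `match` at run time, whereas the closure version fixes it at compile time.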
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #20377   +/-   ##
=======================================
  Coverage   79.10%   79.10%
=======================================
  Files        1572     1572
  Lines      219958   219963    +5
  Branches     2465     2465
=======================================
+ Hits       173991   174006   +15
+ Misses      45399    45389   -10
  Partials      568      568
Better compile times are good, and it probably doesn't matter since it's only called a logarithmic number of times, but it still feels bad to me that we now do a match on every single comparison.
The apply method should in theory be inlined, and the match should then disappear. I have an alternative approach that uses a custom closure type and achieves the same reduction. Would you prefer a PR for that instead?
eb8d01b to d21f75d
@burakemir If it were inlined, it would not reduce the amount of code generated after all, no?
The problem addressed is the needless duplication caused by monomorphization: the FA and FD generic parameters get instantiated with a distinct closure type per call site. This leads to the 144 copies of bitonic_mask, which is reduced to 12 by either this or the alternative approach. You are of course correct that inlining can in general cause code size increases. In the case of a match statement with a statically known scrutinee, the whole match statement should be removed, which then ideally gives rise to further inlining of the actual comparison method (though I have not checked).
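As a rough illustration of the point about a statically known scrutinee, consider the sketch below. `CmpOp`, `prefix_len`, and `count_lt` are hypothetical names, not the ones used in scalar.rs, and whether the optimizer actually folds the match away is an assumption that has not been checked here.

```rust
#[derive(Clone, Copy)]
enum CmpOp {
    Lt,
    Ge,
}

impl CmpOp {
    // Small enough that the optimizer may inline it into its caller.
    #[inline]
    fn apply(self, lhs: i64, rhs: i64) -> bool {
        match self {
            CmpOp::Lt => lhs < rhs,
            CmpOp::Ge => lhs >= rhs,
        }
    }
}

// Shared, non-generic binary search. Assumes `op.apply(v, rhs)` is true on a
// prefix of `values` and false afterwards; returns the length of that prefix.
// The match inside `apply` runs once per probe, so O(log n) times per call.
fn prefix_len(values: &[i64], rhs: i64, op: CmpOp) -> usize {
    let (mut lo, mut hi) = (0, values.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if op.apply(values[mid], rhs) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

// Call site with a constant operation. If `prefix_len` is inlined here, the
// scrutinee is known to be `CmpOp::Lt` and the match can be folded to `lhs < rhs`.
fn count_lt(values: &[i64], rhs: i64) -> usize {
    prefix_len(values, rhs, CmpOp::Lt)
}

fn main() {
    let ascending = [1, 2, 3, 5, 8];
    assert_eq!(count_lt(&ascending, 4), 3);

    let descending = [9, 7, 4, 2];
    assert_eq!(prefix_len(&descending, 4, CmpOp::Ge), 3);
}
```

If the search is not inlined, the cost is one small `match` per probe, which is the overhead raised earlier in the conversation.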
This change reduces the number of copies of the bitonic_mask function from 144 to 12. It should help with build times and code size, without any change in performance.