-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance pitfall and regression around closures in Rust 2021 #89470
Comments
This actually seems to even be a performance regression when passing closures into functions (where either https://rust.godbolt.org/z/8G9e5GKn8 Rust 2018: example::without_move:
sub rsp, 24
mov qword ptr [rsp + 8], rdi
lea rax, [rsp + 8]
mov qword ptr [rsp + 16], rax
lea rsi, [rip + .L__unnamed_2]
lea rdi, [rsp + 16]
call qword ptr [rip + use_it@GOTPCREL]
add rsp, 24
ret Rust 2021: example::without_move:
sub rsp, 40
lea rax, [rdi + 4]
lea rcx, [rdi + 8]
mov qword ptr [rsp + 8], rdi
add rdi, 12
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 24], rcx
mov qword ptr [rsp + 32], rdi
lea rsi, [rip + .L__unnamed_2]
lea rdi, [rsp + 8]
call qword ptr [rip + use_it@GOTPCREL]
add rsp, 40
ret |
cc @rust-lang/wg-rfc-2229 |
Per #88126 it looks like this is because these functions take I believe that we can introduce an optimization similar to the minimization rule ("Minimization: If the set C contains two places (P1, M1) and (P2, M2) where P1 is a prefix of P2, then just capture (P1, max(M1, M2))") that would say in this case that However, that specific optimization would be defeated if the closure did not capture some other field in the struct. I believe we can't do anything about that -- we don't want capture inference to be based on what goes on outside the closure, and it's possible that the other field is used by code (after all, that's exactly the point of these changes). |
Theoretically (for codegen) we could just pass |
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-medium |
It is definitely true that closures can be suboptimal relative to the past -- they can also be better. We did measurements that found very little impact overall. The general fix for this problem is to do some kind of "fancy patch" that rewrites the closure to store a pointer to the surrounding stack frame but use unsafe code to ensure we only access a subpart of that path, but that's a rather complex rewrite and didn't seem to be justified. We could definitely optimize cases where all fields are captured -- that'd be easy enough to do, but I'm not sure how often it applies. It'd be worth gathering some data. I don't love it because of the "cliff" that occurs if you use less. Hmm, it occurs to me: I've been contemplating the idea of "view types", which certainly seems relevant here. That would be the safe version of the "just capture a raw pointer" optimization. Interesting! (The vague idea of a "view type" is something like impl MyStruct {
fn inc_counter(&mut [self.counter] self) {
self.counter += 1; // other fields are not touched
}
} although there are a number of different ways one could approach that.) |
It turns out our type system and borrow checker is good enough: with PhantomData and raw pointers it wouldn't be too big a rewrite: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=63b95db3755118d6b8c11a929722c045 However this doesn't work well with stacked borrows. |
I do also prefer to have a straightforward desugaring. |
Sorry for getting to this late. I just want to add some actual benchmarks that might help decide the amount of impact and how much on an immediate concern this is from edition 2021 prespective. So when we first got this feature working I did some perf experiments by making the compiler work with this feature enabled by defaul and some of the capture precision discussions that were being discussed to fix some bugs. Those details are here. The benchmark that is probably closest to current implementation is this one. There is definetly an impact but the number too bad (maybe someone else should judge this). After this we had made an optimization for closure size that @Mark-Simulacrum mentioned that would translate to codegen as well. PR with stats (#86701). I'd also like to add that |
This also leads to a size regression: this testcase prints "8" on old editions but "16" on the current edition. So, the moment a closure captures more than 1 field from a struct by-ref, we now carry around twice as much data (or more) as we need to. |
In Rust 2021 the borrow checker got extended to see through borrow paths when writing closures. I've been wondering how this is implemented, but it seems like it's basically just syntax sugar for individual manual borrows that get moved in. This however leads to worse codegen than when a move closure would've been possible. So you might end up with code that performs worse by accident:
Godbolt
The text was updated successfully, but these errors were encountered: