Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance pitfall and regression around closures in Rust 2021 #89470

Open
CryZe opened this issue Oct 2, 2021 · 10 comments
Open

Performance pitfall and regression around closures in Rust 2021 #89470

CryZe opened this issue Oct 2, 2021 · 10 comments
Labels
A-closures Area: Closures (`|…| { … }`) A-edition-2021 Area: The 2021 edition C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority

Comments

@CryZe
Copy link
Contributor

CryZe commented Oct 2, 2021

In Rust 2021 the borrow checker got extended to see through borrow paths when writing closures. I've been wondering how this is implemented, but it seems like it's basically just syntax sugar for individual manual borrows that get moved in. This however leads to worse codegen than when a move closure would've been possible. So you might end up with code that performs worse by accident:

Godbolt

pub struct Foo {
    x: f32,
    y: f32,
    z: f32,
}

pub fn with_move(bar: &mut Foo) -> impl Fn() -> f32 + '_ {
    move || bar.x + bar.y + bar.z
}

pub fn without_move(bar: &mut Foo) -> impl Fn() -> f32 + '_ {
    || bar.x + bar.y + bar.z
}
example::with_move:
        mov     rax, rdi
        ret

example::without_move:
        mov     rax, rdi
        lea     rcx, [rsi + 4]
        mov     qword ptr [rdi], rsi
        add     rsi, 8
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rsi
        ret
@CryZe CryZe added the C-bug Category: This is a bug. label Oct 2, 2021
@CryZe
Copy link
Contributor Author

CryZe commented Oct 2, 2021

This actually seems to even be a performance regression when passing closures into functions (where either move and without worked just fine in Rust 2018 as opposed to the previous example):

https://rust.godbolt.org/z/8G9e5GKn8

Rust 2018:

example::without_move:
        sub     rsp, 24
        mov     qword ptr [rsp + 8], rdi
        lea     rax, [rsp + 8]
        mov     qword ptr [rsp + 16], rax
        lea     rsi, [rip + .L__unnamed_2]
        lea     rdi, [rsp + 16]
        call    qword ptr [rip + use_it@GOTPCREL]
        add     rsp, 24
        ret

Rust 2021:

example::without_move:
        sub     rsp, 40
        lea     rax, [rdi + 4]
        lea     rcx, [rdi + 8]
        mov     qword ptr [rsp + 8], rdi
        add     rdi, 12
        mov     qword ptr [rsp + 16], rax
        mov     qword ptr [rsp + 24], rcx
        mov     qword ptr [rsp + 32], rdi
        lea     rsi, [rip + .L__unnamed_2]
        lea     rdi, [rsp + 8]
        call    qword ptr [rip + use_it@GOTPCREL]
        add     rsp, 40
        ret

@CryZe CryZe changed the title Performance pitfall around closures in Rust 2021 Performance pitfall and regression around closures in Rust 2021 Oct 2, 2021
@camelid camelid added A-closures Area: Closures (`|…| { … }`) A-edition-2021 Area: The 2021 edition I-slow Issue: Problems and improvements with respect to performance of generated code. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 2, 2021
@Mark-Simulacrum
Copy link
Member

cc @rust-lang/wg-rfc-2229

@Mark-Simulacrum
Copy link
Member

Per #88126 it looks like this is because these functions take &mut Foo, not &Foo, which would fix(?) the regression entirely.

I believe that we can introduce an optimization similar to the minimization rule ("Minimization: If the set C contains two places (P1, M1) and (P2, M2) where P1 is a prefix of P2, then just capture (P1, max(M1, M2))") that would say in this case that &mut Foo is a prefix of all the places and is universally captures (i.e. all subfields) which would remove this performance hit. That should be possible even in the future; it's not generally user-visible.

However, that specific optimization would be defeated if the closure did not capture some other field in the struct. I believe we can't do anything about that -- we don't want capture inference to be based on what goes on outside the closure, and it's possible that the other field is used by code (after all, that's exactly the point of these changes).

@nbdd0121
Copy link
Contributor

nbdd0121 commented Oct 2, 2021

Theoretically (for codegen) we could just pass *mut Foo and access via it. But that's quite a major change because it needs to be natively handled by borrow checker and we couldn't just "desugar and forget" closures anymore.

@apiraino
Copy link
Contributor

apiraino commented Oct 3, 2021

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 3, 2021
@nikomatsakis
Copy link
Contributor

It is definitely true that closures can be suboptimal relative to the past -- they can also be better. We did measurements that found very little impact overall.

The general fix for this problem is to do some kind of "fancy patch" that rewrites the closure to store a pointer to the surrounding stack frame but use unsafe code to ensure we only access a subpart of that path, but that's a rather complex rewrite and didn't seem to be justified.

We could definitely optimize cases where all fields are captured -- that'd be easy enough to do, but I'm not sure how often it applies. It'd be worth gathering some data. I don't love it because of the "cliff" that occurs if you use less.

Hmm, it occurs to me: I've been contemplating the idea of "view types", which certainly seems relevant here. That would be the safe version of the "just capture a raw pointer" optimization. Interesting!

(The vague idea of a "view type" is something like [self.foo.bar, self.bar.baz] T which is a T where you only have access to a subset of the fields that match the filter. You could then borrow a view type and leave the other fields untouched. This would be useful for scenarios like:

impl MyStruct {
    fn inc_counter(&mut [self.counter] self) {
       self.counter += 1; // other fields are not touched
    }
}

although there are a number of different ways one could approach that.)

@nbdd0121
Copy link
Contributor

nbdd0121 commented Oct 4, 2021

It turns out our type system and borrow checker is good enough: with PhantomData and raw pointers it wouldn't be too big a rewrite: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=63b95db3755118d6b8c11a929722c045

However this doesn't work well with stacked borrows.

@nikomatsakis
Copy link
Contributor

I do also prefer to have a straightforward desugaring.

@arora-aman
Copy link
Member

Sorry for getting to this late.

I just want to add some actual benchmarks that might help decide the amount of impact and how much on an immediate concern this is from edition 2021 prespective.

So when we first got this feature working I did some perf experiments by making the compiler work with this feature enabled by defaul and some of the capture precision discussions that were being discussed to fix some bugs. Those details are here.

The benchmark that is probably closest to current implementation is this one. There is definetly an impact but the number too bad (maybe someone else should judge this).

After this we had made an optimization for closure size that @Mark-Simulacrum mentioned that would translate to codegen as well. PR with stats (#86701).

I'd also like to add that move closure is not intended as an optimizaiton but is actually a limitation on precision we can provide to handle all cases involving derefernces in a move closure. Check (#88477)

@RalfJung
Copy link
Member

RalfJung commented Aug 28, 2023

This also leads to a size regression: this testcase prints "8" on old editions but "16" on the current edition. So, the moment a closure captures more than 1 field from a struct by-ref, we now carry around twice as much data (or more) as we need to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-closures Area: Closures (`|…| { … }`) A-edition-2021 Area: The 2021 edition C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority
Projects
None yet
Development

No branches or pull requests

9 participants