Performance pitfall and regression around closures in Rust 2021 #89470

CryZe · 2021-10-02T17:58:16Z

In Rust 2021 the borrow checker got extended to see through borrow paths when writing closures. I've been wondering how this is implemented, but it seems like it's basically just syntax sugar for individual manual borrows that get moved in. This however leads to worse codegen than when a move closure would've been possible. So you might end up with code that performs worse by accident:

Godbolt

pub struct Foo {
    x: f32,
    y: f32,
    z: f32,
}

pub fn with_move(bar: &mut Foo) -> impl Fn() -> f32 + '_ {
    move || bar.x + bar.y + bar.z
}

pub fn without_move(bar: &mut Foo) -> impl Fn() -> f32 + '_ {
    || bar.x + bar.y + bar.z
}

example::with_move:
        mov     rax, rdi
        ret

example::without_move:
        mov     rax, rdi
        lea     rcx, [rsi + 4]
        mov     qword ptr [rdi], rsi
        add     rsi, 8
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rsi
        ret

CryZe · 2021-10-02T18:17:36Z

This actually seems to even be a performance regression when passing closures into functions (where either move and without worked just fine in Rust 2018 as opposed to the previous example):

https://rust.godbolt.org/z/8G9e5GKn8

Rust 2018:

example::without_move:
        sub     rsp, 24
        mov     qword ptr [rsp + 8], rdi
        lea     rax, [rsp + 8]
        mov     qword ptr [rsp + 16], rax
        lea     rsi, [rip + .L__unnamed_2]
        lea     rdi, [rsp + 16]
        call    qword ptr [rip + use_it@GOTPCREL]
        add     rsp, 24
        ret

Rust 2021:

example::without_move:
        sub     rsp, 40
        lea     rax, [rdi + 4]
        lea     rcx, [rdi + 8]
        mov     qword ptr [rsp + 8], rdi
        add     rdi, 12
        mov     qword ptr [rsp + 16], rax
        mov     qword ptr [rsp + 24], rcx
        mov     qword ptr [rsp + 32], rdi
        lea     rsi, [rip + .L__unnamed_2]
        lea     rdi, [rsp + 8]
        call    qword ptr [rip + use_it@GOTPCREL]
        add     rsp, 40
        ret

Mark-Simulacrum · 2021-10-02T20:48:23Z

cc @rust-lang/wg-rfc-2229

Mark-Simulacrum · 2021-10-02T20:55:55Z

Per #88126 it looks like this is because these functions take &mut Foo, not &Foo, which would fix(?) the regression entirely.

I believe that we can introduce an optimization similar to the minimization rule ("Minimization: If the set C contains two places (P1, M1) and (P2, M2) where P1 is a prefix of P2, then just capture (P1, max(M1, M2))") that would say in this case that &mut Foo is a prefix of all the places and is universally captures (i.e. all subfields) which would remove this performance hit. That should be possible even in the future; it's not generally user-visible.

However, that specific optimization would be defeated if the closure did not capture some other field in the struct. I believe we can't do anything about that -- we don't want capture inference to be based on what goes on outside the closure, and it's possible that the other field is used by code (after all, that's exactly the point of these changes).

nbdd0121 · 2021-10-02T23:32:14Z

Theoretically (for codegen) we could just pass *mut Foo and access via it. But that's quite a major change because it needs to be natively handled by borrow checker and we couldn't just "desugar and forget" closures anymore.

apiraino · 2021-10-03T12:53:54Z

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-medium

nikomatsakis · 2021-10-04T14:26:08Z

It is definitely true that closures can be suboptimal relative to the past -- they can also be better. We did measurements that found very little impact overall.

The general fix for this problem is to do some kind of "fancy patch" that rewrites the closure to store a pointer to the surrounding stack frame but use unsafe code to ensure we only access a subpart of that path, but that's a rather complex rewrite and didn't seem to be justified.

We could definitely optimize cases where all fields are captured -- that'd be easy enough to do, but I'm not sure how often it applies. It'd be worth gathering some data. I don't love it because of the "cliff" that occurs if you use less.

Hmm, it occurs to me: I've been contemplating the idea of "view types", which certainly seems relevant here. That would be the safe version of the "just capture a raw pointer" optimization. Interesting!

(The vague idea of a "view type" is something like [self.foo.bar, self.bar.baz] T which is a T where you only have access to a subset of the fields that match the filter. You could then borrow a view type and leave the other fields untouched. This would be useful for scenarios like:

impl MyStruct {
    fn inc_counter(&mut [self.counter] self) {
       self.counter += 1; // other fields are not touched
    }
}

although there are a number of different ways one could approach that.)

nbdd0121 · 2021-10-04T15:59:40Z

It turns out our type system and borrow checker is good enough: with PhantomData and raw pointers it wouldn't be too big a rewrite: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=63b95db3755118d6b8c11a929722c045

However this doesn't work well with stacked borrows.

nikomatsakis · 2021-10-05T00:18:21Z

I do also prefer to have a straightforward desugaring.

arora-aman · 2021-10-05T08:11:18Z

Sorry for getting to this late.

I just want to add some actual benchmarks that might help decide the amount of impact and how much on an immediate concern this is from edition 2021 prespective.

So when we first got this feature working I did some perf experiments by making the compiler work with this feature enabled by defaul and some of the capture precision discussions that were being discussed to fix some bugs. Those details are here.

The benchmark that is probably closest to current implementation is this one. There is definetly an impact but the number too bad (maybe someone else should judge this).

After this we had made an optimization for closure size that @Mark-Simulacrum mentioned that would translate to codegen as well. PR with stats (#86701).

I'd also like to add that move closure is not intended as an optimizaiton but is actually a limitation on precision we can provide to handle all cases involving derefernces in a move closure. Check (#88477)

RalfJung · 2023-08-28T08:09:43Z

This also leads to a size regression: this testcase prints "8" on old editions but "16" on the current edition. So, the moment a closure captures more than 1 field from a struct by-ref, we now carry around twice as much data (or more) as we need to.

CryZe added the C-bug Category: This is a bug. label Oct 2, 2021

CryZe changed the title ~~Performance pitfall around closures in Rust 2021~~ Performance pitfall and regression around closures in Rust 2021 Oct 2, 2021

camelid added A-closures Area: Closures (`|…| { … }`) A-edition-2021 Area: The 2021 edition I-slow Issue: Problems and improvements with respect to performance of generated code. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 2, 2021

rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance pitfall and regression around closures in Rust 2021 #89470

Performance pitfall and regression around closures in Rust 2021 #89470

CryZe commented Oct 2, 2021 •

edited

Loading

CryZe commented Oct 2, 2021 •

edited

Loading

Mark-Simulacrum commented Oct 2, 2021

Mark-Simulacrum commented Oct 2, 2021

nbdd0121 commented Oct 2, 2021

apiraino commented Oct 3, 2021

nikomatsakis commented Oct 4, 2021

nbdd0121 commented Oct 4, 2021

nikomatsakis commented Oct 5, 2021

arora-aman commented Oct 5, 2021

RalfJung commented Aug 28, 2023 •

edited

Loading

Performance pitfall and regression around closures in Rust 2021 #89470

Performance pitfall and regression around closures in Rust 2021 #89470

Comments

CryZe commented Oct 2, 2021 • edited Loading

CryZe commented Oct 2, 2021 • edited Loading

Mark-Simulacrum commented Oct 2, 2021

Mark-Simulacrum commented Oct 2, 2021

nbdd0121 commented Oct 2, 2021

apiraino commented Oct 3, 2021

nikomatsakis commented Oct 4, 2021

nbdd0121 commented Oct 4, 2021

nikomatsakis commented Oct 5, 2021

arora-aman commented Oct 5, 2021

RalfJung commented Aug 28, 2023 • edited Loading

CryZe commented Oct 2, 2021 •

edited

Loading

CryZe commented Oct 2, 2021 •

edited

Loading

RalfJung commented Aug 28, 2023 •

edited

Loading