Missed optimizations for Vec::pop() followed by Vec::push() #61939

broken-pen · 2019-06-18T20:03:25Z

The following example should be get optimized to a no-op by the compiler:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        v.push(last)
    }
}

Instead, it produces about 60 lines of assembly, which also contain calls to __rust_alloc() and alloc::raw_vec::capacity_overflow().
The following code is better:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        assert!(v.len() < v.capacity());
        v.push(last)
    }
}

producing:

example::noop:
        push    rax
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB7_3
        lea     rcx, [rax - 1]
        mov     qword ptr [rdi + 16], rcx
        cmp     rcx, qword ptr [rdi + 8]
        jae     .LBB7_4
        mov     qword ptr [rdi + 16], rax
.LBB7_3:
        pop     rax
        ret
.LBB7_4:
        call    std::panicking::begin_panic
        ud2

And even better:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        if v.capacity() <= v.len() {
            unsafe { std::hint::unreachable_unchecked() }
        }
        v.push(last)
    }
}

producing:

example::noop:
        ret

Compiler explorer

I feel like the compiler should be able to handle this situation better, considering that all the information is readily available.

The text was updated successfully, but these errors were encountered:

Diomendius · 2019-08-07T03:13:40Z

Similarly, both of these functions fail to optimize out the unwrap()s.

pub fn push_pop(v: &mut Vec<i32>) -> i32 {
    v.push(1);
    v.pop().unwrap()
}

pub fn push_and_get_ref(v: &mut Vec<i32>) -> &i32 {
    v.push(1);
    v.last().unwrap()
}

Unlike the example in the OP, this can technically cause v.len() to overflow, but I'm not sure it's even possible for a system to exist where an allocation of > usize::max() bytes is possible, let alone possible in practice, so it should be safe to assume that any operation that extends a container always ensures len() > 0.

The modifications to push() in #44452 (comment) might let this be optimized (specifically assume(self.len() == len) in the OP's case) although I hope a more elegant solution is possible.

I think in the case of push-after-pop it should be possible for the compiler to optimize out len() < capacity() without any kind of hint, but I'm not sure about len() > 0 as, given Rust's guarantees about overflow, the compiler would have to be able to statically determine that len() can't overflow without triggering an allocation failure (and that allocation failure is a panic/abort/early return).

broken-pen · 2020-12-22T12:32:58Z

FWIW, the workaround has been pessimized in Rust 1.38. The variant using unreachable_unchecked() now produces this assembly:

example::noop:
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB0_2
        mov     qword ptr [rdi + 16], rax
.LBB0_2:
        ret

Compiler explorer

In other words, there is now no possibility at all to get Rustc to optimize this piece of code correctly down to a no-op.

broken-pen · 2021-05-16T12:50:26Z

The above behavior is still present in Rust 1.53 beta. The current 1.54.0-nightly (4de7572 2021-05-01) produces this assembly:

example::noop:
        cmp     qword ptr [rdi + 16], 0
        ret

Compiler explorer

I'm not sure what has changed between these two versions.

Diomendius · 2021-05-24T01:08:04Z

It looks like the v.len() == 0 (or rather if let None = v.pop()) branch somehow interferes with the other branch being fully optimized away. If v.len() > 0 is (unsafely and unsoundly) assumed, the assembly returns to a simple ret.

memoryruins · 2021-05-24T08:34:05Z

@Diomendius it appears you were testing with a dated rustc 1.38 (2019-09-26). Try experimenting with nightly.

Diomendius · 2021-05-25T00:39:54Z

@memoryruins I was using 1.38 as @troiganto originally reported that this version was the first to break the unreachable_unchecked() workaround and I wanted to compare like with like. The current nightly produces different assembly as @troiganto also reported above, but the assembly still amounts to a no-op comparison of the Vec's length, although the branch instruction is no longer present.

You're right that testing old versions is not generally helpful, but in this case both versions seem to exhibit the same bug, and the same "fix" (as brutal as it is) works for both.

LingMan · 2023-12-02T14:29:45Z

Starting with Rust 1.75 (currently in beta) all three examples given in the opening post properly optimize to a noop.

The two examples given by @Diomendius in the second post still don't optimize out the unwrap though.

LingMan · 2023-12-02T14:44:53Z

Although just adding too many pop/push pairs breaks it again :/

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        /*v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);*/
    }
}

That still compiles to a noop, but uncomment one more pair and you get:

example::noop:
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB0_2
        mov     qword ptr [rdi + 16], rax
.LBB0_2:
        ret

Uncomment another one and you get the full Vec resizing logic.

jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 18, 2019

jonas-schievink added A-collections Area: `std::collections` C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Aug 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Missed optimizations for Vec::pop() followed by Vec::push() #61939

Missed optimizations for Vec::pop() followed by Vec::push() #61939

broken-pen commented Jun 18, 2019

Diomendius commented Aug 7, 2019

Uh oh!

broken-pen commented Dec 22, 2020

Uh oh!

broken-pen commented May 16, 2021 •

edited

Loading

Uh oh!

Diomendius commented May 24, 2021

Uh oh!

memoryruins commented May 24, 2021

Uh oh!

Diomendius commented May 25, 2021

Uh oh!

LingMan commented Dec 2, 2023

Uh oh!

LingMan commented Dec 2, 2023

Uh oh!

Missed optimizations for Vec::pop() followed by Vec::push() #61939

Missed optimizations for Vec::pop() followed by Vec::push() #61939

Comments

broken-pen commented Jun 18, 2019

Diomendius commented Aug 7, 2019

Uh oh!

broken-pen commented Dec 22, 2020

Uh oh!

broken-pen commented May 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Diomendius commented May 24, 2021

Uh oh!

memoryruins commented May 24, 2021

Uh oh!

Diomendius commented May 25, 2021

Uh oh!

LingMan commented Dec 2, 2023

Uh oh!

LingMan commented Dec 2, 2023

Uh oh!

broken-pen commented May 16, 2021 •

edited

Loading