Skip to content

Missed optimizations for Vec::pop() followed by Vec::push() #61939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
broken-pen opened this issue Jun 18, 2019 · 8 comments
Open

Missed optimizations for Vec::pop() followed by Vec::push() #61939

broken-pen opened this issue Jun 18, 2019 · 8 comments
Labels
A-collections Area: `std::collections` C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@broken-pen
Copy link

The following example should be get optimized to a no-op by the compiler:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        v.push(last)
    }
}

Instead, it produces about 60 lines of assembly, which also contain calls to __rust_alloc() and alloc::raw_vec::capacity_overflow().
The following code is better:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        assert!(v.len() < v.capacity());
        v.push(last)
    }
}

producing:

example::noop:
        push    rax
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB7_3
        lea     rcx, [rax - 1]
        mov     qword ptr [rdi + 16], rcx
        cmp     rcx, qword ptr [rdi + 8]
        jae     .LBB7_4
        mov     qword ptr [rdi + 16], rax
.LBB7_3:
        pop     rax
        ret
.LBB7_4:
        call    std::panicking::begin_panic
        ud2

And even better:

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        if v.capacity() <= v.len() {
            unsafe { std::hint::unreachable_unchecked() }
        }
        v.push(last)
    }
}

producing:

example::noop:
        ret

Compiler explorer

I feel like the compiler should be able to handle this situation better, considering that all the information is readily available.

@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 18, 2019
@Diomendius
Copy link

Similarly, both of these functions fail to optimize out the unwrap()s.

pub fn push_pop(v: &mut Vec<i32>) -> i32 {
    v.push(1);
    v.pop().unwrap()
}
pub fn push_and_get_ref(v: &mut Vec<i32>) -> &i32 {
    v.push(1);
    v.last().unwrap()
}

Unlike the example in the OP, this can technically cause v.len() to overflow, but I'm not sure it's even possible for a system to exist where an allocation of > usize::max() bytes is possible, let alone possible in practice, so it should be safe to assume that any operation that extends a container always ensures len() > 0.

The modifications to push() in #44452 (comment) might let this be optimized (specifically assume(self.len() == len) in the OP's case) although I hope a more elegant solution is possible.

I think in the case of push-after-pop it should be possible for the compiler to optimize out len() < capacity() without any kind of hint, but I'm not sure about len() > 0 as, given Rust's guarantees about overflow, the compiler would have to be able to statically determine that len() can't overflow without triggering an allocation failure (and that allocation failure is a panic/abort/early return).

@jonas-schievink jonas-schievink added A-collections Area: `std::collections` C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Aug 7, 2019
@broken-pen
Copy link
Author

FWIW, the workaround has been pessimized in Rust 1.38. The variant using unreachable_unchecked() now produces this assembly:

example::noop:
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB0_2
        mov     qword ptr [rdi + 16], rax
.LBB0_2:
        ret

Compiler explorer

In other words, there is now no possibility at all to get Rustc to optimize this piece of code correctly down to a no-op.

@broken-pen
Copy link
Author

broken-pen commented May 16, 2021

The above behavior is still present in Rust 1.53 beta. The current 1.54.0-nightly (4de7572 2021-05-01) produces this assembly:

example::noop:
        cmp     qword ptr [rdi + 16], 0
        ret

Compiler explorer

I'm not sure what has changed between these two versions.

@Diomendius
Copy link

It looks like the v.len() == 0 (or rather if let None = v.pop()) branch somehow interferes with the other branch being fully optimized away. If v.len() > 0 is (unsafely and unsoundly) assumed, the assembly returns to a simple ret.

@memoryruins
Copy link
Contributor

@Diomendius it appears you were testing with a dated rustc 1.38 (2019-09-26). Try experimenting with nightly.

@Diomendius
Copy link

@memoryruins I was using 1.38 as @troiganto originally reported that this version was the first to break the unreachable_unchecked() workaround and I wanted to compare like with like. The current nightly produces different assembly as @troiganto also reported above, but the assembly still amounts to a no-op comparison of the Vec's length, although the branch instruction is no longer present.

You're right that testing old versions is not generally helpful, but in this case both versions seem to exhibit the same bug, and the same "fix" (as brutal as it is) works for both.

@LingMan
Copy link
Contributor

LingMan commented Dec 2, 2023

Starting with Rust 1.75 (currently in beta) all three examples given in the opening post properly optimize to a noop.

The two examples given by @Diomendius in the second post still don't optimize out the unwrap though.

@LingMan
Copy link
Contributor

LingMan commented Dec 2, 2023

Although just adding too many pop/push pairs breaks it again :/

pub fn noop(v: &mut Vec<i32>) {
    if let Some(last) = v.pop() {
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        /*v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);
        v.pop();
        v.push(last);*/
    }
}

That still compiles to a noop, but uncomment one more pair and you get:

example::noop:
        mov     rax, qword ptr [rdi + 16]
        test    rax, rax
        je      .LBB0_2
        mov     qword ptr [rdi + 16], rax
.LBB0_2:
        ret

Uncomment another one and you get the full Vec resizing logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collections` C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants