Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving all elements out of a Vec generates worse assembly since 1.45 #78918

Open
silverweed opened this issue Nov 10, 2020 · 5 comments
Open
Labels
P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@silverweed
Copy link

silverweed commented Nov 10, 2020

I was looking for the best way to move all elements out of a Vec and returning them, basically what I'd write in C++ as:

struct A {
    std::vector<int> moveOutOfVec() {
        return std::move(myV); // this empties the vector
    }

    std::vector<int> myV;
};

std::vector<int> test(A& a) {
    return a.moveOutOfVec();
}

This code generates very good assembly with clang (link to Godbolt)

        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rcx, qword ptr [rsi + 16]
        mov     qword ptr [rdi + 16], rcx
        xorps   xmm0, xmm0
        movups  xmmword ptr [rsi], xmm0
        mov     qword ptr [rsi + 16], 0
        ret

so I tried to do the equivalent with Rust.

Code

NOTE: I always used rustc -O for all the following tests.

I tried this code:

pub struct A {
    v: Vec<i32>
}

impl A {
    #[inline]
    pub fn move_out_of_vec(&mut self) -> Vec<i32> {
        let mut v = Vec::default();
        std::mem::swap(&mut v, &mut self.v);
        v
    }
}

pub fn test(a: &mut A) -> Vec<i32> {
    a.move_out_of_vec()
}

I expected to see similar assembly as C++.

Instead, this was the output: (link to Godbolt)

example::foo:
        sub     rsp, 24
        mov     rax, rdi
        mov     qword ptr [rdi], 4
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi + 8], xmm0
        mov     rcx, qword ptr [rdi + 16]
        mov     qword ptr [rsp + 16], rcx
        mov     rcx, qword ptr [rdi]
        mov     qword ptr [rsp], rcx
        mov     rcx, qword ptr [rdi + 8]
        mov     qword ptr [rsp + 8], rcx
        mov     rcx, qword ptr [rsi + 16]
        mov     qword ptr [rdi + 16], rcx
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rcx, qword ptr [rsp + 16]
        mov     qword ptr [rsi + 16], rcx
        mov     rcx, qword ptr [rsp]
        mov     qword ptr [rsi], rcx
        mov     rcx, qword ptr [rsp + 8]
        mov     qword ptr [rsi + 8], rcx
        add     rsp, 24
        ret

Version it worked on

The same code in 1.44.0 generates much better assembly (only if you mark the method as inline): (link to Godbolt)

example::foo:
        sub     rsp, 24
        mov     rax, rdi
        mov     rcx, qword ptr [rsi]
        movups  xmm0, xmmword ptr [rsi + 8]
        movaps  xmmword ptr [rsp], xmm0
        mov     qword ptr [rsi], 4
        xorps   xmm0, xmm0
        movups  xmmword ptr [rsi + 8], xmm0
        mov     qword ptr [rdi], rcx
        movaps  xmm0, xmmword ptr [rsp]
        movups  xmmword ptr [rdi + 8], xmm0
        add     rsp, 24
        ret

Version with regression

From 1.45.0 until current nightly, the generated code is worse (inline doesn't affect the output).

Notes

I also tried using drain and split_off to achieve the same thing, but they both generate far worse assembly, with jumps and all.

@silverweed
Copy link
Author

@rustbot modify labels: regression-untriaged

@rustbot rustbot added regression-untriaged Untriaged performance or correctness regression. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 10, 2020
@silverweed
Copy link
Author

silverweed commented Nov 10, 2020

Other than Godbolt, I also tested on my machine, running Windows 10. I ran rustc --emit=asm --crate-type=lib -O -C llvm-args=-x86-asm-syntax=intel test.rs and these are the results (trimmed down to the relevant parts):

rustc 1.46.0:

	sub	rsp, 24
	mov	rax, rcx
	mov	qword ptr [rcx], 4
	xorps	xmm0, xmm0
	movups	xmmword ptr [rcx + 8], xmm0
	mov	rcx, qword ptr [rcx + 16]
	mov	qword ptr [rsp + 16], rcx
	mov	rcx, qword ptr [rax]
	mov	qword ptr [rsp], rcx
	mov	rcx, qword ptr [rax + 8]
	mov	qword ptr [rsp + 8], rcx
	mov	rcx, qword ptr [rdx + 16]
	mov	qword ptr [rax + 16], rcx
	movups	xmm0, xmmword ptr [rdx]
	movups	xmmword ptr [rax], xmm0
	mov	rcx, qword ptr [rsp + 16]
	mov	qword ptr [rdx + 16], rcx
	mov	rcx, qword ptr [rsp]
	mov	qword ptr [rdx], rcx
	mov	rcx, qword ptr [rsp + 8]
	mov	qword ptr [rdx + 8], rcx
	add	rsp, 24
	ret

rustc 1.44.0:

	sub	rsp, 24
	mov	rax, rcx
	mov	rcx, qword ptr [rdx]
	movups	xmm0, xmmword ptr [rdx + 8]
	movaps	xmmword ptr [rsp], xmm0
	mov	qword ptr [rdx], 4
	xorps	xmm0, xmm0
	movups	xmmword ptr [rdx + 8], xmm0
	mov	qword ptr [rax], rcx
	movaps	xmm0, xmmword ptr [rsp]
	movups	xmmword ptr [rax + 8], xmm0
	add	rsp, 24
	ret

@SkiFire13
Copy link
Contributor

Curiously std::mem::take(&mut self.v) compiles to the better assembly even on newer rustc versions, even though it is the same as your move_out_of_vec if manually inlined.

@camelid camelid added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Nov 10, 2020
@camelid
Copy link
Member

camelid commented Nov 10, 2020

Assigning P-medium and removing I-prioritize as discussed in the prioritization working group.

@estebank estebank added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 10, 2020
@eggyal
Copy link
Contributor

eggyal commented Apr 6, 2021

Bisected this regression to nightly-2020-05-22; alas CI artefacts from back then are no longer available (to me?), but looking at the commits that made it into that nightly release (c7813ff to 9310e3b, inclusive) the one that most immediately jumps out as likely responsible is 82911b3, which was the merge of #67759 (Update to LLVM 10).

@camelid camelid added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-untriaged Untriaged performance or correctness regression. labels Oct 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants