Optimize FromIterator for boxed slice with TrustedLen iterators #99376

Closed


GoldsteinE
Contributor

(Probably) closes #75636

Assembly comparison after this PR (collected with cargo asm):

Vec<T>

This code:

pub fn create_vec<'a>(iter: &'a [&str]) -> Vec<&'a str> {
    iter.iter().cloned().collect()
}

Generates this assembly:

testcrate::create_vec:
 push    r15
 .cfi_def_cfa_offset 16
 push    r14
 .cfi_def_cfa_offset 24
 push    r13
 .cfi_def_cfa_offset 32
 push    r12
 .cfi_def_cfa_offset 40
 push    rbx
 .cfi_def_cfa_offset 48
 .cfi_offset rbx, -48
 .cfi_offset r12, -40
 .cfi_offset r13, -32
 .cfi_offset r14, -24
 .cfi_offset r15, -16
 mov     r12, rdx
 mov     rbx, rsi
 mov     r14, rdi
 mov     r15, rdx
 shl     r15, 4
 test    rdx, rdx
 je      .LBB0_1
 xor     r13d, r13d
 mov     rax, r12
 shr     rax, 59
 sete    al
 jne     .LBB0_9
 mov     r13b, al
 shl     r13, 3
 mov     rdi, r15
 mov     rsi, r13
 call    qword ptr [rip + __rust_alloc@GOTPCREL]
 test    rax, rax
 jne     .LBB0_4
 mov     rdi, r15
 mov     rsi, r13
 call    qword ptr [rip + _ZN5alloc5alloc18handle_alloc_error17he849d19d29b983a0E@GOTPCREL]
 ud2
.LBB0_1:
 mov     eax, 8
.LBB0_4:
 add     r15, rbx
 mov     qword ptr [r14], rax
 mov     qword ptr [r14 + 8], r12
 cmp     r15, rbx
 je      .LBB0_5
 xor     edx, edx
 xor     ecx, ecx
 .p2align 4, 0x90
.LBB0_7:
 movups  xmm0, xmmword ptr [rbx + rdx]
 movups  xmmword ptr [rax + rdx], xmm0
 add     rcx, 1
 lea     rsi, [rbx + rdx]
 add     rsi, 16
 add     rdx, 16
 cmp     rsi, r15
 jne     .LBB0_7
 jmp     .LBB0_8
.LBB0_5:
 xor     ecx, ecx
.LBB0_8:
 mov     qword ptr [r14 + 16], rcx
 mov     rax, r14
 pop     rbx
 .cfi_def_cfa_offset 40
 pop     r12
 .cfi_def_cfa_offset 32
 pop     r13
 .cfi_def_cfa_offset 24
 pop     r14
 .cfi_def_cfa_offset 16
 pop     r15
 .cfi_def_cfa_offset 8
 ret
.LBB0_9:
 .cfi_def_cfa_offset 48
 call    qword ptr [rip + _ZN5alloc7raw_vec17capacity_overflow17h752bfcb61e0e0e00E@GOTPCREL]
 ud2
 .size   _ZN9testcrate10create_vec17hcfeb54b47e1c4439E, .Lfunc_end0-_ZN9testcrate10create_vec17hcfeb54b47e1c4439E

Box<[T]>

This code:

pub fn create_boxed_slice<'a>(iter: &'a [&str]) -> Box<[&'a str]> {
    iter.iter().cloned().collect()
}

Generates this assembly:

testcrate::create_boxed_slice:
 push    r15
 .cfi_def_cfa_offset 16
 push    r14
 .cfi_def_cfa_offset 24
 push    rbx
 .cfi_def_cfa_offset 32
 .cfi_offset rbx, -32
 .cfi_offset r14, -24
 .cfi_offset r15, -16
 mov     rbx, rdi
 mov     r14, rsi
 shl     r14, 4
 test    rsi, rsi
 je      .LBB1_1
 xor     r15d, r15d
 shr     rsi, 59
 sete    al
 jne     .LBB1_9
 mov     r15b, al
 shl     r15, 3
 mov     rdi, r14
 mov     rsi, r15
 call    qword ptr [rip + __rust_alloc@GOTPCREL]
 test    rax, rax
 je      .LBB1_10
 add     r14, rbx
 cmp     r14, rbx
 je      .LBB1_5
.LBB1_6:
 xor     ecx, ecx
 xor     edx, edx
 .p2align 4, 0x90
.LBB1_7:
 movups  xmm0, xmmword ptr [rbx + rcx]
 movups  xmmword ptr [rax + rcx], xmm0
 add     rdx, 1
 lea     rsi, [rbx + rcx]
 add     rsi, 16
 add     rcx, 16
 cmp     rsi, r14
 jne     .LBB1_7
 pop     rbx
 .cfi_def_cfa_offset 24
 pop     r14
 .cfi_def_cfa_offset 16
 pop     r15
 .cfi_def_cfa_offset 8
 ret
.LBB1_1:
 .cfi_def_cfa_offset 32
 mov     eax, 8
 add     r14, rbx
 cmp     r14, rbx
 jne     .LBB1_6
.LBB1_5:
 xor     edx, edx
 pop     rbx
 .cfi_def_cfa_offset 24
 pop     r14
 .cfi_def_cfa_offset 16
 pop     r15
 .cfi_def_cfa_offset 8
 ret
.LBB1_9:
 .cfi_def_cfa_offset 32
 call    qword ptr [rip + _ZN5alloc7raw_vec17capacity_overflow17h752bfcb61e0e0e00E@GOTPCREL]
 ud2
.LBB1_10:
 mov     rdi, r14
 mov     rsi, r15
 call    qword ptr [rip + _ZN5alloc5alloc18handle_alloc_error17he849d19d29b983a0E@GOTPCREL]
 ud2
 .size   _ZN9testcrate18create_boxed_slice17hbd73ec56346d0792E, .Lfunc_end1-_ZN9testcrate18create_boxed_slice17hbd73ec56346d0792E

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 17, 2022
@rustbot
Collaborator

rustbot commented Jul 17, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @m-ou-se (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 17, 2022
@pickfire
Contributor

pickfire commented Jul 17, 2022

I see that it generates the assembly below, which wasn't there previously. Is this new error path expected? Here is the same old code with the latest stable compiler: https://godbolt.org/z/Ed8fzrKcT

 call    qword ptr [rip + _ZN5alloc7raw_vec17capacity_overflow17h752bfcb61e0e0e00E@GOTPCREL]

Not quite sure if it is due to the len capacity debug assert.

@GoldsteinE
Contributor Author

GoldsteinE commented Jul 17, 2022

@pickfire It’s not caused by this PR: nightly on Godbolt calls alloc::raw_vec::capacity_overflow both for Vec<T> and Box<[T]>.

https://godbolt.org/z/r4hn66dsM

@GoldsteinE
Contributor Author

GoldsteinE commented Jul 17, 2022

Wait a sec. I wonder how it passed tests on my side.

Oh, just missing a couple #[cfg(_)]s.

library/alloc/src/vec/mod.rs (outdated review thread, resolved)
library/alloc/src/boxed.rs (outdated review thread, resolved)

@pickfire
Contributor

pickfire commented Jul 18, 2022

> @pickfire It’s not caused by this PR: nightly on Godbolt calls alloc::raw_vec::capacity_overflow both for Vec<T> and Box<[T]>.

Is that a regression, given that both beta and stable (1.62.0) are still fine?

@GoldsteinE
Contributor Author

@pickfire This changed from beta to nightly, but I don’t think it’s a significant regression: the difference is like 10 instructions with 1 branch.

@GoldsteinE
Contributor Author

I think ad3a791 introduced this branch and it seems deliberate.

@GoldsteinE
Contributor Author

@the8472 Are your concerns lifted?

@the8472
Member

the8472 commented Jul 31, 2022

Yes, #99790 should solve it.

@pickfire
Contributor

pickfire commented Aug 1, 2022

@the8472 Is it necessary to have the extra capacity overflow panic check? Can we somehow remove it?

@JohnCSimon
Member

triage:

> @the8472 Is it necessary to have the extra capacity overflow panic check? Can we somehow remove it?

@GoldsteinE - can you answer this?

@GoldsteinE
Contributor Author

I’m not sure. I’d rather refrain from reverting someone else’s changes without fully understanding why they were made. Anyway, the branch to the panic is needed somewhere, so I don’t think it should have a noticeable performance impact.

@the8472
Member

the8472 commented Oct 9, 2022

> @the8472 Is it necessary to have the extra capacity overflow panic check? Can we somehow remove it?

You mean because the FromIterator specialization calls Extend, which contains the panicking check again? Yes. That code could be extracted into an unsafe function that requires the caller to promise that sufficient capacity is available and then iterates without further length checks. The FromIterator and Extend impls could both call it directly, so FromIterator wouldn't have to go through Extend anymore.

I think we had that at some point, maybe it was removed due to different performance issues? Or maybe that was in a PR that didn't get merged.
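A minimal sketch of the kind of unsafe helper described above, for illustration only. The name `extend_unchecked` is hypothetical and this is not the code in `alloc`. The standard library relies on the unstable `TrustedLen` trait for the length promise; stable `ExactSizeIterator` is used here purely as a stand-in, so the accuracy of `len()` becomes part of the caller's safety contract.

```rust
use std::ptr;

/// Hypothetical helper: appends every item of `iter` to `vec` without
/// any per-item capacity check.
///
/// # Safety
/// The caller must guarantee that `vec` has spare capacity for every item
/// the iterator yields. (`ExactSizeIterator` is only a stand-in for the
/// unstable `TrustedLen`; its `len()` is not trusted by this function.)
unsafe fn extend_unchecked<T, I>(vec: &mut Vec<T>, iter: I)
where
    I: ExactSizeIterator<Item = T>,
{
    let mut len = vec.len();
    let base = vec.as_mut_ptr();
    for item in iter {
        // SAFETY: the caller promised that slot `len` is within capacity.
        unsafe { ptr::write(base.add(len), item) };
        len += 1;
    }
    // The length is published once at the end. If the iterator panics
    // midway, the items already written are leaked, which is safe.
    unsafe { vec.set_len(len) };
}

fn main() {
    let source = ["a", "b", "c"];
    let mut vec: Vec<&str> = Vec::with_capacity(source.len());
    // SAFETY: room for `source.len()` items was just reserved, and a slice
    // iterator yields exactly that many items.
    unsafe { extend_unchecked(&mut vec, source.iter().copied()) };
    assert_eq!(vec, ["a", "b", "c"]);
}
```

With a helper of this shape, the capacity check happens once at the call site instead of once in `FromIterator` and again in `Extend`.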

@pickfire
Contributor

pickfire commented Oct 9, 2022

I think it should be fine. The panic I asked about wasn't introduced in this PR; it was introduced in another PR.

@GoldsteinE
Contributor Author

@m-ou-se Sorry to ping you, but what’s the state of this?

@the8472 the8472 assigned the8472 and unassigned m-ou-se Dec 31, 2022
Member

@the8472 the8472 left a comment

From a correctness perspective this should be fine once the with_capacity guarantees have been updated. But it relies a lot on the high-level API (FromIterator) doing exactly the right thing. I don't see us changing this anytime soon, but I would feel better if we had a lower-level building block we could call instead, one that has exactly the right properties.

The Extend version of the TrustedLen specialization has already been extracted into a separate method:

// specific extend for `TrustedLen` iterators, called both by the specializations
// and internal places where resolving specialization makes compilation slower
#[cfg(not(no_global_oom_handling))]
fn extend_trusted(&mut self, iterator: impl iter::TrustedLen<Item = T>) {

The same could be done with the FromIterator specialization:

impl<T, I> SpecFromIterNested<T, I> for Vec<T>
where
I: TrustedLen<Item = T>,
{
fn from_iter(iterator: I) -> Self {

Then we would have a method that the Box specialization can call directly and put the guarantees on its documentation.
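One way to picture the suggested layering, as a sketch under stated assumptions: `vec_from_exact` and `boxed_from_exact` are hypothetical names, `ExactSizeIterator` stands in for the unstable `TrustedLen`, and the safe `extend` call stands in for the internal unchecked write loop.

```rust
/// Hypothetical shared building block: requests capacity for exactly the
/// number of items the iterator reports, then fills the vector.
fn vec_from_exact<T, I>(iter: I) -> Vec<T>
where
    I: ExactSizeIterator<Item = T>,
{
    let mut vec = Vec::with_capacity(iter.len());
    // Safe stand-in for the unchecked loop: `extend` still checks capacity.
    vec.extend(iter);
    vec
}

/// Hypothetical `Box<[T]>` entry point that calls the building block
/// directly instead of going through the general `FromIterator` for `Vec`.
fn boxed_from_exact<T, I>(iter: I) -> Box<[T]>
where
    I: ExactSizeIterator<Item = T>,
{
    // `into_boxed_slice` discards any excess capacity before converting.
    vec_from_exact(iter).into_boxed_slice()
}

fn main() {
    let words = ["a", "b"];
    let boxed: Box<[&str]> = boxed_from_exact(words.iter().copied());
    assert_eq!(boxed.len(), 2);
    assert_eq!(boxed[1], "b");
}
```

The point of the split is that the allocation guarantee lives in the documentation of one function, so the `Box` path does not depend on which specialization `collect()` happens to pick.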

library/alloc/src/vec/mod.rs (outdated review thread, resolved)
library/alloc/src/vec/mod.rs (outdated review thread, resolved)
@GoldsteinE
Contributor Author

Thanks for the review, I’ll look into the possibility of using different APIs soon.

@the8472
Member

the8472 commented Jan 14, 2023

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2023
@GoldsteinE
Contributor Author

The FromIterator specialization for Vec<_> does the wrong thing here, since it uses .extend_trusted(), which uses .reserve() instead of .reserve_exact(). I think I’ll just add an additional method to Vec<_>.
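For context on why the distinction matters for `Box<[T]>`, here is a small sketch using only the public `Vec` API. Both calls guarantee at least the requested room; `reserve` is documented as possibly over-allocating to amortize growth, while `reserve_exact` does not deliberately over-allocate. The helper name `reserved_capacities` is hypothetical, and the exact capacities are allocator- and implementation-dependent, so none are asserted.

```rust
/// Hypothetical helper: returns the capacities obtained by asking an empty
/// vector for `n` slots via `reserve` and via `reserve_exact`.
fn reserved_capacities(n: usize) -> (usize, usize) {
    let mut amortized: Vec<u8> = Vec::new();
    amortized.reserve(n);

    let mut exact: Vec<u8> = Vec::new();
    exact.reserve_exact(n);

    (amortized.capacity(), exact.capacity())
}

fn main() {
    let (amortized, exact) = reserved_capacities(5);
    // Only the lower bound is guaranteed for either call.
    assert!(amortized >= 5);
    assert!(exact >= 5);
    println!("reserve: {amortized}, reserve_exact: {exact}");

    // A boxed slice has no spare capacity, so any excess has to be
    // discarded (possibly by reallocating) during the conversion.
    let mut vec: Vec<u8> = Vec::with_capacity(1000);
    vec.extend_from_slice(&[1, 2, 3, 4, 5]);
    let boxed = vec.into_boxed_slice();
    assert_eq!(boxed.len(), 5);
}
```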

@GoldsteinE
Contributor Author

On the other hand, I could just make .extend_trusted() use .reserve_exact() or add a compile-time boolean flag that switches this behaviour.

@GoldsteinE
Contributor Author

I realized that just not using Vec makes the implementation a lot simpler.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 14, 2023
@GoldsteinE
Contributor Author

Oh well, ok, no it’s not, we lose another optimization that way.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2023
@rust-log-analyzer
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
........................................................................................ 616/651
...................................
failures:

error: test failed, to rerun pass `-p alloc --test collectionstests`
---- slice::test_in_place_iterator_specialization stdout ----
thread 'slice::test_in_place_iterator_specialization' panicked at 'assertion failed: `(left == right)`
  left: `0x7f12040014c0`,
 right: `0x7f1204001940`', library/alloc/tests/slice.rs:1407:5

failures:
    slice::test_in_place_iterator_specialization

@Dylan-DPC
Member

@GoldsteinE any updates on this?

@GoldsteinE
Contributor Author

@Dylan-DPC I'm kinda stuck. I don't see a way to do it without losing any optimizations or relying on behavior of Vec's FromIterator.

@the8472
Member

the8472 commented Jan 22, 2023

I do not understand the issue from your previous comments. My last suggested action was to extract the TrustedLen specialization for FromIterator into a method and then have the vec and box specializations call that method.
Which part of that is causing trouble?

@GoldsteinE
Contributor Author

@the8472 The problem is that Vec’s FromIterator specialization branches twice:

  1. If the iterator is something like vec::IntoIter<_>, then .collect() is free and just returns the original Vec<_>.
  2. Otherwise, if it’s TrustedLen, then we allocate the exact storage for it.
So in this code:
let mut vec = Vec::with_capacity(1000);
vec.push(1);
let x: Box<[_]> = vec.into_iter().collect();

which specialization should be used? We currently have a test asserting that the first one is used when possible, but here the reused allocation is not the exact size, so we’ll need to shrink it, potentially reallocating.

@the8472
Member

the8472 commented Mar 14, 2023

Well, we can have multiple specializations for box, just as we do for Vec.
One for Vec::IntoIter, which can check whether the allocation fits perfectly and can be recycled, and otherwise calls a shared I: TrustedLen method; then a more general one for I: TrustedLen that also calls that method.

Having a branch that may or may not reallocate on the vec -> box path seems fine.
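The two-path idea can be sketched roughly as follows. This is illustrative only: `box_from_vec` and `box_from_exact` are hypothetical names, the function takes a `Vec<T>` because stable code cannot inspect the allocation behind `vec::IntoIter`, and `ExactSizeIterator` stands in for the unstable `TrustedLen`.

```rust
/// Hypothetical shared exact-size path: fresh allocation sized to the
/// iterator's reported length.
fn box_from_exact<T>(iter: impl ExactSizeIterator<Item = T>) -> Box<[T]> {
    let mut out = Vec::with_capacity(iter.len());
    out.extend(iter);
    out.into_boxed_slice()
}

/// Hypothetical stand-in for the `vec::IntoIter` specialization.
fn box_from_vec<T>(vec: Vec<T>) -> Box<[T]> {
    if vec.len() == vec.capacity() {
        // The allocation already fits perfectly, so it can be recycled.
        vec.into_boxed_slice()
    } else {
        // Otherwise take the shared path, which allocates for the length.
        box_from_exact(vec.into_iter())
    }
}

fn main() {
    // Mirrors the example from the thread: capacity 1000, length 1.
    let mut vec = Vec::with_capacity(1000);
    vec.push(1);
    let boxed = box_from_vec(vec);
    assert_eq!(boxed.len(), 1);
    assert_eq!(boxed[0], 1);
}
```

Whether the fallback should copy into a fresh allocation or shrink the existing one in place is a design choice this sketch does not settle; it only shows the branch itself.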

@bors
Contributor

bors commented Apr 14, 2023

☔ The latest upstream changes (presumably #110331) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Member

@GoldsteinE
Ping from triage: I'm closing this due to inactivity. Please reopen it when you are ready to continue.
Note: if you are going to continue, please reopen the PR BEFORE you push to it, or else you won't be able to reopen it; this is a quirk of GitHub.
Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this Jun 17, 2023
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Jun 17, 2023
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Development

Successfully merging this pull request may close these issues.

Better codegen for FromIterator Box<[T]>