Attempt to use the high part of the `size_hint` in `collect` (again) #137908
base: master
```diff
@@ -22,21 +22,39 @@ where
         // empty, but the loop in extend_desugared() is not going to see the
         // vector being full in the few subsequent loop iterations.
         // So we get better branch prediction.
-        let mut vector = match iterator.next() {
-            None => return Vec::new(),
-            Some(element) => {
-                let (lower, _) = iterator.size_hint();
-                let initial_capacity =
-                    cmp::max(RawVec::<T>::MIN_NON_ZERO_CAP, lower.saturating_add(1));
-                let mut vector = Vec::with_capacity(initial_capacity);
-                unsafe {
-                    // SAFETY: We requested capacity at least 1
-                    ptr::write(vector.as_mut_ptr(), element);
-                    vector.set_len(1);
-                }
-                vector
-            }
+        let (low, high) = iterator.size_hint();
+        let Some(first) = iterator.next() else {
+            return Vec::new();
         };
+        // `push`'s growth strategy is (currently) to double the capacity if
+        // there's no space available, so it can have up to 50% "wasted" space.
+        // Thus if the upper-bound on the size_hint also wouldn't waste more
+        // than that, just allocate it from the start. (After all, it's silly
+        // to allocate 254 for a hint of `(254, Some(255))`.)
+        let initial_capacity = {
+            // This is written like this to not overflow on any well-behaved iterator,
+            // even things like `repeat_n(val, isize::MAX as usize + 10)`
+            // where `low * 2` would need checking.
+            // A bad (but safe) iterator might have `low > high`, but if so it'll
+            // produce a huge `extra` that'll probably fail the following check.
+            let hint = if let Some(high) = high
+                && let extra = high - low
+                && extra < low
+            {
+                high
+            } else {
+                low
+            };
+            cmp::max(RawVec::<T>::MIN_NON_ZERO_CAP, hint)
+        };
+        let mut vector = Vec::with_capacity(initial_capacity);
+        // SAFETY: We requested capacity at least MIN_NON_ZERO_CAP, which
+        // is never zero, so there's space for at least one element.
+        unsafe {
+            ptr::write(vector.as_mut_ptr(), first);
+            vector.set_len(1);
+        }
 
         // must delegate to spec_extend() since extend() itself delegates
         // to spec_from for empty Vecs
         <Vec<T> as SpecExtend<T, I>>::spec_extend(&mut vector, iterator);
```

Review thread on the `high` line:

Should we cap this at …

Note that it would only ever matter for things that produce exactly … Said otherwise, this only uses the …

Yeah, I get that, but I still wonder if such "exactly … I feel like the doubling logic should cap itself too, but that's a separate conversation.

What if we only try the high capacity first when it's in `(low, low*2]`, or else fall back to just `low`?
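To make the capacity rule easier to experiment with, here is a minimal sketch that lifts it out into plain functions. Assumptions: `initial_capacity` and `initial_capacity_reviewer` are hypothetical names, `min_non_zero_cap` is a parameter standing in for the std-internal `RawVec::<T>::MIN_NON_ZERO_CAP`, and the second function is one reading of the `(low, low*2]` suggestion from the review thread, not code from the PR.

```rust
/// Hypothetical standalone version of the capacity choice in the diff.
/// `min_non_zero_cap` stands in for `RawVec::<T>::MIN_NON_ZERO_CAP`.
pub fn initial_capacity(low: usize, high: Option<usize>, min_non_zero_cap: usize) -> usize {
    let hint = match high {
        Some(high) => {
            // `wrapping_sub` (instead of the diff's `high - low`) keeps a bogus
            // `low > high` hint from panicking when overflow checks are on.
            // The wrapped value is typically huge, so the check below usually
            // falls back to `low`.
            let extra = high.wrapping_sub(low);
            // `extra < low` means `high < 2 * low`, checked without overflow.
            if extra < low { high } else { low }
        }
        None => low,
    };
    hint.max(min_non_zero_cap)
}

/// Hypothetical variant following the review suggestion: use `high` only
/// when it lies in `(low, 2 * low]`, otherwise fall back to `low`.
pub fn initial_capacity_reviewer(low: usize, high: Option<usize>, min_non_zero_cap: usize) -> usize {
    let hint = match high {
        // `high > low` rules out underflow; `high - low <= low` is
        // `high <= 2 * low` without risking overflow.
        Some(high) if high > low && high - low <= low => high,
        _ => low,
    };
    hint.max(min_non_zero_cap)
}
```

For consistent hints the two rules pick the same value everywhere except `high == 2 * low`, where the diff's version keeps `low` and the suggested version takes `high`.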
New codegen test file (`@@ -0,0 +1,16 @@`):

```rust
//@ compile-flags: -Copt-level=3
#![crate_type = "lib"]

#[no_mangle]
pub fn should_use_low(a: [i32; 10], b: [i32; 100], p: fn(i32) -> bool) -> Vec<i32> {
    // CHECK-LABEL: define void @should_use_low
    // CHECK: call{{.+}}dereferenceable_or_null(40){{.+}}@__rust_alloc(
    a.iter().copied().chain(b.iter().copied().filter(|x| p(*x))).collect()
}

#[no_mangle]
pub fn should_use_high(a: [i32; 100], b: [i32; 10], p: fn(i32) -> bool) -> Vec<i32> {
    // CHECK-LABEL: define void @should_use_high
    // CHECK: call{{.+}}dereferenceable_or_null(440){{.+}}@__rust_alloc(
    a.iter().copied().chain(b.iter().copied().filter(|x| p(*x))).collect()
}
```
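The byte counts in the two `CHECK` lines follow from the chained iterators' `size_hint`s. A minimal sketch, assuming the hint is read before the first `next()` as in the diff; `initial_capacity`, `chain_hint`, and `predicted_alloc_bytes` are hypothetical helpers, and the `4` passed as the minimum is only a placeholder for `MIN_NON_ZERO_CAP` (it does not affect these cases because both hints are larger).

```rust
use std::mem::size_of;

/// Hypothetical helper restating the diff's rule: use `high` when it exceeds
/// `low` by less than `low` (so, for an honest hint, under half of the buffer
/// can end up unused), otherwise `low`; never go below `min_non_zero_cap`.
fn initial_capacity(low: usize, high: Option<usize>, min_non_zero_cap: usize) -> usize {
    let hint = match high {
        Some(high) if high.wrapping_sub(low) < low => high,
        _ => low,
    };
    hint.max(min_non_zero_cap)
}

/// `size_hint` of the same adapter chain as the codegen test. The predicate
/// is irrelevant: `Filter` always reports a lower bound of 0.
fn chain_hint(a: &[i32], b: &[i32]) -> (usize, Option<usize>) {
    a.iter().copied().chain(b.iter().copied().filter(|x| *x > 0)).size_hint()
}

/// Bytes the rule would request from the allocator for that chain.
fn predicted_alloc_bytes(a: &[i32], b: &[i32]) -> usize {
    let (low, high) = chain_hint(a, b);
    initial_capacity(low, high, 4) * size_of::<i32>()
}
```

For `should_use_low` the hint is `(10, Some(110))`; 110 is more than double 10, so the rule keeps 10 elements (40 bytes). For `should_use_high` it is `(100, Some(110))`, close enough to use 110 elements (440 bytes).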
I think the other comment still applies that some `size_hint`s may be better after the first `next`, or do you disagree?
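`Flatten` shows the effect: before the first `next()` no inner iterator has been opened, so the adapter cannot promise a lower bound. A minimal sketch, assuming current std behavior (`size_hint` values are hints, not a stable contract); `hints_before_and_after_first_next` is a hypothetical helper.

```rust
type Hint = (usize, Option<usize>);

/// Hypothetical helper: the `size_hint` of a flattened `Vec<Vec<i32>>`
/// immediately before and immediately after its first `next()`.
fn hints_before_and_after_first_next(data: Vec<Vec<i32>>) -> (Hint, Hint) {
    let mut iter = data.into_iter().flatten();
    // No inner vector is open yet, so the lower bound is 0; with outer
    // items remaining, the upper bound is unknown.
    let before = iter.size_hint();
    let _ = iter.next();
    // Now the first inner vector is open, and its remaining length counts
    // toward the lower bound.
    let after = iter.size_hint();
    (before, after)
}
```

With `vec![vec![1, 2, 3], vec![4]]` the hint goes from `(0, None)` to `(2, None)`; with a single inner vector it goes from `(0, None)` to the exact `(2, Some(2))`.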
I've been meaning to come back and give that a shot. While it's certainly true, I'm also skeptical of how valuable it is, since the usual case is things like `flat_map` that almost never have a good hint anyway; and when they do, like flattening an iterator over arrays, it doesn't need the first one. But I can try it. (It makes me tempted to have a `next_with_suggested_reserve -> Option<(NonZero<usize>, Item)>`, too, but that's a bigger conversation.)
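The `next_with_suggested_reserve` idea can be sketched as a free function. This is purely hypothetical: neither that name nor `collect_with_reserve` exists in std. It reads the hint after the first `next()`, which is the point raised in the first comment, and uses `NonZeroUsize`, which is an alias for `NonZero<usize>` in current Rust.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not a std API): pull the first item, then suggest a
/// capacity from the hint read *after* `next()`, counting the pulled item.
fn next_with_suggested_reserve<I: Iterator>(iter: &mut I) -> Option<(NonZeroUsize, I::Item)> {
    let item = iter.next()?;
    let (low, _) = iter.size_hint();
    // `MIN` is 1, so this is `low + 1`, saturating instead of overflowing.
    let reserve = NonZeroUsize::MIN.saturating_add(low);
    Some((reserve, item))
}

/// Hypothetical `collect`-style helper built on the function above.
fn collect_with_reserve<I: Iterator>(mut iter: I) -> Vec<I::Item> {
    let Some((reserve, first)) = next_with_suggested_reserve(&mut iter) else {
        return Vec::new();
    };
    let mut vec = Vec::with_capacity(reserve.get());
    vec.push(first);
    vec.extend(iter);
    vec
}
```

For `vec![vec![1, 2, 3], vec![4]].into_iter().flatten()` this suggests reserving 3 after yielding `1`, where the hint taken before `next()` would have offered only a lower bound of 0.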