-
Notifications
You must be signed in to change notification settings - Fork 1k
Fix "Incorrect Behavior of Collecting a filtered iterator to a BooleanArray" #8543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ed len iterators Use BooleanBuilder in FromIterator Add BooleanAdapter Add BooleanArray::from_trusted_len_iter
|
@alamb Could you run the |
mbrobbel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my M1 Pro:
BooleanArray::from_iter
[26.169 µs 26.194 µs 26.222 µs]
BooleanArray::from_trusted_len_iter
[9.6375 µs 9.6519 µs 9.6659 µs]
@mbrobbel Thanks for the review and running the benchmarks! Interessting how the performance differs on an ARM chip. Here are my numbers (Ryzen 3900X). |
|
I think the benchmark results for impl<Ptr: std::borrow::Borrow<Option<bool>>> FromIterator<Ptr> for BooleanArray {
fn from_iter<I: IntoIterator<Item = Ptr>>(iter: I) -> Self {
let iter = iter.into_iter();
let capacity = match iter.size_hint() {
(lower, Some(upper)) if lower == upper => lower,
_ => 0
};
let mut builder = BooleanBuilder::with_capacity(capacity);
builder.extend(iter.map(|item| *item.borrow()));
builder.finish()
}
}This is probably much slower than the current incorrect implementation, but any optimizations to |
39af718 to
fb8626a
Compare
fb8626a to
f7bd3c5
Compare
|
@jhorstmann Thanks for your input. That was a flaw in my benchmarks you're right. I've adapted them to now use a I really like the idea of re-using I've adapted the implementation so you can take a look. |
|
With the new With the old |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤔 The only one that looks suspicious is Int64Array::from_trusted_len_ter getting significantly slower I will rerun to double check |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤔 Int64Array::from_trusted_len_iter 2.78 52.5±0.34µs ? ?/sec 1.00 18.9±0.04µs ? ?/sec |
…ter`" This reverts commit 78700b4.
|
🤖 |
|
🤖: Benchmark completed Details
|
Thanks! I think this is in-line with what I measured locally. |
Yes, I agree with your analysis -- and I think it is ok to take a hit of performance to make it correct 🤣 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @tobixdev -- this code looks very nice and consistent with PrimitiveArray to me
@jhorstmann and @mbrobbel perhaps you can give this PR a final review before we merge as well to make sure we haven't missed anything
|
Looks good. This clearly fixes the incorrect behavior, and adding |
Yes, I agree that it would be great to get similar / better performance without any heap allocations. Thanks for creating the issue! |
|
I think all comments are resolved and some of them will be tackled in the follow-up PR / issue. Thanks for your input! |
Which issue does this PR close?
Rationale for this change
Fix the bug and align
BooleanArray::from_itertoPrimitiveArray::from_iterIn
BooleanArray::from_iter:Collecting to a
Vecand then usingfrom_trusted_len_iterwas almost double as fast as usingBooleanBufferBuilderon my machine.What changes are included in this PR?
BooleanArray::from_iterto fix the wrong behaviorBooleanArray::from_trusted_len_iterfor a more performant version (The old version ofBooleanArray::from_iter, just with unsafe flavor ofbit_util::set_bit_raw)BooleanAdapter, inspired byNativeAdapterfrom thePrimitiveArray. This allows also doingBooleanArray::from_iter([true, false].into_iter()).Are these changes tested?
BooleanArray::from_trusted_len_iterdirectly (oldBooleanArray::from_iteralso cover it indirectly)[false, true, ...](noOption)Are there any user-facing changes?
BooleanArray::from_iterhas a "slight" performance regression that users could observe.BooleanArrayBooleanArray::from_trusted_len_iter