-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use partition for bool sort #448
Conversation
I will try and review this tomorrow |
403fe20
to
b3e3dcd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does indeed appear that sort_by
allocates auxiliary memory according to the documentation: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by
I tested the performance of this change using the benchmark in #457
I ran the following tests:
(arrow_dev) alamb@ip-10-0-0-124:~/Software/arrow-rs$ cargo bench -p arrow --bench sort_kernel -- 'bool sort' --save-baseline master
(arrow_dev) alamb@ip-10-0-0-124:~/Software/arrow-rs$ cargo bench -p arrow --bench sort_kernel -- 'bool sort' --save-baseline bool-partition
Critcmp implies there isn't much difference with this implementation vs what is on master
critcmp master bool-partition
group bool-partition master
----- -------------- ------
bool sort 2^12 1.02 422.0±16.63µs ? ?/sec 1.00 414.9±15.86µs ? ?/sec
bool sort nulls 2^12 1.01 416.4±16.02µs ? ?/sec 1.00 410.6±9.14µs ? ?/sec
alamb@ip-10-0-0-124:~/Software/arrow-rs$
So looks good to me 👍
i guess 1024 is too short an array so it's dominated by the memory allocation rather than sorting |
I also was under the impression that the partition sort was aiming to improve memory usage rather than speed |
Both but the memory consumption is not so obvious unless partition in place is stabilized which is currently in nightly |
Ok, I'll increase the size of the sort benchmark and see if I can see any difference |
I finally got a chance to re-run the tests Using 2^14 = 16384
So looks like a nice change to me Here is the test change in case anyone else wants to try: diff --git a/arrow/benches/sort_kernel.rs b/arrow/benches/sort_kernel.rs
index f9f5f24c1..f517afede 100644
--- a/arrow/benches/sort_kernel.rs
+++ b/arrow/benches/sort_kernel.rs
@@ -80,9 +80,9 @@ fn add_benchmark(c: &mut Criterion) {
b.iter(|| bench_sort(&arr_a, &arr_b, None))
});
- let arr_a = create_bool_array(2u64.pow(12) as usize, false);
- let arr_b = create_bool_array(2u64.pow(12) as usize, false);
- c.bench_function("bool sort 2^12", |b| {
+ let arr_a = create_bool_array(2u64.pow(14) as usize, false);
+ let arr_b = create_bool_array(2u64.pow(14) as usize, false);
+ c.bench_function("bool sort 2^14", |b| {
b.iter(|| bench_sort(&arr_a, &arr_b, None))
}); Sorry for the delay @jimexist -- it has been a busy few days for me |
* optimize boolean sort using parition * add docs
* optimize boolean sort using parition * add docs
Which issue does this PR close?
Closes #447
Rationale for this change
when no
limit
is present, the best way to sort boolean array is to partition in place, the second best way is to partition.we are currently using stable sort which will allocate O(n/2) additional memory so this new change might be faster because even if partition_in_place is not used (due to nightly features), the memory allocated is also linear, resulting similar memory overhead, and the sorting happens quicker;
however if the original sort had been unstable (quicksort, in place), this change is not so obvious to be faster, because for quick sort the first pivot sweep would most definitely have the booleans separated.
What changes are included in this PR?
Are there any user-facing changes?