-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuse grouped aggregate and filter operators for improved performance #5944
Comments
@Dandandan @alamb wdyt? |
One other source of inefficiency (and should be relatively easy to change) is that currently we output the entire See: |
In q1 this would remove |
I remind seeing some issue/papers about a similar approach before to this, maybe those were shared by @alamb ? |
I think this basic idea is called a "selection vector" in the literature -- and as you hint at, it is not quite the same as the null mask as it has different semantics. One approach might be to add another enum type to After @tustvold 's recent work in Arrow, I think this would just be a https://docs.rs/arrow/latest/arrow/buffer/struct.BooleanBuffer.html and should be straightforward to use. To really take advantage of a selection vector, however, the underlying compute kernels need to be updated to know how to ignore the selection vectors (and likely only do so when they are sparse)
While not exactly the same, @yjshen 's has been workking to add filtering to the aggregate input here, which is similar: #5868 |
apache/arrow-rs#3620 may be related |
I'm curious about this, in what situation would the nullability or not of a non-selected value matter? It is just going to be discarded regardless? See apache/arrow-rs#3620 |
I did some research / prototyping on this idea. I used this logic to create a selection vector from a predicate bit mask: let num_true = predicate.true_count();
let mut b = Int32Builder::with_capacity(num_true);
for i in 0..predicate.len() {
if predicate.value(i) {
b.append_value(i as i32);
}
}
let offsets = b.finish(); The selection vector can then be used in a naive sum aggregate like this: let mut sum = 0;
for i in 0..selection_vector.len() {
sum += array.value(selection_vector.value(i) as usize);
} If we ignore the cost of creating the selection vector, then this approach is faster than filtering the batch first. However, the cost of creating the selection vector is similar to filtering a batch in some cases, such as when we are aggregating a single column. I'm less excited about using selection vectors for aggregates at this point. |
Is your feature request related to a problem or challenge?
When we perform a grouped aggregate on a filtered input (such as with TPC-H q1), the filter operator performs two main tasks:
I wonder if we would see a significant performance improvement if we could avoid creating the filtered batches in this case.
One idea would be to create the filtered batches by copying the arrays and mutating the validity bitmap to hide the rows that are filtered out. This would potentially change the semantics in some cases though so we can probably only do this under certain conditions.
Another idea is to update the aggregate logic to perform the predicate evaluation and then use the resulting bitmap to determine which rows to accumulate.
Describe the solution you'd like
I am working on a small prototype of this, outside of DataFusion, that I will share once the code is less embarrassing.
Describe alternatives you've considered
It would be worth seeing how other engines handle this.
Additional context
No response
The text was updated successfully, but these errors were encountered: