Add was_valid parameter to NullState callbacks #11592
/// Check if the accumulated value for the group at the given `index` is valid,
/// meaning that there was at least one value passing the filter for this group.
pub fn is_valid(&self, index: usize) -> bool {
    self.seen_values.get_bit(index)
}
Ahh, I can't use it in `accumulate` due to mutable vs immutable borrow. This API is pretty difficult to use.
I don't understand this comment 🤔
This is the signature of `accumulate`:
pub fn accumulate<T, F>(
    &mut self,
    group_indices: &[usize],
    values: &PrimitiveArray<T>,
    opt_filter: Option<&BooleanArray>,
    total_num_groups: usize,
    mut value_fn: F,
) where
    T: ArrowPrimitiveType + Send,
    F: FnMut(usize, T::Native) + Send
But I can't use `is_null` inside `value_fn` because the `NullState` is already borrowed as mutable.
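The conflict described here can be reproduced in a minimal sketch. `SeenValues` and `sum_per_group` are hypothetical, simplified stand-ins (plain `Vec<bool>` and `i64` slices instead of Arrow buffers and arrays); only the borrowing shape is meant to match `accumulate`.

```rust
// Hypothetical, simplified stand-in for `NullState`; not DataFusion's real type.
struct SeenValues {
    seen: Vec<bool>,
}

impl SeenValues {
    fn is_valid(&self, index: usize) -> bool {
        self.seen.get(index).copied().unwrap_or(false)
    }

    // Same borrowing shape as `accumulate`: `&mut self` plus a caller-supplied callback.
    fn accumulate<F>(
        &mut self,
        group_indices: &[usize],
        values: &[i64],
        total_num_groups: usize,
        mut value_fn: F,
    ) where
        F: FnMut(usize, i64),
    {
        if self.seen.len() < total_num_groups {
            self.seen.resize(total_num_groups, false);
        }
        for (&group, &value) in group_indices.iter().zip(values) {
            self.seen[group] = true;
            value_fn(group, value);
        }
    }
}

fn sum_per_group(
    group_indices: &[usize],
    values: &[i64],
    total_num_groups: usize,
) -> Vec<Option<i64>> {
    let mut state = SeenValues { seen: Vec::new() };
    let mut sums = vec![0_i64; total_num_groups];
    state.accumulate(group_indices, values, total_num_groups, |group, value| {
        // Rejected by the borrow checker if uncommented: the closure would hold
        // `&state` while `accumulate` needs `&mut state` for the whole call.
        // let _first_for_group = !state.is_valid(group);
        sums[group] += value;
    });
    // Once `accumulate` has returned, the mutable borrow is over and `is_valid` works.
    sums.iter()
        .enumerate()
        .map(|(group, &sum)| state.is_valid(group).then_some(sum))
        .collect()
}
```

So `is_valid` is only usable before or after the call, which is too late for a callback that needs to know whether it is seeing the first value for a group.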
Ideally `BooleanBufferBuilder::set_bit` should return the previous value and we can pass it along to the callback. That's in arrow I guess.
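A rough sketch of that idea follows. `Bitmap` is a hypothetical type whose `set_bit` returns the previous bit; as the comment implies, this is not what arrow's `BooleanBufferBuilder::set_bit` does today. The free function `accumulate` is likewise only an illustration of forwarding that bit to the callback.

```rust
// Hypothetical bitmap: unlike arrow's builder, `set_bit` here reports the old value.
struct Bitmap {
    bits: Vec<bool>,
}

impl Bitmap {
    fn with_len(len: usize) -> Self {
        Bitmap { bits: vec![false; len] }
    }

    // Sets the bit at `index` and returns the value it had before.
    fn set_bit(&mut self, index: usize, value: bool) -> bool {
        std::mem::replace(&mut self.bits[index], value)
    }
}

// Forwards the previous "seen" bit to the callback as `was_valid`.
fn accumulate<F>(seen: &mut Bitmap, group_indices: &[usize], values: &[i64], mut value_fn: F)
where
    F: FnMut(usize, bool, i64),
{
    for (&group, &value) in group_indices.iter().zip(values) {
        let was_valid = seen.set_bit(group, true);
        value_fn(group, was_valid, value);
    }
}
```

With this shape the callback never has to borrow the state itself: it learns about the first value for a group through the `was_valid` argument.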
Do you still think we should add this API, @joroKr21? Or perhaps we should mark the PR as draft while we work on other options?
Yeah good point, I converted it to a draft. I'm not sure what to do. Adding a boolean flag to the callback is not great either...
This reverts commit a44dd81.
@alamb I implemented the version with an additional callback parameter, LMK what you think?
The code looks good to me @joroKr21
I think this PR needs:
- Update the docs
- A functional test
- Run the clickbench performance benchmarks to ensure we don't see a regression
I can help with the benchmarks if necessary
cc @Dandandan
|group_index, new_value| {
    let value = &mut self.values[group_index];
    (self.prim_fn)(value, new_value);
|group_index, was_valid, new_value| {
It would be nice to add a test that covered this, if possible
@@ -132,7 +132,7 @@ impl NullState {
         mut value_fn: F,
     ) where
         T: ArrowPrimitiveType + Send,
-        F: FnMut(usize, T::Native) + Send,
+        F: FnMut(usize, bool, T::Native) + Send,
Can we please update the documentation to reflect this new argument and explain what it means?
13601c8 to 477eada
/benchmark
|group_index, new_value| {
    let value = &mut self.values[group_index];
    (self.prim_fn)(value, new_value);
|group_index, was_valid, new_value| {
I don't think the existing implementations have to change, or do they?
Perhaps not, although conceptually an accumulator is just a map-reduce-map via some monoid or semigroup. The current implementation supports only monoids (need an empty value) but it doesn't support semigroups (no empty value). We could use this for something like `FirstValue` or `AnyValue` but I guess we could also implement it specifically for that use case.
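To make the monoid vs semigroup point concrete, here is a minimal sketch of a reduction that needs no empty value because the callback receives `was_valid`. `reduce_groups` and its plain-slice inputs are hypothetical and stand in for the Arrow-based code; the `0` used to pre-fill the accumulator is a placeholder that is never combined with real data.

```rust
// Hypothetical semigroup-style reduction: `combine` has no identity element,
// so the first value for a group is stored as-is instead of being combined.
fn reduce_groups<F>(
    group_indices: &[usize],
    values: &[i64],
    total_num_groups: usize,
    combine: F,
) -> Vec<Option<i64>>
where
    F: Fn(i64, i64) -> i64,
{
    let mut seen = vec![false; total_num_groups];
    let mut acc = vec![0_i64; total_num_groups]; // placeholder, overwritten on first value

    // Callback in the proposed shape: (group_index, was_valid, new_value).
    let mut value_fn = |group: usize, was_valid: bool, new_value: i64| {
        acc[group] = if was_valid {
            combine(acc[group], new_value)
        } else {
            new_value
        };
    };

    for (&group, &value) in group_indices.iter().zip(values) {
        let was_valid = std::mem::replace(&mut seen[group], true);
        value_fn(group, was_valid, value);
    }

    // Groups that never saw a value stay null.
    seen.into_iter()
        .zip(acc)
        .map(|(valid, value)| valid.then_some(value))
        .collect()
}
```

Under these assumptions a first-value style aggregate is `reduce_groups(groups, values, n, |first, _| first)`, and a minimum is `reduce_groups(groups, values, n, |a, b| a.min(b))` without having to pick `i64::MAX` as a starting value.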
We could even drop the initial value parameter since it doesn't matter in this case.
@alamb the benchmark didn't trigger on comment
Sure, it might be some time until I do that though
I don't like this API at all
Which issue does this PR close?
Closes #11591.
Rationale for this change
Provide more flexibility for `GroupsAccumulator` implementations, which often make use of `NullState`.
What changes are included in this PR?
Add `was_valid` parameter to `NullState` callbacks. In this way implementations can handle nulls differently.
Are these changes tested?
Yes, extended existing tests.
Are there any user-facing changes?
Yes, changes the signatures of `NullState::accumulate` and `NullState::accumulate_boolean`.