-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
This is related to
- Consolidate bitwise operation implementations #8806
- Hard to use
PrimitiveArray::unary_mut,PrimitiveArray:try_unary_mut, etc #8808
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
It is common to apply successive filtering operations that progressively narrow a bitmap with rows that pass a predicate -- for example a AND b AND c AND d AND ...
This would be evaluated typically like
t1 = a AND b
t2 = t1 AND c
t3 = t2 AND d
Where t1, t2, and t3 are newly allocated arrays; t3 holds the final output
For a typical array size of 8192 elements, t1, t2 and t3 are eack 1KB allocations
The current boolean kernels in arrow pretty much require allocating a new buffer as they take a reference
Now that @rluvaton has added optimized code to apply mutable bitwise operations in the following PR
I think we are in the position to also provide the same operations on BooleanArray
Describe the solution you'd like
I would like a way to update the contents of a BooleanArray in place (assuming there is no other reference to the backing array)
Describe alternatives you've considered
I propose following the API of PrimitiveArray, such as PrimitiveArray::unary and PrimitiveArray::unary_mut
So that would mean adding new functions to BooleanArray, like
impl BooleanArray {
...
// apply a bitwise operation to this BooleanArray, using u64 operations
// returning a new BooleanArray
fn bitwise_unary(&self, mut op:F) -> Self
where
F: FnMut(u64) -> u64 {.. }
// Try to apply a bitwise operation to this BooleanArray in place, using u64 operations
// If the underlying buffer is shared, returns Err<self>
fn bitwise_unary_mut(self, mut op:F) -> Result<Self, Self>
where
F: FnMut(u64) -> u64 {.. }
// Try to apply a bitwise operation to this BooleanArray in place, using u64 operations
// If the underlying buffer is shared, returns a new
// BooleanArray
fn bitwise_unary_mut_or_clone(self, mut op:F) -> Self
where
F: FnMut(u64) -> u64 {.. }And we would also add similar functions for binary_bitwise functions (that takes a second array argument and right_offset_in_bits)
These APIs are likely fairly straightforward if we implement the plan in
As then the new methods on BooleanArray would just call the relevant underlying buffer methods
Additional context