Arrow compute kernel regards selection vector #4095

Closed
yjshen opened this issue Apr 16, 2023 · 3 comments
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@yjshen
Member

yjshen commented Apr 16, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

It would be great if the arrow compute kernels could take a selection vector into account. Users would then no longer need to create slices over the existing arrays before applying a compute kernel to them.

Describe the solution you'd like

A set of new compute APIs that take a selection vector as the second argument.

For example:

pub fn sum<T: ArrowNumericType>(array: &PrimitiveArray<T>, selection_vector: &BooleanArray) -> Option<T::Native>
where
    T::Native: ArrowNativeTypeOp,
{
    // Proposed behaviour: sum only the values whose selection bit is set.
    todo!()
}
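
To make the intended semantics concrete, here is a minimal sketch that uses only the standard library. It is not the arrow API: plain `&[i64]` and `&[bool]` slices stand in for `PrimitiveArray<T>` and `BooleanArray`, nulls are not modelled, and the name `sum_selected` is hypothetical. Returning `None` when nothing is selected, and wrapping on overflow, are assumptions of this sketch rather than decisions made in this issue.

```rust
/// Hypothetical stand-in for the proposed kernel.
/// `values` plays the role of the primitive array and `selection`
/// the role of the selection vector; nulls are not modelled.
pub fn sum_selected(values: &[i64], selection: &[bool]) -> Option<i64> {
    assert_eq!(values.len(), selection.len(), "length mismatch");

    // `None` until at least one selected value has been seen.
    let mut acc: Option<i64> = None;
    for (v, keep) in values.iter().zip(selection) {
        if *keep {
            // Wrapping addition is an assumption of this sketch.
            acc = Some(acc.unwrap_or(0).wrapping_add(*v));
        }
    }
    acc
}
```

Called as `sum_selected(&[1, 2, 3, 4], &[true, false, true, false])`, only the first and third values contribute to the result, and no intermediate array is created.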

Describe alternatives you've considered

Additional context

apache/datafusion#6003 as a use case where the selection vector would be beneficial.
apache/datafusion#5944 for a similar proposal.

@yjshen yjshen added the enhancement Any new improvement worthy of a entry in the changelog label Apr 16, 2023
@tustvold
Contributor

For primitives at least, I would expect a filter followed by the existing sum kernel to be very competitive, as it has been my experience that LLVM struggles to vectorise operations involving bitmasks.

We should definitely benchmark any new kernels against this "naive" approach.
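
As a rough illustration of the two approaches being compared, the sketch below places a filter-then-sum baseline next to a fused single-pass variant. It uses only the standard library, so the slices and the function names (`filter_values`, `sum_values`, `filter_then_sum`, `fused_masked_sum`, `time_once`) are hypothetical stand-ins, not arrow kernels. A `&[bool]` mask also sidesteps the bit-packed bitmap that the vectorisation concern is about, so this shows the shape of the comparison and not its outcome.

```rust
use std::time::{Duration, Instant};

/// Stand-in for a filter kernel: copies the selected values into a new buffer.
pub fn filter_values(values: &[i64], selection: &[bool]) -> Vec<i64> {
    values
        .iter()
        .zip(selection)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect()
}

/// Stand-in for a plain sum kernel: `None` for empty input, wrapping on overflow.
pub fn sum_values(values: &[i64]) -> Option<i64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().fold(0i64, |acc, v| acc.wrapping_add(*v)))
}

/// The "naive" baseline: filter first, then sum the compacted buffer.
pub fn filter_then_sum(values: &[i64], selection: &[bool]) -> Option<i64> {
    sum_values(&filter_values(values, selection))
}

/// A fused alternative: one pass, multiplying each value by its 0/1 mask.
pub fn fused_masked_sum(values: &[i64], selection: &[bool]) -> Option<i64> {
    let mut acc = 0i64;
    let mut selected = 0usize;
    for (v, keep) in values.iter().zip(selection) {
        let mask = *keep as i64;
        acc = acc.wrapping_add(v.wrapping_mul(mask));
        selected += *keep as usize;
    }
    if selected == 0 { None } else { Some(acc) }
}

/// Times a single call of `f`; a real comparison would use a benchmark harness.
pub fn time_once<R>(f: impl FnOnce() -> R) -> (R, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Both functions should return the same value for any input, which makes the baseline usable as a correctness check as well as a performance reference. A single `time_once` measurement is too noisy to draw conclusions from.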

@tustvold
Contributor

tustvold commented Jun 14, 2023

I'm going to close this as a duplicate of #3620 so that the discussion can happen in a single location.

@alamb alamb added the arrow Changes to the arrow crate label Jun 16, 2023
@alamb
Contributor

alamb commented Jun 16, 2023

label_issue.py automatically added labels {'arrow'} from #4393
