Implement filter kernel specially for FixedSizeByteArray #6153

Closed
alamb opened this issue Jul 29, 2024 · 3 comments · Fixed by #6178
Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of a entry in the changelog)

alamb commented Jul 29, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We are trying to store UUIDs in arrow Arrays, and one obvious thing to do is use FixedSizeByteArray (that is, arrow's FixedSizeBinaryArray).

However, @samuelcolvin did some experiments (see apache/datafusion#11170) and found that, non-obviously, using Decimal128 was actually faster than FixedSizeByteArray.

One reason for this may be that FixedSizeByteArray does not have special case handling in the filter kernel, so it falls through to the generic MutableArrayData fallback in the final `_` arm:

```rust
match predicate.strategy {
    IterationStrategy::None => Ok(new_empty_array(values.data_type())),
    IterationStrategy::All => Ok(values.slice(0, predicate.count)),
    // actually filter
    _ => downcast_primitive_array! {
        values => Ok(Arc::new(filter_primitive(values, predicate))),
        DataType::Boolean => {
            let values = values.as_any().downcast_ref::<BooleanArray>().unwrap();
            Ok(Arc::new(filter_boolean(values, predicate)))
        }
        DataType::Utf8 => {
            Ok(Arc::new(filter_bytes(values.as_string::<i32>(), predicate)))
        }
        DataType::LargeUtf8 => {
            Ok(Arc::new(filter_bytes(values.as_string::<i64>(), predicate)))
        }
        DataType::Utf8View => {
            Ok(Arc::new(filter_byte_view(values.as_string_view(), predicate)))
        }
        DataType::Binary => {
            Ok(Arc::new(filter_bytes(values.as_binary::<i32>(), predicate)))
        }
        DataType::LargeBinary => {
            Ok(Arc::new(filter_bytes(values.as_binary::<i64>(), predicate)))
        }
        DataType::BinaryView => {
            Ok(Arc::new(filter_byte_view(values.as_binary_view(), predicate)))
        }
        DataType::RunEndEncoded(_, _) => {
            downcast_run_array!{
                values => Ok(Arc::new(filter_run_end_array(values, predicate)?)),
                t => unimplemented!("Filter not supported for RunEndEncoded type {:?}", t)
            }
        }
        DataType::Dictionary(_, _) => downcast_dictionary_array! {
            values => Ok(Arc::new(filter_dict(values, predicate))),
            t => unimplemented!("Filter not supported for dictionary type {:?}", t)
        }
        _ => {
            let data = values.to_data();
            // fallback to using MutableArrayData
            let mut mutable = MutableArrayData::new(
                vec![&data],
                false,
                predicate.count,
            );
            match &predicate.strategy {
                IterationStrategy::Slices(slices) => {
                    slices
                        .iter()
                        .for_each(|(start, end)| mutable.extend(0, *start, *end));
                }
                _ => {
                    let iter = SlicesIterator::new(&predicate.filter);
                    iter.for_each(|(start, end)| mutable.extend(0, start, end));
                }
            }
            let data = mutable.freeze();
            Ok(make_array(data))
        }
    },
}
```

Describe the solution you'd like

  1. Add special case code for FixedSizeByteArray
  2. Add benchmark showing it is faster
  3. Add unit tests for functional test coverage
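
To make the idea in step 1 concrete, here is a rough, dependency-free sketch of what a specialized path might look like. It is not the arrow-rs implementation: `mask_to_slices` and `filter_fixed_size` are hypothetical names, the boolean mask stands in for the filter predicate, and null handling is left out entirely. The point is only that when every value has the same width, each selected run of rows can be copied with one contiguous byte copy.

```rust
/// Hypothetical helper (not an arrow API): turns a boolean mask into
/// half-open `(start, end)` runs of consecutive selected rows.
fn mask_to_slices(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut slices = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &keep) in mask.iter().enumerate() {
        match (keep, start) {
            // A new run of selected rows begins here.
            (true, None) => start = Some(i),
            // The current run ends just before this row.
            (false, Some(s)) => {
                slices.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the mask.
    if let Some(s) = start {
        slices.push((s, mask.len()));
    }
    slices
}

/// Hypothetical sketch: filters a flat buffer of fixed-width values.
/// `values` holds `mask.len()` values of `value_length` bytes each.
/// Assumption: no nulls, so no validity bitmap is filtered here.
fn filter_fixed_size(values: &[u8], value_length: usize, mask: &[bool]) -> Vec<u8> {
    assert!(value_length > 0, "value_length must be non-zero");
    assert_eq!(values.len(), mask.len() * value_length);

    // Pre-size the output so the copies below never reallocate.
    let selected = mask.iter().filter(|&&keep| keep).count();
    let mut out = Vec::with_capacity(selected * value_length);

    // Row offsets scale by the fixed width, so each run is one byte copy.
    for (start, end) in mask_to_slices(mask) {
        out.extend_from_slice(&values[start * value_length..end * value_length]);
    }
    out
}
```

For UUIDs, `value_length` would be 16. A real kernel would also need to filter the null bitmap and would likely pick between slice-based and index-based iteration depending on selectivity, which is what the benchmark in step 2 would help decide.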

Describe alternatives you've considered

Additional context
This was pointed out by @samuelcolvin on apache/datafusion#11170 (comment)

@alamb added the enhancement label on Jul 29, 2024
@chloro-pn

This requirement doesn't seem difficult to implement. If no one is available, please assign it to me. : )

alamb commented Jul 31, 2024

Thanks @chloro-pn - that would be super helpful 🙏

alamb commented Aug 31, 2024

label_issue.py automatically added labels {'arrow'} from #6186
