-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split out arrow-string (#2594) #3295
Conversation
use std::sync::Arc; | ||
|
||
use regex::Regex; | ||
/// Perform SQL `array ~ regex_array` operation on [`StringArray`] / [`LargeStringArray`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are moved from comparison.rs
@@ -23,1227 +23,75 @@ | |||
//! [here](https://doc.rust-lang.org/stable/core/arch/) for more information. | |||
//! | |||
|
|||
use crate::array::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The like and regex kernels are moved into arrow-string. The remaining kernels will be moved into an arrow-ord crate in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be left in arrow-compute 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing 🎉
Edit: well nothing once I also split out the arithmetic kernels, the end goal is the top-level arrow is just a re-export of other crates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a fine improvement to me. Thank you @tustvold
I recommend running benchmarks prior to merging this, just to make sure there aren't any cross crate inlining issues
@@ -245,11 +245,10 @@ mod tests { | |||
macro_rules! length_binary_helper { | |||
($offset_ty: ty, $result_ty: ty, $kernel: ident, $value: expr, $expected: expr) => {{ | |||
let array = GenericBinaryArray::<$offset_ty>::from($value); | |||
let result = $kernel(&array)?; | |||
let result = $kernel(&array).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a drive by cleanup to use unwrap
rather than Error
in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it means you get an actual backtrace as opposed to some random error from somewhere 😆
@@ -23,1227 +23,75 @@ | |||
//! [here](https://doc.rust-lang.org/stable/core/arch/) for more information. | |||
//! | |||
|
|||
use crate::array::*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be left in arrow-compute 🤔
@@ -258,6 +258,7 @@ Rust Arrow Crates: | |||
(cd arrow-array && cargo publish) | |||
(cd arrow-select && cargo publish) | |||
(cd arrow-cast && cargo publish) | |||
(cd arrow-string && cargo publish) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for this
So no changes outside the noise threshold |
Which issue does this PR close?
Part of #2594
Rationale for this change
Splits out string kernels into a crate called
arrow-string
. Whilst these are primarily concerned with UTF8 strings, some also handle binary arrays. I therefore went witharrow-string
instead ofarrow-str
asstr
is very specifically just UTF-8 in Rust, whereas the general concept of a string extends to binary strings. Or something like that...What changes are included in this PR?
Are there any user-facing changes?