-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add stringview
support to encode
and decode
and bit_length
#13332
Conversation
@timsaucer Do you think testing is necessary here? Can add if so. Also, are you able to show how to replicate the optimizer error for |
Wonderful. Thank you. I'll try to review tomorrow. I did dive into the ConcatFunc and I found (and resolved) two issues:
I'll put up the PR today or tomorrow to fix both of these, with unit tests. |
ConcatFunc: #13346 |
Will add tests tomorrow. |
Thank you @jonathanc-n -- I marked this PR as draft as you seem to be planning to add more tests. Please mark it as ready for review with it is ready for another look Thanks a lot for sorting this out |
@timsaucer Will push some tests, however after going through it, I changed this pr to only deal with support stringview for |
I updated array_to_string and string_to_array in a separate PR - it's a bit more extensive than what this PR contains. |
Is this an arrow-rs issue or is this #13363 ? |
Possibly both. I think arrow-cast is missing the _,Utf8View for primitive & temporal casting here: https://github.com/apache/arrow-rs/blob/1d580ec1996e75bb9be4c1c880871d0dd2718ad4/arrow-cast/src/cast/mod.rs#L1465 This PR in arrow-rs I think may fix the issue |
stringview
for functions to prevent regressionstringview
support to encode
stringview
support to encode
stringview
support to encode
and decode
stringview
support to encode
and decode
stringview
support to encode
and decode
and bit_length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jonathanc-n @Omega359 and @timsaucer
I updated the title of this PR to reflect what it currently contains and it now looks good to me
Please let me know if I missed anything
@@ -224,6 +224,7 @@ fn encode_process(value: &ColumnarValue, encoding: Encoding) -> Result<ColumnarV | |||
ColumnarValue::Array(a) => match a.data_type() { | |||
DataType::Utf8 => encoding.encode_utf8_array::<i32>(a.as_ref()), | |||
DataType::LargeUtf8 => encoding.encode_utf8_array::<i64>(a.as_ref()), | |||
DataType::Utf8View => encoding.encode_utf8_array::<i32>(a.as_ref()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this means the output type will be Utf8 but for this type i think that is totally fine (there is no need to preserve the view encoding)
Thanks again @jonathanc-n |
Which issue does this PR close?
Closes #13330 .
Rationale for this change
Add string view to some functions to prevent regression during @timsaucer df python migration.
What changes are included in this PR?
Added support for utf8view to
bit_length
,array_to_string
, andencode
Are these changes tested?