-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support binary data type in build_struct_array
.
#702
Support binary data type in build_struct_array
.
#702
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #702 +/- ##
=======================================
Coverage 82.45% 82.46%
=======================================
Files 168 168
Lines 47397 47419 +22
=======================================
+ Hits 39083 39103 +20
- Misses 8314 8316 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Out of curiosity, isn't json always valid utf8?
@@ -668,7 +668,7 @@ mod tests { | |||
"entry", | |||
DataType::Struct(vec![ | |||
Field::new("key", DataType::Utf8, false), | |||
Field::new("key", DataType::Int32, true), | |||
Field::new("value", DataType::Int32, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An oversight on my end, should have been "value"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, learning from previous PR and found this typo ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -668,7 +668,7 @@ mod tests { | |||
"entry", | |||
DataType::Struct(vec![ | |||
Field::new("key", DataType::Utf8, false), | |||
Field::new("key", DataType::Int32, true), | |||
Field::new("value", DataType::Int32, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An oversight on my end, should have been "value"
rows.iter() | ||
.map(|row| { | ||
let maybe_value = row.get(field.name()); | ||
maybe_value.and_then(|value| value.as_str()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought value.as_str()
would return an error if the binary data can't be converted to UTF8, but then we probably wouldn't be able to read the file in that case.
Just an observation, no action needed
Yeah, I've only ever base64 encoded to a string, so I wonder if there'd be some data loss if we don't do some conversion to a string representation |
I pushed a change to fix a |
Thank you all! Feeling excited to become a new contributor. |
Looks great -- thank you again @zijie0 -- I'll plan to include this as part of arrow 5.3.0 (eta release in about 10 days time) |
* Support binary data type in `build_struct_array`. * Modify test case. * cargo fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
I think writer needs to take care of the encoding and the reader needs to decode accordingly. For example encoding arbitrary binary into base64 is one way to do this. Another way is to encode binary as 2 bytes unicode code point. Whatever the encoding/decoding strategy gets picked, the writer is responsible to encode the binary into valid utf8 string. So it's really just a convention that the writer and reader needs to follow. |
* Support binary data type in `build_struct_array`. (#702) * Support binary data type in `build_struct_array`. * Modify test case. * cargo fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Remove accidentally included test Co-authored-by: Yuan Zhou <myoilbox@gmail.com>
Which issue does this PR close?
Closes #701
Rationale for this change
We encounter this issue in delta-rs project where
partitionValues
in delta spec could contain binary type values.What changes are included in this PR?
Add binary type support and corresponding test case.
Are there any user-facing changes?
No.