-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ipc schema custom_metadata serialization #3282
Conversation
I'm kinda hesitant on the test changes honestly, since having the pyarrow bytes statically hardcoded doesn't seem ideal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this
arrow-ipc/src/convert.rs
Outdated
let ipc2 = crate::root_as_message(&bytes).unwrap(); | ||
let schema2 = ipc2.header_as_schema().unwrap(); | ||
|
||
// can't compare schema directly as though is same message, the underlying bytes seem to differ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely because flatbuffers aren't a canonical representation, that is there are multiple legal ways to represent the same data
// the schema is: Field("field1", DataType::UInt32, false) | ||
// Bytes of a schema generated via following python code, using pyarrow 10.0.1: | ||
// | ||
// import pyarrow as pa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -63,7 +63,9 @@ pub fn schema_to_fb_offset<'a>( | |||
|
|||
let mut builder = crate::SchemaBuilder::new(fbb); | |||
builder.add_fields(fb_field_list); | |||
builder.add_custom_metadata(fb_metadata_list); | |||
if !custom_metadata.is_empty() { | |||
builder.add_custom_metadata(fb_metadata_list); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW ideally we would not create the fb_metadata_list
if not needed, as effectively this encodes an empty list, but then doesn't reference it from a flatbuffer Table. In practice this just means the generated flatbuffer has an orphaned 4 bytes of "padding"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll change it to do that
Which issue does this PR close?
Closes #250.
Rationale for this change
Thought meant to address #250, it is outdated in that the error doesn't seem to be due to empty children array anymore, but persists with empty custom_metadata (at schema level, not field level).
Have updated the bytes generated from pyarrow as those were from a very old version, and also made adjustments to how bytes are generated from Rust too, to make it more dynamic.
What changes are included in this PR?
Fix test
Fix ipc serialization of schema to fb for empty custom_metadata field to ensure it serializes as a None/null instead of an empty array (since pyarrow appears to do the same)
Are there any user-facing changes?