Support reading/writing Parquet for the FixedSizeList type #4226
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Thank you for this. I left a correctness comment, but I like where this is headed.
Additionally, I think this needs a round-trip test, along with a test where the Parquet data contains list elements of the wrong size (which should return an error).
```rust
// Excerpt from the diff under review (partial)
Ok(Arc::new(result_array))
let result_array: ArrayRef = match *list_data.data_type() {
    ArrowType::FixedSizeList(_, _) => {
        Arc::new(FixedSizeListArray::from(list_data))
```
This is not correct: you are constructing a FixedSizeListArray from an ArrayData that contains an offsets buffer. In particular, this code needs to verify that the values array is the right length. I'll see if I can finish up #3879 so that you can make use of it.
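To illustrate the kind of check being asked for, here is a minimal std-only sketch that does not use the arrow crates. It assumes the reader has produced a variable-size list with `i32` offsets starting at 0 and no null lists (nulls are a separate problem). Dropping the offsets buffer and reinterpreting the child values as a FixedSizeList of size `size` is only valid if every list spans exactly `size` elements and the values array holds exactly `num_lists * size` entries. The function `check_fixed_size_offsets` is hypothetical, not an arrow-rs API.

```rust
/// Hypothetical helper (not part of arrow-rs): checks that variable-size list
/// offsets describe lists of exactly `size` elements each, and that the child
/// values array has exactly `num_lists * size` entries.
/// Assumes offsets start at 0 and that there are no null lists.
fn check_fixed_size_offsets(offsets: &[i32], values_len: usize, size: usize) -> Result<(), String> {
    // Offsets for n lists have n + 1 entries; this sketch requires the first to be 0.
    if offsets.first() != Some(&0) {
        return Err("offsets must be non-empty and start at 0".to_string());
    }
    // Every list must span exactly `size` child slots.
    for (i, pair) in offsets.windows(2).enumerate() {
        let len = pair[1] as i64 - pair[0] as i64;
        if len != size as i64 {
            return Err(format!("list {i} has {len} elements, expected {size}"));
        }
    }
    // The values array must be exactly the right length, with no trailing data.
    let expected = (offsets.len() - 1) * size;
    if values_len != expected {
        return Err(format!("values array has {values_len} elements, expected {expected}"));
    }
    Ok(())
}
```

For example, offsets `[0, 2, 4, 6]` with 6 values pass for `size = 2`, while `[0, 2, 5, 7]` fail because the second list has 3 elements, which is the wrong-size case the reviewer wants covered by a test.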
This is actually significantly more complicated, as the null padding logic is different for fixed-size types. I'll have a think about how this could be supported.
Thank you @tustvold. I don't know much about the Arrow implementation, so please let me know if it's too complex for me to handle.
Maybe you should pad with some null data? See https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout
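The padding idea can be sketched without the arrow crates. In the fixed-size list layout, every slot, including a null one, occupies exactly `size` child values, whereas in a variable-size list a null entry can span zero child values. The std-only sketch below assumes null lists arrive with zero-length offset ranges, models nullable child values as `Option<T>`, and fills each null list's slots with `size` nulls. The function `pad_to_fixed_size` and this data representation are hypothetical illustrations, not arrow-rs code; a real implementation would also have to build the child null bitmap.

```rust
/// Hypothetical sketch (not arrow-rs code): rewrites the child values of a
/// variable-size list into the fixed-size list layout, where each list,
/// null or not, occupies exactly `size` child slots.
/// Assumes null lists have empty ranges in `offsets`; pads them with `None`.
fn pad_to_fixed_size<T: Clone>(
    offsets: &[i32],
    validity: &[bool],
    values: &[Option<T>],
    size: usize,
) -> Result<Vec<Option<T>>, String> {
    if offsets.len() != validity.len() + 1 {
        return Err("offsets must have one more entry than validity".to_string());
    }
    let mut out = Vec::with_capacity(validity.len() * size);
    for (i, &is_valid) in validity.iter().enumerate() {
        let start = usize::try_from(offsets[i]).map_err(|_| format!("negative offset at {i}"))?;
        let end = usize::try_from(offsets[i + 1]).map_err(|_| format!("negative offset at {}", i + 1))?;
        if end < start || end > values.len() {
            return Err(format!("invalid offset range for list {i}"));
        }
        let len = end - start;
        if is_valid {
            // A valid list must already have exactly `size` elements.
            if len != size {
                return Err(format!("list {i} has {len} elements, expected {size}"));
            }
            out.extend_from_slice(&values[start..end]);
        } else {
            if len != 0 {
                return Err(format!("null list {i} is not empty"));
            }
            // A null fixed-size list still reserves `size` child slots.
            out.resize(out.len() + size, None);
        }
    }
    Ok(out)
}
```

With lists `[1, 2]`, null, `[3, 4]` (offsets `[0, 2, 2, 4]`) and `size = 2`, the padded child values become `[Some(1), Some(2), None, None, Some(3), Some(4)]`, so list `i` always starts at child index `i * size`.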
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add it to my list; it's not as trivial as I had first thought 😅
Which issue does this PR close?
Closes #4214
Rationale for this change
As mentioned in the issue
What changes are included in this PR?
- Support writing `FixedSizeList` data for `ArrowWriter`
- Convert `ListArray` to `FixedSizeList` for `ArrowReader`

Are there any user-facing changes?
No