Support to read/write parquet for FixedSizeList type #4226

Closed
wants to merge 1 commit into from

@yah01 yah01 commented May 16, 2023

Which issue does this PR close?

Closes #4214

Rationale for this change

As mentioned in the issue

What changes are included in this PR?

  • Build definition and repetition levels for FixedSizeList
  • Implement writing FixedSizeList data in ArrowWriter
  • Implement reading a ListArray as a FixedSizeList in ArrowReader
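
For readers unfamiliar with what "building levels" means here, the following is a minimal sketch and not the code from this PR. It assumes the standard three-level Parquet LIST encoding with a nullable list and non-nullable elements, so the maximum definition level is 2 and the maximum repetition level is 1. The `Levels` struct and `fixed_size_list_levels` function are hypothetical names used only for illustration.

```rust
/// Definition and repetition levels for one leaf column (illustrative type).
#[derive(Debug, PartialEq, Default)]
pub struct Levels {
    pub def: Vec<i16>,
    pub rep: Vec<i16>,
}

/// Hypothetical helper: compute levels for a nullable fixed-size list whose
/// elements are non-nullable. `validity[i]` is true when list slot `i` is
/// not null; `size` is the fixed number of elements per list.
pub fn fixed_size_list_levels(validity: &[bool], size: usize) -> Levels {
    let mut levels = Levels::default();
    for &valid in validity {
        if !valid {
            // Null list: a single entry, defined only up to level 0.
            levels.def.push(0);
            levels.rep.push(0);
        } else if size == 0 {
            // Valid but empty list: defined up to the list itself.
            levels.def.push(1);
            levels.rep.push(0);
        } else {
            for i in 0..size {
                // Element present: fully defined.
                levels.def.push(2);
                // The first element starts a new record, the rest repeat.
                levels.rep.push(if i == 0 { 0 } else { 1 });
            }
        }
    }
    levels
}
```

Calling `fixed_size_list_levels(&[true, false, true], 2)` yields one level entry per element of the valid lists and a single entry for the null list.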

Are there any user-facing changes?

No

@github-actions github-actions bot added arrow Changes to the arrow crate parquet Changes to the parquet crate labels May 16, 2023
Signed-off-by: yah01 <yah2er0ne@outlook.com>
@github-actions github-actions bot removed the arrow Changes to the arrow crate label May 16, 2023
Contributor

@tustvold tustvold left a comment

Thank you for this. I left a correctness comment, but I like where this is headed.

Additionally, I think this could do with a round-trip test, along with a test where the Parquet data contains list elements of the wrong size (which should error).

Ok(Arc::new(result_array))
let result_array: ArrayRef = match *list_data.data_type() {
ArrowType::FixedSizeList(_, _) => {
Arc::new(FixedSizeListArray::from(list_data))
Contributor

This is not correct, you are constructing a FixedSizeListArray from an ArrayData that contains an offsets buffer. In particular this code needs to verify that the values array is the right length. I'll see if I can't finish up #3879 so that you can make use of that
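
To illustrate the kind of check being asked for, here is a minimal sketch in plain Rust, not the arrow-rs implementation. It assumes the offsets start at 0 and ignores nulls entirely; `check_fixed_size` is a hypothetical helper, not an arrow-rs API.

```rust
/// Hypothetical check: every list described by `offsets` must hold exactly
/// `size` child values, and the values array must have exactly
/// `num_lists * size` entries. Returns the number of lists on success.
/// Assumption: `offsets[0] == 0` and there are no null lists.
pub fn check_fixed_size(
    offsets: &[i32],
    values_len: usize,
    size: usize,
) -> Result<usize, String> {
    if offsets.is_empty() {
        return Err("offsets buffer must contain at least one entry".to_string());
    }
    // Each adjacent pair of offsets delimits one list.
    for (i, w) in offsets.windows(2).enumerate() {
        let len = w[1] - w[0];
        if len < 0 || len as usize != size {
            return Err(format!("list {i} has {len} elements, expected {size}"));
        }
    }
    let num_lists = offsets.len() - 1;
    if values_len != num_lists * size {
        return Err(format!(
            "values array has {values_len} entries, expected {}",
            num_lists * size
        ));
    }
    Ok(num_lists)
}
```

With offsets `[0, 2, 4, 6]`, six values, and `size = 2` the check passes; offsets `[0, 2, 3, 6]` fail because the second list has only one element.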

Contributor

This is actually significantly more complicated, as the null padding logic is different for fixed size types. I'll have a think about how this could be supported
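
One reading of the padding difference: in Arrow's fixed-size list layout a null slot still occupies `size` entries in the child array, whereas a null slot in a variable-size list can span zero child values. The sketch below shows that conversion under stated assumptions: `i32` child values, `0` as the placeholder for padded entries, and a hypothetical `pad_to_fixed_size` helper that is not part of arrow-rs.

```rust
/// Hypothetical conversion from a variable-size list layout (offsets plus
/// validity plus values) to the child values of a fixed-size list layout.
/// Null slots are padded with `size` placeholder zeros so that slot `i`
/// always starts at child index `i * size`.
pub fn pad_to_fixed_size(
    offsets: &[usize],
    validity: &[bool],
    values: &[i32],
    size: usize,
) -> Result<Vec<i32>, String> {
    if offsets.len() != validity.len() + 1 {
        return Err("offsets must have one more entry than validity".to_string());
    }
    let mut out = Vec::with_capacity(validity.len() * size);
    for (i, &valid) in validity.iter().enumerate() {
        let (start, end) = (offsets[i], offsets[i + 1]);
        if start > end || end > values.len() {
            return Err(format!("offsets for list {i} are out of bounds"));
        }
        if valid {
            // A valid list must contain exactly `size` elements.
            if end - start != size {
                return Err(format!(
                    "list {i} has {} elements, expected {size}",
                    end - start
                ));
            }
            out.extend_from_slice(&values[start..end]);
        } else {
            // A null list still needs `size` child entries in this layout.
            out.extend(std::iter::repeat(0).take(size));
        }
    }
    Ok(out)
}
```

The validity bitmap is carried over unchanged; only the child values need padding, which is why a variable-size list cannot simply be reinterpreted as a fixed-size one.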

Author

> This is actually significantly more complicated, as the null padding logic is different for fixed size types. I'll have a think about how this could be supported

Thank you @tustvold. I don't know much about the Arrow implementation, so please let me know if this is too complex for me to handle.

Contributor

I can add it to my list, it's not as trivial as I had first thought 😅

Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support writing FixedSizeList to Parquet
3 participants