
Support writing FixedSizeList to Parquet #4214

Closed
yah01 opened this issue May 14, 2023 · 4 comments · Fixed by #4267
Labels
enhancement Any new improvement worthy of a entry in the changelog help wanted parquet Changes to the parquet crate

Comments

@yah01

yah01 commented May 14, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I have array data in which every element has the same fixed size. When I tried to write a RecordBatch with a FixedSizeList column, it failed with:

"NYI("Datatype FixedSizeList(Field { name: \"item\", data_type: Float32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 128) is not yet supported")"

Describe the solution you'd like
I searched for this error and found it here:

d => Err(nyi_err!("Datatype {} is not yet supported", d)),

It doesn't look hard to support FixedSizeList in the same way as List (though I'm not sure); the only thing needed should be to construct a LevelInfoBuilder for FixedSizeList.
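To illustrate the "just like List" idea: a FixedSizeList has no offsets buffer, but its offsets are implied by the fixed size, so slot `i` covers child values `i * size .. (i + 1) * size`. A minimal std-only sketch, where `implied_offsets` and `slot_range` are hypothetical helpers and not arrow-rs APIs:

```rust
use std::ops::Range;

/// Hypothetical helper: the List-style offsets implied by a FixedSizeList
/// with `len` slots of `size` child values each.
fn implied_offsets(len: usize, size: i32) -> Vec<i32> {
    (0..=len as i32).map(|i| i * size).collect()
}

/// Hypothetical helper: the range of child values covered by slot `i`.
fn slot_range(i: usize, size: usize) -> Range<usize> {
    i * size..(i + 1) * size
}

fn main() {
    // Three slots of 128 Float32 values each, as in the error above.
    let offsets = implied_offsets(3, 128);
    println!("{:?}", offsets); // [0, 128, 256, 384]
    println!("{:?}", slot_range(2, 128)); // 256..384
}
```

With these offsets, the existing List code path could in principle be reused, which is the intuition behind the suggestion above.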

Describe alternatives you've considered
No

Additional context
I'm happy to take this on if it's as simple as I described above.

@yah01 yah01 added the enhancement Any new improvement worthy of a entry in the changelog label May 14, 2023
@tustvold
Contributor

I think this should be relatively straightforward, as far as anything involving dremel is straightforward. This series of posts may be helpful for whoever picks this one up - https://arrow.apache.org/blog/2022/10/05/arrow-parquet-encoding-part-1/
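For reference, a rough std-only sketch of the Dremel levels a top-level nullable FixedSizeList<Float32> column would need. It assumes the standard 3-level Parquet LIST layout (optional list group, repeated group, optional element), which gives definition levels 0 = null list, 1 = empty list, 2 = null item, 3 = present item, and a maximum repetition level of 1. `fixed_size_list_levels` is hypothetical and is not the arrow-rs `LevelInfoBuilder`. The main difference from List is that Arrow still stores `size` child values behind a null slot, so the writer has to skip them instead of emitting levels for them.

```rust
// Definition levels under the assumed 3-level LIST layout.
const DEF_NULL_LIST: i16 = 0;
const DEF_EMPTY_LIST: i16 = 1;
const DEF_NULL_ITEM: i16 = 2;
const DEF_ITEM: i16 = 3;

/// Hypothetical level computation for a top-level FixedSizeList column.
/// `list_valid[i]` is the validity of slot i; `values` holds
/// `list_valid.len() * size` child values, including the padding that
/// Arrow keeps behind null slots. Returns (definition, repetition) levels.
fn fixed_size_list_levels(
    list_valid: &[bool],
    values: &[Option<f32>],
    size: usize,
) -> (Vec<i16>, Vec<i16>) {
    assert_eq!(values.len(), list_valid.len() * size);
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for (i, &valid) in list_valid.iter().enumerate() {
        if !valid {
            // A null slot emits a single level; its padding values are skipped.
            def.push(DEF_NULL_LIST);
            rep.push(0);
        } else if size == 0 {
            // A non-null slot of a zero-sized list is an empty list.
            def.push(DEF_EMPTY_LIST);
            rep.push(0);
        } else {
            for (j, v) in values[i * size..(i + 1) * size].iter().enumerate() {
                def.push(if v.is_some() { DEF_ITEM } else { DEF_NULL_ITEM });
                // The first item starts a new record (rep 0); the rest repeat (rep 1).
                rep.push(if j == 0 { 0 } else { 1 });
            }
        }
    }
    (def, rep)
}

fn main() {
    // Three slots of size 2: [1.0, null], null, [3.0, 4.0]
    let valid = [true, false, true];
    let values = [Some(1.0), None, None, None, Some(3.0), Some(4.0)];
    let (def, rep) = fixed_size_list_levels(&valid, &values, 2);
    println!("def = {:?}", def); // def = [3, 2, 0, 3, 3]
    println!("rep = {:?}", rep); // rep = [0, 1, 0, 0, 1]
}
```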

@tustvold tustvold self-assigned this May 22, 2023
@alamb alamb changed the title Support to write FixedSizeList Support writing FixedSizeList to Parquet May 22, 2023
@sydduckworth

@tustvold I'd be happy to take over the implementation for this (assuming you're not already working on it). I believe I have a working proof of concept, I just need to integrate it into the codebase and add tests.

@tustvold
Contributor

Always happy to review PRs. FWIW, I think the hard part is the read side.
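One reason the read side is harder: the decoder has to rebuild each slot from the levels and confirm that every non-null list has exactly `size` items, since nothing in the levels themselves enforces it. A hypothetical std-only sketch, assuming the 3-level Parquet LIST layout (definition levels 0 = null list, 1 = empty list, 2 = null item, 3 = present item; repetition level 0 starts a new record) and with `leaf` holding only the non-null values, as Parquet stores them. Building an actual Arrow FixedSizeListArray would additionally need `size` placeholder child values for each null slot; that step is not shown.

```rust
type Slot = Option<Vec<Option<f32>>>;

/// Hypothetical check: the most recently completed slot, if non-null,
/// must contain exactly `size` items.
fn check_len(last: Option<&Slot>, size: usize) -> Result<(), String> {
    match last {
        Some(Some(list)) if list.len() != size => {
            Err(format!("expected {} items, found {}", size, list.len()))
        }
        _ => Ok(()),
    }
}

/// Hypothetical decoder from (definition, repetition) levels back to slots.
fn decode_fixed_size_list(
    def: &[i16],
    rep: &[i16],
    leaf: &[f32],
    size: usize,
) -> Result<Vec<Slot>, String> {
    let mut out: Vec<Slot> = Vec::new();
    let mut leaf_iter = leaf.iter();
    for (&d, &r) in def.iter().zip(rep) {
        if r == 0 {
            // A new record begins: validate the previous one first.
            check_len(out.last(), size)?;
            out.push(if d == 0 { None } else { Some(Vec::new()) });
        }
        if d >= 2 {
            // Level 3 consumes a stored leaf value; level 2 is a null item.
            let item = if d == 3 {
                Some(*leaf_iter.next().ok_or("missing leaf value")?)
            } else {
                None
            };
            match out.last_mut() {
                Some(Some(list)) => list.push(item),
                _ => return Err("item without an open list".to_string()),
            }
        }
    }
    check_len(out.last(), size)?;
    Ok(out)
}

fn main() {
    let def = [3, 2, 0, 3, 3];
    let rep = [0, 1, 0, 0, 1];
    let leaf = [1.0, 3.0, 4.0];
    let slots = decode_fixed_size_list(&def, &rep, &leaf, 2).unwrap();
    println!("{:?}", slots); // [Some([Some(1.0), None]), None, Some([Some(3.0), Some(4.0)])]
    // A record with a single item is rejected when the fixed size is 2.
    assert!(decode_fixed_size_list(&[3], &[0], &[1.0], 2).is_err());
}
```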

@tustvold tustvold removed their assignment May 23, 2023
@tustvold tustvold added the parquet Changes to the parquet crate label Jun 2, 2023
@tustvold
Contributor

tustvold commented Jun 2, 2023

label_issue.py automatically added labels {'parquet'} from #4226

3 participants