[C++][Parquet] Fix schema conversion from two-level encoding nested list #43994

Open
wgtmac opened this issue Sep 6, 2024 · 1 comment

wgtmac commented Sep 6, 2024

Describe the bug, including details regarding any error messages, version, and platform.

I ran into an issue when reading a Parquet file created by Hudi. The file contains a nested list with the following schema:

  optional group a (LIST) {
    repeated group array (LIST) {
      repeated int32 array;
    }
  }

The C++ Parquet reader infers its schema as array<struct<array:array<int>>>. The root cause is the following special case in the schema conversion code:

// We distinguish the special case that we have
//
// required/optional group name=whatever {
//   repeated group name=array or $SOMETHING_tuple {
//     required/optional TYPE item;
//   }
// }

I think we need to treat this as a nested two-level list, meaning that the correct interpretation is array<array<int>>.
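To make the difference between the two interpretations concrete, here is a minimal, self-contained sketch. It does not use the Arrow or Parquet C++ API: `Node`, `ListType`, `MakeHudiExample`, and the `nested_fix` flag are hypothetical names invented for illustration, and the rules are a simplified model of the special case quoted above, not the actual conversion code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a Parquet schema node (not the Arrow API).
enum class Repetition { kRequired, kOptional, kRepeated };

struct Node {
  std::string name;
  Repetition repetition;
  bool is_list;                // true if the group carries the LIST annotation
  std::string primitive;       // physical type such as "int32"; empty for groups
  std::vector<Node> children;  // fields of a group; empty for primitives
};

std::string ListType(const Node& list, bool nested_fix);
std::string ValueType(const Node& n, bool nested_fix);

// Type of a field: a repeated field outside LIST handling becomes an array.
std::string FieldType(const Node& n, bool nested_fix) {
  std::string v = ValueType(n, nested_fix);
  return n.repetition == Repetition::kRepeated ? "array<" + v + ">" : v;
}

// Type of a plain group: a struct with one entry per child field.
std::string StructType(const Node& group, bool nested_fix) {
  std::string out = "struct<";
  for (std::size_t i = 0; i < group.children.size(); ++i) {
    if (i > 0) out += ",";
    out += group.children[i].name + ":" + FieldType(group.children[i], nested_fix);
  }
  return out + ">";
}

// Type of a node's value, ignoring the node's own repetition.
std::string ValueType(const Node& n, bool nested_fix) {
  if (!n.primitive.empty()) return n.primitive;
  return n.is_list ? ListType(n, nested_fix) : StructType(n, nested_fix);
}

// Type of a LIST-annotated group, resolved from its single repeated child.
std::string ListType(const Node& list, bool nested_fix) {
  const Node& rep = list.children.at(0);
  // Two-level list of primitives: the repeated field is the element itself.
  if (!rep.primitive.empty()) return "array<" + rep.primitive + ">";
  // Assumed fix: a repeated group that is itself LIST-annotated is a nested
  // two-level list, so its own repeated child describes the inner elements.
  if (nested_fix && rep.is_list) return "array<" + ListType(rep, nested_fix) + ">";
  // Special case from the quoted comment: the repeated group is the element
  // and becomes a struct.
  bool is_element = rep.children.size() > 1 || rep.name == "array" ||
                    rep.name == list.name + "_tuple";
  if (is_element) return "array<" + StructType(rep, nested_fix) + ">";
  // Otherwise three-level encoding: the element is the group's only field.
  return "array<" + FieldType(rep.children[0], nested_fix) + ">";
}

// The schema from this issue, built by hand.
Node MakeHudiExample() {
  Node leaf{"array", Repetition::kRepeated, false, "int32", {}};
  Node inner{"array", Repetition::kRepeated, true, "", {leaf}};
  return Node{"a", Repetition::kOptional, true, "", {inner}};
}
```

In this model, `ListType(MakeHudiExample(), false)` returns `array<struct<array:array<int32>>>`, which mirrors the schema the reader currently infers, while `ListType(MakeHudiExample(), true)` returns `array<array<int32>>`. The only difference is that the check for a LIST annotation on the repeated group runs before the `array` / `_tuple` name check.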

Component(s)

C++, Parquet


mapleFU commented Sep 6, 2024

array<array<int>>?

wgtmac added a commit to wgtmac/arrow that referenced this issue Sep 20, 2024
wgtmac added a commit to wgtmac/arrow that referenced this issue Oct 17, 2024