Add variant type support to ParquetTypeVisitor #14588
Implement variant(GroupType) method in ParquetTypeVisitor and all subclasses to enable proper handling of Parquet variant logical types during schema conversion and manipulation operations.
cc @aihuaxu
aihuaxu
left a comment
Minor comments. Otherwise looks good.
parquet/src/main/java/org/apache/iceberg/parquet/ParquetTypeVisitor.java
parquet/src/test/java/org/apache/iceberg/parquet/TestParquetSchemaUtil.java
parquet/src/test/java/org/apache/iceberg/parquet/TestParquetSchemaUtil.java
- Replace variant spec version with constant
- Add variant tests
parquet/src/test/java/org/apache/iceberg/parquet/TestParquetSchemaUtil.java
aihuaxu
left a comment
LGTM.
@tmater Could you please back-port this fix to 1.10.x? Thanks!
* Parquet: Add variant type support to ParquetTypeVisitor

  Implement variant(GroupType) method in ParquetTypeVisitor and all subclasses to enable proper handling of Parquet variant logical types during schema conversion and manipulation operations.

* Address review comments

  - Replace variant spec version with constant
  - Add variant tests

* Column id reordering

* Address review comments
Thank you for the reviews @huaxingao, @aihuaxu! Created a cherry-pick PR to 1.10.x: #14624
Summary
Adds variant type support to ParquetTypeVisitor and all its subclasses to enable proper handling of Parquet variant logical types during schema operations.

Background
This issue surfaced when using ParquetUtil.footerMetrics(), which calls convertAndPrune() on the Parquet schema. TestVariantMetrics uses writeParquet() and calls ParquetMetrics.metrics() directly, which bypasses the schema conversion path, so it did not expose this gap. Without the variant() method implementations, variant fields were skipped during schema conversion, which then caused an NPE in TypeWithSchemaVisitor when it tried to process a variant field that was missing from the converted schema.

Changes
- Added variant(GroupType) method to the ParquetTypeVisitor base class
- Implemented variant() in all ParquetTypeVisitor subclasses:
  - MessageTypeToType - converts Parquet variant to Iceberg VariantType
  - ApplyNameMapping - applies name mappings to variant fields
  - ParquetSchemaUtil.HasIds - checks for field IDs in variant types
  - RemoveIds - removes IDs from variant schemas
- Added testVariantTypeConversion() in TestParquetSchemaUtil

Testing
New test validates schema conversion from Parquet variant logical type to Iceberg VariantType.
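
Illustration
The failure mode described under Background can be sketched with a toy visitor. This is not the Iceberg code: Field, Kind, Visitor, WithoutVariant, WithVariant, and visit() are hypothetical names invented for this example, and the rule that a null callback result drops the field is an assumption made to keep the sketch small. It only shows how a visitor with no variant callback can silently omit a field that a later stage still expects to find.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are the real Iceberg or Parquet classes.
class VariantVisitorSketch {
  enum Kind { PRIMITIVE, VARIANT }

  // Hypothetical stand-in for a Parquet schema field.
  static final class Field {
    final String name;
    final Kind kind;

    Field(String name, Kind kind) {
      this.name = name;
      this.kind = kind;
    }
  }

  // Base visitor: each callback returns null unless a subclass overrides it.
  static class Visitor<T> {
    T primitive(Field field) { return null; }
    T variant(Field field) { return null; }
  }

  // Visitor written before variant support: only primitives are converted.
  static class WithoutVariant extends Visitor<String> {
    @Override
    String primitive(Field field) { return "primitive"; }
  }

  // Same visitor with the variant callback implemented.
  static class WithVariant extends WithoutVariant {
    @Override
    String variant(Field field) { return "variant"; }
  }

  // Dispatches each field to its callback; null results are dropped (assumed pruning rule).
  static <T> Map<String, T> visit(List<Field> fields, Visitor<T> visitor) {
    Map<String, T> converted = new LinkedHashMap<>();
    for (Field field : fields) {
      T result = field.kind == Kind.VARIANT ? visitor.variant(field) : visitor.primitive(field);
      if (result != null) {
        converted.put(field.name, result);
      }
    }
    return converted;
  }

  // Two-field sample schema: one primitive column and one variant column.
  static List<Field> sampleSchema() {
    List<Field> fields = new ArrayList<>();
    fields.add(new Field("id", Kind.PRIMITIVE));
    fields.add(new Field("v", Kind.VARIANT));
    return fields;
  }

  public static void main(String[] args) {
    List<Field> schema = sampleSchema();
    System.out.println(visit(schema, new WithoutVariant())); // {id=primitive}
    System.out.println(visit(schema, new WithVariant()));    // {id=primitive, v=variant}
  }
}
```

In this sketch, a caller that looks up "v" in the first result gets null, and dereferencing it would throw a NullPointerException, which is the same shape of failure the PR reports for TypeWithSchemaVisitor.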