
@tmater (Contributor) commented Nov 14, 2025

Summary

Adds variant type support to ParquetTypeVisitor and all its subclasses to enable proper handling of Parquet variant logical types during schema operations.

Background

This issue surfaced when using ParquetUtil.footerMetrics(), which calls convertAndPrune() on the Parquet schema. TestVariantMetrics did not expose the gap because it writes files with writeParquet() and calls ParquetMetrics.metrics() directly, bypassing the schema conversion path. Without variant() implementations, variant fields were skipped during schema conversion, so TypeWithSchemaVisitor later threw an NPE when it tried to process a variant field that was missing from the converted schema.
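The failure chain described above can be illustrated with a toy model. Everything in this sketch (PrunedFieldNpeSketch, Field, Kind, convert, describe) is hypothetical and far simpler than Iceberg's convertAndPrune() and TypeWithSchemaVisitor. It only shows how a null result for an unhandled type quietly removes a field, and how a later pass that still expects that field then fails with an NPE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model; these are NOT Iceberg's real classes.
public class PrunedFieldNpeSketch {
  enum Kind { PRIMITIVE, VARIANT }

  record Field(int id, String name, Kind kind) {}

  // Converts one field. Returns null when the converter has no case for the kind,
  // which the caller treats as "drop this field" (loosely mirroring pruning).
  static String convertField(Field f, boolean handlesVariant) {
    switch (f.kind()) {
      case PRIMITIVE:
        return "primitive";
      case VARIANT:
        return handlesVariant ? "variant" : null;
      default:
        return null;
    }
  }

  // Builds an id -> converted-type map, silently skipping null results.
  static Map<Integer, String> convert(List<Field> fields, boolean handlesVariant) {
    Map<Integer, String> converted = new HashMap<>();
    for (Field f : fields) {
      String type = convertField(f, handlesVariant);
      if (type != null) {
        converted.put(f.id(), type);
      }
    }
    return converted;
  }

  // A later pass that walks the ORIGINAL fields and assumes each id was converted.
  static List<String> describe(List<Field> fields, Map<Integer, String> converted) {
    List<String> out = new ArrayList<>();
    for (Field f : fields) {
      String type = converted.get(f.id());
      // type is null for a pruned field, so this dereference throws an NPE.
      out.add(f.name() + ":" + type.toUpperCase());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Field> fields = List.of(
        new Field(1, "id", Kind.PRIMITIVE),
        new Field(2, "payload", Kind.VARIANT));

    // With a variant case, every field survives conversion.
    System.out.println(describe(fields, convert(fields, true)));
    // prints [id:PRIMITIVE, payload:VARIANT]

    // Without it, the variant field is dropped and the later pass fails.
    try {
      describe(fields, convert(fields, false));
    } catch (NullPointerException e) {
      System.out.println("NPE: variant field was dropped during conversion");
    }
  }
}
```

The point of the toy is that the conversion step itself does not fail. The error only appears in a different component that trusts the converted schema to be complete, which matches why the gap stayed hidden in tests that skipped the conversion path.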

Changes

  • Add variant(GroupType) method to ParquetTypeVisitor base class
  • Implement variant() in all ParquetTypeVisitor subclasses:
    • MessageTypeToType - converts Parquet variant to Iceberg VariantType
    • ApplyNameMapping - applies name mappings to variant fields
    • ParquetSchemaUtil.HasIds - checks for field IDs in variant types
    • RemoveIds - removes IDs from variant schemas
  • Add test testVariantTypeConversion() in TestParquetSchemaUtil
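A minimal sketch of the visitor shape the change list describes: a base visitor that recognizes a variant-annotated group and hands it to a dedicated variant() hook, plus a subclass that overrides that hook. It assumes a simplified Node model instead of Parquet's GroupType, and it assumes the base variant() returns null by default; the real default in ParquetTypeVisitor may differ. VariantVisitorSketch, Node, TypeVisitor, ToTypeString and WithVariant are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Parquet groups and ParquetTypeVisitor; not the real API.
public class VariantVisitorSketch {
  // A node is a leaf column or a group; logicalType is null unless annotated.
  record Node(String name, String logicalType, List<Node> children) {
    static Node leaf(String name) {
      return new Node(name, null, List.of());
    }

    static Node group(String name, Node... children) {
      return new Node(name, null, List.of(children));
    }

    // An unshredded variant is a group annotated as VARIANT holding metadata and value columns.
    static Node variant(String name) {
      return new Node(name, "VARIANT", List.of(leaf("metadata"), leaf("value")));
    }
  }

  abstract static class TypeVisitor<T> {
    // Checks the variant annotation BEFORE the generic group case,
    // so a variant is never walked as a plain struct of two columns.
    T visit(Node node) {
      if ("VARIANT".equals(node.logicalType())) {
        return variant(node);
      }
      if (node.children().isEmpty()) {
        return primitive(node);
      }
      List<T> fieldResults = new ArrayList<>(); // ArrayList tolerates null results
      for (Node child : node.children()) {
        fieldResults.add(visit(child));
      }
      return struct(node, fieldResults);
    }

    abstract T primitive(Node leaf);

    abstract T struct(Node group, List<T> fieldResults);

    // Assumed default: visitors that do not override this return null for variants.
    T variant(Node group) {
      return null;
    }
  }

  // Renders a type string; null field results are pruned from the struct.
  static class ToTypeString extends TypeVisitor<String> {
    @Override
    String primitive(Node leaf) {
      return leaf.name();
    }

    @Override
    String struct(Node group, List<String> fieldResults) {
      List<String> kept = new ArrayList<>();
      for (String r : fieldResults) {
        if (r != null) {
          kept.add(r);
        }
      }
      return group.name() + "<" + String.join(", ", kept) + ">";
    }
  }

  // Same visitor, but with an explicit variant() implementation.
  static class WithVariant extends ToTypeString {
    @Override
    String variant(Node group) {
      return group.name() + ":variant";
    }
  }
}
```

For the schema `Node.group("table", Node.leaf("id"), Node.variant("payload"))`, ToTypeString produces `table<id>` (the variant is silently pruned), while WithVariant produces `table<id, payload:variant>`. Putting the hook in the base class means every subclass gets a well-defined entry point for variants, and each one decides explicitly how to handle them.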

Testing

A new test validates schema conversion from the Parquet variant logical type to Iceberg's VariantType.

Implement variant(GroupType) method in ParquetTypeVisitor and all
subclasses to enable proper handling of Parquet variant logical types
during schema conversion and manipulation operations.
@huaxingao (Contributor):

cc @aihuaxu

@aihuaxu (Contributor) left a comment:

Minor comments. Otherwise looks good.

Address review comments

 - Replace variant spec version with constant
 - Add variant tests
@tmater tmater requested a review from aihuaxu November 17, 2025 16:26
@tmater tmater requested a review from huaxingao November 18, 2025 10:02
@aihuaxu (Contributor) left a comment:

LGTM.

@huaxingao merged commit b6b8926 into apache:main on Nov 18, 2025 (44 checks passed).
@huaxingao (Contributor):

Thanks @tmater for the fix! Thanks @aihuaxu for the review!

@huaxingao set the milestone to Iceberg 1.10.1 on Nov 18, 2025.
@huaxingao (Contributor):

@tmater Could you please backport this fix to 1.10.x? Thanks!

@tmater deleted the variant_metrics branch on Nov 19, 2025 09:24.
tmater added a commit to tmater/iceberg that referenced this pull request Nov 19, 2025
* Parquet: Add variant type support to ParquetTypeVisitor

Implement variant(GroupType) method in ParquetTypeVisitor and all
subclasses to enable proper handling of Parquet variant logical types
during schema conversion and manipulation operations.

* Address review comments

 - Replace variant spec version with constant
 - Add variant tests

* Column id reordering

* Address review comments
@tmater (Contributor, Author) commented Nov 19, 2025

Thank you for the reviews @huaxingao, @aihuaxu!

Created a cherry-pick PR to 1.10.x: #14624

huaxingao pushed a commit that referenced this pull request Nov 19, 2025

3 participants