fix: Reproduce nested partition columns pruning data validation failure #17759

vinishjail97 · 2025-12-31T01:35:26Z

Describe the issue this Pull Request addresses

There's a change in behavior for SparkHoodieTableFileIndex since 0.14.1. The StructType(partitionFields) it returns no longer carries the full field path, which causes the data validation failures. This behavior was changed as part of this PR: https://github.com/apache/hudi/pull/9863/changes

Summary and Changelog

If a table has a nested partition column whose leaf name conflicts with another top-level field, the partitionedSchema passed to the new file group reader is incorrect. When I tried reverting the previous change, I found another issue: buildReaderWithPartitionValues relies on HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema to get requestedSchema, and this fails because HoodieSchema doesn't accept dots in the names.
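To make the two failure modes concrete, here is a minimal sketch in plain Java that uses no Hudi or Spark classes. The class name, method names, and the example fields (`id`, `level`, `nested_record.level`) are hypothetical and chosen only for illustration. The name check uses the Avro naming rule (a letter or underscore followed by letters, digits, or underscores); that HoodieSchema rejects dotted names for this exact reason is an assumption, not something verified against the Hudi code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Illustration only: plain Java, no Hudi or Spark classes involved.
final class NestedPartitionNameSketch {

    // Avro-style name rule: a letter or underscore, then letters, digits, or underscores.
    private static final Pattern AVRO_STYLE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns the last segment of a dotted path, e.g. "nested_record.level" -> "level".
    static String leafName(String fieldPath) {
        int lastDot = fieldPath.lastIndexOf('.');
        return lastDot < 0 ? fieldPath : fieldPath.substring(lastDot + 1);
    }

    // Returns the nested partition paths whose leaf name equals a top-level field,
    // i.e. the paths that become ambiguous once only the leaf name is kept.
    static List<String> ambiguousPartitionPaths(List<String> topLevelFields,
                                                List<String> partitionPaths) {
        Set<String> topLevel = new HashSet<>(topLevelFields);
        List<String> ambiguous = new ArrayList<>();
        for (String path : partitionPaths) {
            if (path.contains(".") && topLevel.contains(leafName(path))) {
                ambiguous.add(path);
            }
        }
        return ambiguous;
    }

    // True if the name satisfies the Avro-style rule; a dotted full path does not.
    static boolean isAvroStyleName(String name) {
        return AVRO_STYLE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Hypothetical table: top-level "level" plus a partition on "nested_record.level".
        List<String> topLevel = Arrays.asList("id", "level", "nested_record");
        List<String> partitions = Arrays.asList("nested_record.level");
        System.out.println(ambiguousPartitionPaths(topLevel, partitions));
        System.out.println(isAvroStyleName("nested_record.level"));
    }
}
```

In this sketch, keeping only the leaf name makes `nested_record.level` indistinguishable from the top-level `level`, while keeping the full dotted path produces a name that an Avro-style validator rejects. That is the tension described above.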

Looking for guidance or feedback on how to read nested partition columns through the parquet reader.

Impact

High

Risk Level

High

Documentation Update

None.

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

hudi-bot · 2025-12-31T02:57:41Z

CI report:

e2d3fd6 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

Handle nested map and array columns in MDT

e2d3fd6

github-actions bot added the size:S PR with lines of changes in (10, 100] label Dec 31, 2025
