@vinishjail97
Contributor

Describe the issue this Pull Request addresses

The behavior of SparkHoodieTableFileIndex has changed since 0.14.1. The StructType(partitionFields) it returns no longer carries the full path of each partition field, which causes data validation failures. This behavior was changed as part of this PR: https://github.com/apache/hudi/pull/9863/changes
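
To make the difference concrete, here is a minimal sketch in plain Python, not Hudi or Spark code. The schema is modeled as a nested dict, and the column names (`id`, `level`, `nested_record.level`) as well as the helpers `resolve` and `partition_field` are hypothetical, chosen only for illustration. It contrasts a partition field named by its full dotted path with one named only by its leaf.

```python
# Simplified stand-in for a Spark StructType: field name -> type name or nested dict.
# All names here are hypothetical and only illustrate the naming difference.
table_schema = {
    "id": "string",
    "level": "int",                         # top-level field
    "nested_record": {"level": "string"},   # nested field used as the partition column
}


def resolve(schema, dotted_path):
    """Walk the nested dict along a dotted path and return the leaf type."""
    node = schema
    for part in dotted_path.split("."):
        node = node[part]
    return node


def partition_field(schema, dotted_path, keep_full_path):
    """Return (name, type) for a partition column under one of two naming conventions."""
    name = dotted_path if keep_full_path else dotted_path.split(".")[-1]
    return (name, resolve(schema, dotted_path))


full = partition_field(table_schema, "nested_record.level", keep_full_path=True)
leaf = partition_field(table_schema, "nested_record.level", keep_full_path=False)
```

With the full path, the partition field is `("nested_record.level", "string")`. With the leaf name only, it becomes `("level", "string")`, which can no longer be told apart by name from the top-level `level` column.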

Summary and Changelog

If a table has a nested partition column whose leaf name conflicts with another top-level field, the partitionedSchema passed to the new file group reader is incorrect. When I tried reverting the previous change, I found another issue: buildReaderWithPartitionValues relies on HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema to get requestedSchema, but this fails because HoodieSchema does not accept dots in field names.
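
Both problems can be sketched in plain Python, again without any Hudi or Spark code. The field names and the helpers `conflicts` and `is_valid_avro_name` are hypothetical. The name check uses the Avro specification's rule for names (start with a letter or underscore, then letters, digits, or underscores); that HoodieSchema rejects dotted names for this same reason is an assumption, not something confirmed here.

```python
import re

# Hypothetical top-level view of the table schema: field name -> type.
top_level_fields = {
    "id": "string",
    "level": "int",
    "nested_record": "struct<level:string>",
}

# Partition field as it looks when only the leaf name is kept.
partition_leaf = ("level", "string")


def conflicts(field, top_level):
    """True if a top-level field has the same name but a different type."""
    name, dtype = field
    return name in top_level and top_level[name] != dtype


# Name rule from the Avro specification; assumed (not verified) to be
# what makes HoodieSchema reject dotted field names.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name):
    """True if the whole name satisfies the Avro name rule."""
    return AVRO_NAME.fullmatch(name) is not None


leaf_conflict = conflicts(partition_leaf, top_level_fields)
dotted_ok = is_valid_avro_name("nested_record.level")
leaf_ok = is_valid_avro_name("level")
```

In this model the leaf-only name collides with the top-level `level` column, while the full dotted path avoids the collision but fails the name check. That is the trade-off described above.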

Looking for guidance or feedback on how to read nested partition columns through the parquet reader.

Impact

High

Risk Level

High

Documentation Update

None.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Dec 31, 2025
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
