fix: fall back to Spark when Parquet field ID matching is enabled in native_datafusion#3415

Merged
andygrove merged 3 commits into apache:main from andygrove:fix-3316-field-id-fallback
Feb 6, 2026

Conversation

@andygrove
Member

Summary

Closes #3316

Test plan

  • Spark SQL tests in ParquetFieldIdIOSuite pass with native_datafusion enabled in CI

🤖 Generated with Claude Code

andygrove and others added 3 commits February 5, 2026 16:29
…native_datafusion

When `spark.sql.parquet.fieldId.read.enabled` is true, native_datafusion
reads columns by name/position rather than Parquet field IDs, producing
wrong results. Detect this config and fall back to Spark.

Closes apache#3316

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
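
To illustrate why name-based resolution can return wrong results when field ID matching is expected, here is a minimal sketch. The `Column` class and both resolver methods are hypothetical stand-ins written for this example; they are not Comet or Spark APIs. The assumed scenario is a file whose columns were renamed after writing, so the reader's schema carries the original field IDs under different names.

```java
import java.util.List;

class FieldIdMatching {
    // Hypothetical stand-in for a Parquet column: a name plus its field ID.
    static final class Column {
        final String name;
        final int fieldId;

        Column(String name, int fieldId) {
            this.name = name;
            this.fieldId = fieldId;
        }
    }

    // Name-based resolution: returns the file column with the requested name, or null.
    static Column resolveByName(List<Column> fileColumns, Column wanted) {
        for (Column c : fileColumns) {
            if (c.name.equals(wanted.name)) {
                return c;
            }
        }
        return null;
    }

    // ID-based resolution: returns the file column with the requested field ID, or null.
    static Column resolveByFieldId(List<Column> fileColumns, Column wanted) {
        for (Column c : fileColumns) {
            if (c.fieldId == wanted.fieldId) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Columns as stored in the (assumed) file.
        List<Column> fileColumns = List.of(new Column("a", 1), new Column("b", 2));

        // The reader asks for a column it calls "b", tagged with field ID 1.
        Column wanted = new Column("b", 1);

        // Name matching picks the file column "b" (field ID 2): the wrong data.
        System.out.println(resolveByName(fileColumns, wanted).fieldId);    // prints 2
        // ID matching picks the file column "a" (field ID 1): the intended data.
        System.out.println(resolveByFieldId(fileColumns, wanted).name);    // prints a
    }
}
```

Both resolvers succeed, but they select different physical columns, which is the silent mismatch the fallback is meant to avoid.
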
Spark's TestSQLContext overrides PARQUET_FIELD_ID_READ_ENABLED to true
for all tests, so checking only the config caused every native_datafusion
scan to fall back in CI. Now also check ParquetUtils.hasFieldIds() on the
required schema, matching Spark's own pattern in ParquetRowConverter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
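
The two-part check described above (config enabled and the required schema actually carries field IDs) can be sketched as follows. This is a simplified model, not the code from this PR: `Field`, `hasFieldIds`, and `shouldFallBack` are hypothetical stand-ins for Spark's `StructField`, `ParquetUtils.hasFieldIds()`, and Comet's fallback decision. The metadata key `parquet.field.id` is, to my understanding, the key Spark uses to attach field IDs to schema fields; treat it as an assumption here.

```java
import java.util.List;
import java.util.Map;

class FieldIdFallback {
    // Assumed metadata key under which Spark stores a Parquet field ID.
    static final String FIELD_ID_KEY = "parquet.field.id";

    // Hypothetical stand-in for a schema field with metadata and nested children.
    static final class Field {
        final String name;
        final Map<String, String> metadata;
        final List<Field> children;

        Field(String name, Map<String, String> metadata, List<Field> children) {
            this.name = name;
            this.metadata = metadata;
            this.children = children;
        }
    }

    // True if any field, at any nesting depth, carries a field ID.
    static boolean hasFieldIds(List<Field> schema) {
        for (Field f : schema) {
            if (f.metadata.containsKey(FIELD_ID_KEY) || hasFieldIds(f.children)) {
                return true;
            }
        }
        return false;
    }

    // Fall back only when the config is on AND the required schema uses field IDs.
    static boolean shouldFallBack(boolean fieldIdReadEnabled, List<Field> requiredSchema) {
        return fieldIdReadEnabled && hasFieldIds(requiredSchema);
    }

    public static void main(String[] args) {
        List<Field> plain = List.of(new Field("a", Map.of(), List.of()));
        List<Field> nested = List.of(
            new Field("s", Map.of(), List.of(
                new Field("x", Map.of(FIELD_ID_KEY, "7"), List.of()))));

        System.out.println(shouldFallBack(true, plain));   // prints false
        System.out.println(shouldFallBack(true, nested));  // prints true
        System.out.println(shouldFallBack(false, nested)); // prints false
    }
}
```

The first case is the one that matters for CI: with the config forced to true by the test context, a schema without field IDs still stays on the native scan.
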
@andygrove andygrove marked this pull request as ready for review February 6, 2026 03:57
Contributor

@mbutrovich mbutrovich left a comment

Makes sense to me. Are we filing follow up issues for the fallbacks?

@andygrove andygrove merged commit 58cf6e1 into apache:main Feb 6, 2026
109 checks passed
@andygrove andygrove deleted the fix-3316-field-id-fallback branch February 6, 2026 15:34
@andygrove
Member Author

> Makes sense to me. Are we filing follow up issues for the fallbacks?

#3434

andygrove added a commit to andygrove/datafusion-comet that referenced this pull request Feb 6, 2026
…#3415

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
andygrove added a commit to andygrove/datafusion-comet that referenced this pull request Feb 6, 2026
Row index tests (apache#3317) and field ID tests (apache#3316) are both fixed
upstream in apache#3414 and apache#3415, so no additional test ignores are needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[native_datafusion] [Spark SQL Tests] Parquet field ID matching not supported

2 participants