fix: fall back to Spark when Parquet field ID matching is enabled in native_datafusion by andygrove · Pull Request #3415 · apache/datafusion-comet

andygrove · 2026-02-05T23:30:09Z

Summary

Add a check in `nativeDataFusionScan()` that detects when `spark.sql.parquet.fieldId.read.enabled` is true and falls back to Spark, since native DataFusion reads columns by name/position rather than by Parquet field ID.
Remove the 6 `IgnoreCometNativeDataFusion` test annotations for issue #3316 ("[native_datafusion] [Spark SQL Tests] Parquet field ID matching not supported") from the Spark 3.5.8 diff (`ParquetFieldIdIOSuite`).
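
To illustrate why name-based reads can return wrong results when the read schema relies on field IDs, here is a minimal, language-neutral sketch in Python. It is not Comet or Spark code: the column layout, `read_by_name`, and `read_by_field_id` are all hypothetical names made up for the example.

```python
# Hypothetical Parquet file: each column has a name and a field ID.
FILE_COLUMNS = [
    {"name": "a", "field_id": 1, "values": [1, 2]},
    {"name": "b", "field_id": 2, "values": [10, 20]},
]


def read_by_name(columns, name):
    """Name-based lookup: roughly what a reader unaware of field IDs does."""
    for col in columns:
        if col["name"] == name:
            return col["values"]
    return None


def read_by_field_id(columns, field_id):
    """ID-based lookup: what field ID matching asks for."""
    for col in columns:
        if col["field_id"] == field_id:
            return col["values"]
    return None


# The read schema requests a column called "b" that is tagged with field ID 1,
# for example because the column was renamed after the file was written.
requested = {"name": "b", "field_id": 1}

by_name = read_by_name(FILE_COLUMNS, requested["name"])
by_id = read_by_field_id(FILE_COLUMNS, requested["field_id"])
print(by_name, by_id)
```

In this toy case the two strategies select different columns, which is the kind of mismatch the fallback is meant to avoid.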

Test plan

Spark SQL tests in ParquetFieldIdIOSuite pass with native_datafusion enabled in CI

🤖 Generated with Claude Code

…native_datafusion

When `spark.sql.parquet.fieldId.read.enabled` is true, native_datafusion reads columns by name/position rather than Parquet field IDs, producing wrong results. Detect this config and fall back to Spark.

Closes apache#3316

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Spark's TestSQLContext overrides PARQUET_FIELD_ID_READ_ENABLED to true for all tests, so checking only the config caused every native_datafusion scan to fall back in CI. Now also check ParquetUtils.hasFieldIds() on the required schema, matching Spark's own pattern in ParquetRowConverter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
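
The two-part condition described in this commit can be sketched as follows. This is a simplified Python model under stated assumptions, not the Scala code from the PR: the schema is represented as plain dicts, `has_field_ids` and `should_fall_back` are hypothetical helpers, the metadata key `parquet.field.id` is assumed to be the one Spark uses, and the recursion into nested fields is an assumption about how such a check would typically behave.

```python
# Assumed metadata key under which a field ID is stored on a schema field.
FIELD_ID_METADATA_KEY = "parquet.field.id"


def has_field_ids(fields):
    """Return True if any field, at any nesting level, carries a field ID."""
    for field in fields:
        if FIELD_ID_METADATA_KEY in field.get("metadata", {}):
            return True
        if has_field_ids(field.get("children", [])):
            return True
    return False


def should_fall_back(field_id_read_enabled, required_schema):
    """Fall back only when the config is on AND the schema actually uses IDs."""
    return field_id_read_enabled and has_field_ids(required_schema)


# Hypothetical schemas used only for this illustration.
plain_schema = [{"name": "a"}, {"name": "b"}]
tagged_schema = [{"name": "a", "metadata": {FIELD_ID_METADATA_KEY: 1}}]
nested_schema = [
    {"name": "s", "children": [{"name": "x", "metadata": {FIELD_ID_METADATA_KEY: 7}}]}
]

print(should_fall_back(True, plain_schema))   # config on, no IDs in schema
print(should_fall_back(True, tagged_schema))  # config on, IDs present
```

Checking the schema as well as the config means a test harness that turns the config on globally does not force every scan to fall back; only scans whose required schema carries field IDs do.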

mbutrovich

Makes sense to me. Are we filing follow up issues for the fallbacks?

andygrove · 2026-02-06T15:35:55Z

> Makes sense to me. Are we filing follow up issues for the fallbacks?

#3434

Row index tests (apache#3317) and field ID tests (apache#3316) are both fixed upstream in apache#3414 and apache#3415, so no additional test ignores are needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

andygrove and others added 3 commits February 5, 2026 16:29

Merge remote-tracking branch 'apache/main' into fix-3316-field-id-fal…

8f28e29

…lback # Conflicts: # dev/diffs/3.5.8.diff

andygrove marked this pull request as ready for review February 6, 2026 03:57

mbutrovich approved these changes Feb 6, 2026

andygrove merged commit 58cf6e1 into apache:main Feb 6, 2026
109 checks passed

andygrove deleted the fix-3316-field-id-fallback branch February 6, 2026 15:34

andygrove added a commit to andygrove/datafusion-comet that referenced this pull request Feb 6, 2026

fix: remove ParquetFieldIdIOSuite ignore annotations, fixed in apache…

15b0eb5

…#3415 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

andygrove merged 3 commits into apache:main from andygrove:fix-3316-field-id-fallback
