Summary
Six Spark SQL tests fail because the native_datafusion scan does not honor Parquet field ID matching.
Failing Tests
ParquetFieldIdIOSuite: "Parquet reads infer fields using field ids correctly"
ParquetFieldIdIOSuite: "absence of field ids"
ParquetFieldIdIOSuite: "SPARK-38094: absence of field ids: reading nested schema"
ParquetFieldIdIOSuite: "multiple id matches"
ParquetFieldIdIOSuite: "read parquet file without ids"
ParquetFieldIdIOSuite: "global read/write flag should work correctly"
Root Cause
native_datafusion resolves columns by name or position instead of by Parquet field ID. When spark.sql.parquet.fieldId.read.enabled is true, it can therefore read the wrong column, or miss a column entirely, and return incorrect results.
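To show how name-based resolution goes wrong, here is a minimal, self-contained model. It is not Comet or DataFusion code. `Field`, `ColumnResolution`, `byName`, and `byId` are hypothetical names. The assumption is that a column written with field ID 1 under the name `a` was later renamed in the table schema, while a different column (ID 2) now uses the name `a`. As a further simplifying assumption, `byId` falls back to name matching for requested fields that carry no ID.

```scala
// Hypothetical model of a column: its name plus an optional Parquet field ID.
final case class Field(name: String, id: Option[Int])

object ColumnResolution {
  // Index of the first file column matching the predicate, or None if absent.
  private def find(file: Seq[Field])(p: Field => Boolean): Option[Int] =
    Some(file.indexWhere(p)).filter(_ >= 0)

  // Name-based resolution: ignores field IDs entirely.
  def byName(requested: Seq[Field], file: Seq[Field]): Seq[Option[Int]] =
    requested.map(r => find(file)(_.name == r.name))

  // ID-based resolution: match on field ID when the requested field has one;
  // otherwise (assumption for this sketch) fall back to matching by name.
  def byId(requested: Seq[Field], file: Seq[Field]): Seq[Option[Int]] =
    requested.map { r =>
      r.id match {
        case Some(id) => find(file)(_.id.contains(id))
        case None     => find(file)(_.name == r.name)
      }
    }
}
```

With a file schema of `a` (ID 1) and `b` (ID 2) and a requested schema of `renamed_a` (ID 1) and `a` (ID 2), `byName` returns `Seq(None, Some(0))`. It finds nothing for `renamed_a` and silently reads ID 1's data for `a`. `byId` returns `Seq(Some(0), Some(1))`, the correct mapping.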
Possible Fix
In CometScanRule.nativeDataFusionScan(), detect when spark.sql.parquet.fieldId.read.enabled is true and fall back to native_iceberg_compat.
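The proposed fallback amounts to a small decision rule. The sketch below models it as a standalone function over a plain config map. It is not the actual CometScanRule code. `ScanSelectionSketch` and `chooseScanImpl` are hypothetical names, and a real rule would presumably read the session's SQLConf rather than a `Map`. The only real identifiers are the config key and the two scan implementation names taken from this issue. An unset key is treated as `false`, which matches Spark's default for this setting.

```scala
object ScanSelectionSketch {
  // Spark config key named in this issue.
  val FieldIdReadEnabled = "spark.sql.parquet.fieldId.read.enabled"

  // Hypothetical helper: keep the requested scan implementation unless it is
  // native_datafusion and field ID reads are enabled, in which case fall back
  // to native_iceberg_compat as proposed above.
  def chooseScanImpl(requested: String, conf: Map[String, String]): String = {
    val fieldIdRead =
      conf.get(FieldIdReadEnabled).exists(_.trim.equalsIgnoreCase("true"))
    if (requested == "native_datafusion" && fieldIdRead) "native_iceberg_compat"
    else requested
  }
}
```

Because this check only redirects native_datafusion, other scan implementations and configurations with field ID reads disabled are left unchanged.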
Related
Discovered in CI for #3307 (enable native_datafusion in auto scan mode).