Closed
Description
If I change the following query
.sql("select product_id, sum(amount) from warehouse.test.orders group by product_id order by product_id")
to be a select * instead of an aggregation, I see the following panic:
thread 'test_equality_delete' panicked at datafusion_iceberg/tests/equality_delete.rs:182:10:
Failed to execute select query: Context("SanityCheckPlan", Plan("Plan: [\"SortExec: expr=[id@0 ASC NULLS LAST], preserve_partitioning=[false]\", \" DataSourceExec: file_groups={0 groups: []}, projection=[id, customer_id, product_id, date, amount], file_type=parquet\"] does not satisfy distribution requirements: SinglePartition. Child-0 output partitioning: UnknownPartitioning(0)"))
The plan for that query looks like this:
"| initial_physical_plan | UnionExec |",
"| | ProjectionExec: expr=[id@0 as id, customer_id@1 as customer_id, product_id@2 as product_id, date@3 as date, amount@4 as amount] |",
"| | HashJoinExec: mode=CollectLeft, join_type=RightAnti, on=[(id@0, id@0), (customer_id@1, customer_id@1), (product_id@2, product_id@2), (date@3, date@3)] |",
"| | DataSourceExec: file_groups={1 group: [[test/orders/data/date_day=18262/64b47434-6d07-11f0-88f8-de51894a27a1.parquet]]}, projection=[id, customer_id, product_id, date, date_day], file_type=parquet |",
"| | DataSourceExec: file_groups={1 group: [[test/orders/data/date_day=18262/64b02528-6d07-11f0-88f7-6d5546fe4045.parquet]]}, projection=[id, customer_id, product_id, date, amount], file_type=parquet |",
"| | ProjectionExec: expr=[id@0 as id, customer_id@1 as customer_id, product_id@2 as product_id, date@3 as date, amount@4 as amount] |",
"| | HashJoinExec: mode=CollectLeft, join_type=RightAnti, on=[(id@0, id@0), (customer_id@1, customer_id@1), (product_id@2, product_id@2), (date@3, date@3)] |",
"| | DataSourceExec: file_groups={1 group: [[test/orders/data/date_day=18294/64b48f32-6d07-11f0-88f9-bf3bba452bfc.parquet]]}, projection=[id, customer_id, product_id, date, date_day], file_type=parquet |",
"| | DataSourceExec: file_groups={1 group: [[test/orders/data/date_day=18294/64af9cb6-6d07-11f0-88f6-1d36c1f3beb3.parquet]]}, projection=[id, customer_id, product_id, date, amount], file_type=parquet |",
"| | DataSourceExec: file_groups={0 groups: []}, projection=[id, customer_id, product_id, date, amount], file_type=parquet |",
"| | |",
That last DataSourceExec: file_groups={0 groups: []} is the cause of the problem. In this test every data file group has a matching equality delete group, so once all of them are paired up for an (anti) join, there are no file groups left when constructing the other plan here:
iceberg-rust/datafusion_iceberg/src/table.rs
Lines 856 to 866 in ceb696f
let file_scan_config = FileScanConfigBuilder::new(object_store_url, file_schema, file_source)
    .with_file_groups(file_groups)
    .with_statistics(statistics)
    .with_projection(projection)
    .with_limit(limit)
    .with_table_partition_cols(table_partition_cols)
    .build();
let other_plan = ParquetFormat::default()
    .create_physical_plan(session, file_scan_config)
    .await?;
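Consequently, that last no-op plan is still added to the union. Because it has zero file groups it reports UnknownPartitioning(0), which does not satisfy the SinglePartition requirement of the SortExec placed above it, and that triggers the sanity check panic.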
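One possible direction is to add the plain scan only when some file groups are left over after pairing. The sketch below is a simplified, self-contained model of that idea and not the actual iceberg-rust or DataFusion code: the `Plan` enum, `build_plan`, `has_empty_scan`, and pairing data files with deletes by a partition string are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified plan model; these are NOT DataFusion or iceberg-rust types.
#[derive(Debug, PartialEq)]
enum Plan {
    // Scan whose partition count equals its number of file groups.
    Scan { file_groups: Vec<Vec<String>> },
    // Anti join of equality delete files (build side) against data files.
    AntiJoin { deletes: Box<Plan>, data: Box<Plan> },
    Union(Vec<Plan>),
    // Placeholder for "no rows", modelled here as one empty partition.
    Empty,
}

impl Plan {
    // Number of output partitions in this simplified model.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { file_groups } => file_groups.len(),
            Plan::AntiJoin { data, .. } => data.output_partitions(),
            Plan::Union(children) => children.iter().map(Plan::output_partitions).sum(),
            Plan::Empty => 1,
        }
    }
}

// Returns true if any scan in the tree has zero file groups.
fn has_empty_scan(plan: &Plan) -> bool {
    match plan {
        Plan::Scan { file_groups } => file_groups.is_empty(),
        Plan::AntiJoin { deletes, data } => has_empty_scan(deletes) || has_empty_scan(data),
        Plan::Union(children) => children.iter().any(has_empty_scan),
        Plan::Empty => false,
    }
}

// Pairs data files with equality deletes per partition (an assumption of this
// sketch) and collects the unpaired data files into one plain scan.
fn build_plan(
    data_files: BTreeMap<String, Vec<String>>,
    delete_files: &BTreeMap<String, Vec<String>>,
) -> Plan {
    let mut children = Vec::new();
    let mut unmatched: Vec<Vec<String>> = Vec::new();
    for (partition, files) in data_files {
        match delete_files.get(&partition) {
            Some(deletes) => children.push(Plan::AntiJoin {
                deletes: Box::new(Plan::Scan { file_groups: vec![deletes.clone()] }),
                data: Box::new(Plan::Scan { file_groups: vec![files] }),
            }),
            None => unmatched.push(files),
        }
    }
    // The guard: skip the plain scan when no file groups are left over,
    // so the union never receives a zero-partition child.
    if !unmatched.is_empty() {
        children.push(Plan::Scan { file_groups: unmatched });
    }
    if children.is_empty() {
        Plan::Empty
    } else {
        Plan::Union(children)
    }
}
```

In the scenario from this issue (two partitions, each with a matching equality delete file), `build_plan` in this model yields a union of two anti joins and no trailing zero-group scan. Whether the real fix should skip the scan or substitute an explicitly empty plan is a design choice for the maintainers.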