-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix array types coercion: preserve child element nullability for list types #17306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix array types coercion: preserve child element nullability for list types #17306
Conversation
8b478f8 to
bf70047
Compare
joroKr21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch 👍
… types (#96) UPSTREAM NOTE: This was submitted upstream and should be available in DF50 apache#17306
|
@alamb, @jayzhan211 - Could you please take a look at this change and see if we can merge it? Without this change, we encounter the |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… types (#96) UPSTREAM NOTE: This was submitted upstream and should be available in DF50 apache#17306
… types (#96) UPSTREAM NOTE: This was submitted upstream and should be available in DF50 apache#17306
… types (#96) UPSTREAM NOTE: This was submitted upstream and should be available in DF50 apache#17306
* fix: Ensure only tables or aliases that exist are projected (#52) fix: More dangling references (#54) UPSTREAM NOTE: This PR was attempted to be upstreamed in apache#13405 - but it was not accepted due to the complexity it brought. Phillip needs to figure out what a good solution that solves our problem and can be upstreamed is. * Support for metadata columns (`location`, `size`, `last_modified`) in ListingTableProvider (#74) UPSTREAM NOTE: This PR was attempted to be upstreamed but was not accepted. Needs to be applied manually apache#15181 * Infer placeholder datatype for `Expr::InSubquery` (#80) UPSTREAM NOTE: Upstream PR has been created but not merged yet. Should be available in DF49 apache#15980 * Infer placeholder datatype after `LIMIT` clause as `DataType::Int64` (#81) UPSTREAM NOTE: Upstream PR has been created but not merged yet. Should be available in DF49 apache#15980 * Do not double alias Exprs UPSTREAM NOTE: This was attempted to be fixed with apache#15008 but was closed This is the tracking issue on DataFusion: apache#14895 Do not double alias Exprs * Add prefix to location metadata column (#82) UPSTREAM NOTE: This will not be upstreamed as is. * Infer placeholder types for CASE expressions (#87) UPSTREAM NOTE: This has not been submitted upstream yet. * Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88) UPSTREAM NOTE: This has not been submitted upstream yet. * Fix `Expr::infer_placeholder_types` inference to not fail (#89) UPSTREAM NOTE: This has not been submitted upstream yet. * cherry-pick parquet patch (#94) * Fix array types coercion: preserve child element nullability for list types (#96) UPSTREAM NOTE: This was submitted upstream and should be available in DF50 apache#17306 * Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88) UPSTREAM NOTE: This has not been submitted upstream yet. * do not enforce type guarantees on all Expr traversed in infer_placeholder_types (#97) * Use UDTF function args in `LogicalPlan::TableScan` name (#98) * use UDTF function args in LogicalPlan::TableScan name * update test snapshots * Implement timestamp_cast_dtype for SqliteDialect (#99) * Use text for sqlite timestamp * Add test * Custom timestamp format for DuckDB (#102) * Revert "cherry-pick parquet patch (#94)" This reverts commit d780cc2. * Support ExprNamed arguments to Scalar UDFs (#104) * support ExprNamed until 17379 ships * add same exprnamed lifting to udtf * resolve projection against `ListingTable` table_schema incl. partition columns (#106) * fix: Ensure ListingTable partitions are pruned when filters are not used (#108) * fix: Prune partitions when no filters are defined * fix: Backport for DF49: * review: Address comments * FileScanConfig: Preserve schema metadata across serde boundary (#107) * FileScanConfig: preserve schema metadata across serde boundary * add test * Merge conflict fixes UPSTREAM NOTE: this should not be upstreamed. This contains conflict fixes from various cherry-picks and differences in v50. * update arrow-rs fork UPSTREAM NOTE: this should not be upstreamed --------- Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech> Co-authored-by: Kevin Zimmerman <4733573+kczimm@users.noreply.github.com> Co-authored-by: sgrebnov <sergei.grebnov@gmail.com> Co-authored-by: jeadie <jack@spice.ai> Co-authored-by: Jack Eadie <jack.eadie0@gmail.com> Co-authored-by: Viktor Yershov <krinart@gmail.com> Co-authored-by: Viktor Yershov <viktor@spice.ai> Co-authored-by: David Stancu <david@spice.ai>
Which issue does this PR close?
Fixes #17305
What changes are included in this PR?
PR updates
array_valid_typesimplementation to preserve nullability information for nested fields. It follows up on the change/improvement in the linked item to restore the previous behavior:#15149 (comment)
Are these changes tested?
Yes, the existing unit test was expanded to cover nullability.
Are there any user-facing changes?
No. This only prevents schema mismatch errors during execution, as described in the linked bug report.