Skip to content

Conversation

@sgrebnov
Copy link
Member

@sgrebnov sgrebnov commented Aug 23, 2025

Which issue does this PR close?

Fixes #17305

What changes are included in this PR?

PR updates array_valid_typesimplementation to preserve nullability information for nested fields. It follows up on the change/improvement in the linked item to restore the previous behavior:

#15149 (comment)

Are these changes tested?

Yes, the existing unit test was expanded to cover nullability.

Are there any user-facing changes?

No. This only prevents schema mismatch errors during execution, as described in the linked bug report.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Aug 23, 2025
@sgrebnov sgrebnov force-pushed the sgrebnov/upstream-fix-type-coercion branch from 8b478f8 to bf70047 Compare August 23, 2025 22:07
Copy link
Contributor

@joroKr21 joroKr21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch 👍

sgrebnov added a commit to spiceai/datafusion that referenced this pull request Aug 25, 2025
… types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50

apache#17306
@sgrebnov
Copy link
Member Author

sgrebnov commented Sep 8, 2025

@alamb, @jayzhan211 - Could you please take a look at this change and see if we can merge it? Without this change, we encounter the column types must match schema types error described in #17305.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sgrebnov and @joroKr21 -- this makes esense to me

@sgrebnov
Copy link
Member Author

sgrebnov commented Sep 8, 2025

Thank you @alamb and @joroKr21 for review!

@alamb alamb merged commit a96dcc8 into apache:main Sep 9, 2025
27 checks passed
Jeadie pushed a commit to spiceai/datafusion that referenced this pull request Sep 9, 2025
… types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50

apache#17306
Jeadie pushed a commit to spiceai/datafusion that referenced this pull request Sep 12, 2025
… types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50

apache#17306
peasee pushed a commit to spiceai/datafusion that referenced this pull request Oct 27, 2025
… types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50

apache#17306
peasee added a commit to spiceai/datafusion that referenced this pull request Oct 27, 2025
* fix: Ensure only tables or aliases that exist are projected (#52)
fix: More dangling references (#54)

UPSTREAM NOTE: This PR was attempted to be upstreamed in apache#13405 - but it was not accepted due to the complexity it brought. Phillip needs to figure out what a good solution that solves our problem and can be upstreamed is.

* Support for metadata columns (`location`, `size`, `last_modified`)  in ListingTableProvider (#74)

UPSTREAM NOTE: This PR was attempted to be upstreamed but was not accepted. Needs to be applied manually
apache#15181

* Infer placeholder datatype for `Expr::InSubquery` (#80)

UPSTREAM NOTE: Upstream PR has been created but not merged yet. Should be available in DF49
apache#15980

* Infer placeholder datatype after `LIMIT` clause as `DataType::Int64` (#81)

UPSTREAM NOTE: Upstream PR has been created but not merged yet. Should be available in DF49
apache#15980

* Do not double alias Exprs

UPSTREAM NOTE: This was attempted to be fixed with
apache#15008 but was closed

This is the tracking issue on DataFusion:
apache#14895
Do not double alias Exprs

* Add prefix to location metadata column (#82)

UPSTREAM NOTE: This will not be upstreamed as is.

* Infer placeholder types for CASE expressions (#87)

UPSTREAM NOTE: This has not been submitted upstream yet.

* Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88)

UPSTREAM NOTE: This has not been submitted upstream yet.

* Fix `Expr::infer_placeholder_types` inference to not fail (#89)

UPSTREAM NOTE: This has not been submitted upstream yet.

* cherry-pick parquet patch (#94)

* Fix array types coercion: preserve child element nullability for list types (#96)

UPSTREAM NOTE: This was submitted upstream and should be available in DF50

apache#17306

* Expand `infer_placeholder_types` to infer all possible placeholder types based on their expression (#88)

UPSTREAM NOTE: This has not been submitted upstream yet.

* do not enforce type guarantees on all Expr traversed in infer_placeholder_types (#97)

* Use UDTF function args in `LogicalPlan::TableScan` name (#98)

* use UDTF function args in LogicalPlan::TableScan name

* update test snapshots

* Implement timestamp_cast_dtype for SqliteDialect (#99)

* Use text for sqlite timestamp

* Add test

* Custom timestamp format for DuckDB (#102)

* Revert "cherry-pick parquet patch (#94)"

This reverts commit d780cc2.

* Support ExprNamed arguments to Scalar UDFs (#104)

* support ExprNamed until 17379 ships

* add same exprnamed lifting to udtf

* resolve projection against `ListingTable` table_schema incl. partition columns (#106)

* fix: Ensure ListingTable partitions are pruned when filters are not used (#108)

* fix: Prune partitions when no filters are defined

* fix: Backport for DF49:

* review: Address comments

* FileScanConfig: Preserve schema metadata across serde boundary (#107)

* FileScanConfig: preserve schema metadata across serde boundary

* add test

* Merge conflict fixes

UPSTREAM NOTE: this should not be upstreamed. This contains conflict fixes from various cherry-picks and differences in v50.

* update arrow-rs fork

UPSTREAM NOTE: this should not be upstreamed

---------

Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
Co-authored-by: Kevin Zimmerman <4733573+kczimm@users.noreply.github.com>
Co-authored-by: sgrebnov <sergei.grebnov@gmail.com>
Co-authored-by: jeadie <jack@spice.ai>
Co-authored-by: Jack Eadie <jack.eadie0@gmail.com>
Co-authored-by: Viktor Yershov <krinart@gmail.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: David Stancu <david@spice.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Array types coercion does not preserve child element nullability for list types

3 participants