Member

@findepi findepi commented Sep 11, 2025

The planner takes into account the return type a function promises to return. It even passes it back on invoke as a reminder/convenience. Verify that each function delivers on the promise.
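
As a rough illustration of the check described above, here is a minimal, self-contained Rust sketch. It does not use DataFusion's real types: `DataType`, `Value`, and `invoke_checked` are hypothetical stand-ins introduced for this example, and only the wording of the error message is taken from the failures reported in this thread.

```rust
/// Toy stand-in for Arrow's `DataType` (assumption: two variants suffice here).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Utf8View,
}

/// Toy stand-in for a function result that knows its own type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Utf8(String),
    Utf8View(String),
}

impl Value {
    /// Report the type of the value that was actually produced.
    fn data_type(&self) -> DataType {
        match self {
            Value::Utf8(_) => DataType::Utf8,
            Value::Utf8View(_) => DataType::Utf8View,
        }
    }
}

/// Hypothetical wrapper: run the function body `f`, then compare the type of
/// its result with the type that was promised at planning time.
fn invoke_checked<F>(name: &str, promised: &DataType, f: F) -> Result<Value, String>
where
    F: FnOnce() -> Value,
{
    let value = f();
    let actual = value.data_type();
    if actual != *promised {
        return Err(format!(
            "Function '{name}' returned value of type '{actual:?}' while the following \
             type was promised at planning time and expected: '{promised:?}'"
        ));
    }
    Ok(value)
}
```

Calling `invoke_checked("parse_url", &DataType::Utf8View, || Value::Utf8("http".to_string()))` in this sketch yields an `Err`, while a closure returning `Value::Utf8View(..)` passes through unchanged.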

@findepi findepi marked this pull request as draft September 11, 2025 09:24
@github-actions github-actions bot added the logical-expr (Logical plan and expressions) label Sep 11, 2025
Member Author

findepi commented Sep 11, 2025

This failed locally. I hope the CI will fail too. If it does not, I need to update how the check is conditioned.

Member Author

findepi commented Sep 11, 2025

External error: 8 errors in file /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt

1. query failed: DataFusion error: Execution error: Function 'array' returned value of type 'List(Field { name: "element", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })' while the following type was promised at planning time and expected: 'List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] SELECT array(1, 2, 3);
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt:18


2. query failed: DataFusion error: Execution error: Function 'array' returned value of type 'List(Field { name: "element", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })' while the following type was promised at planning time and expected: 'List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] SELECT array('a', 'b');
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt:24


3. query failed: DataFusion error: Execution error: Function 'array' returned value of type 'List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })' while the following type was promised at planning time and expected: 'List(Field { name: "element", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] SELECT array();
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt:30


4. query failed: DataFusion error: Execution error: Function 'array' returned value of type 'List(Field { name: "item", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })' while the following type was promised at planning time and expected: 'List(Field { name: "element", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] SELECT array(), array(array());
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/array.slt:35


5. query failed: DataFusion error: Execution error: Function 'array' returned value of type 'List(Field { name: "element", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })' while the following type was promised at planning time and expected: 'List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })'
[SQL] SELECT array(1, NULL, 3);

7. query failed: DataFusion error: Execution error: Function 'parse_url' returned value of type 'Utf8' while the following type was promised at planning time and expected: 'Utf8View'
[SQL] SELECT parse_url('http://userinfo@spark.apache.org/path?query=1#Ref'::string, 'REF'::string);
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/url/parse_url.slt:48


8. query failed: DataFusion error: Execution error: Function 'parse_url' returned value of type 'Utf8' while the following type was promised at planning time and expected: 'Utf8View'
[SQL] SELECT parse_url('http://userinfo@spark.apache.org/path?query=1#Ref'::string, 'PROTOCOL'::string);
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/url/parse_url.slt:53


9. query failed: DataFusion error: Execution error: Function 'parse_url' returned value of type 'Utf8' while the following type was promised at planning time and expected: 'Utf8View'
[SQL] SELECT parse_url('http://userinfo@spark.apache.org/path?query=1#Ref'::string, 'FILE'::string);
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/url/parse_url.slt:58


10. query failed: DataFusion error: Execution error: Function 'parse_url' returned value of type 'Utf8' while the following type was promised at planning time and expected: 'Utf8View'
[SQL] SELECT parse_url('http://userinfo@spark.apache.org/path?query=1#Ref'::string, 'AUTHORITY'::string);
at /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/url/parse_url.slt:63


... other 1 errors in /Users/runner/work/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/url/parse_url.slt not shown ...
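
The `array` failures above differ only in the name of the list's inner field (`element` versus `item`). The following sketch, using hypothetical toy `Field` and `DataType` types rather than Arrow's, shows why a plain equality check treats the two as different types: the field name takes part in the derived comparison. That Arrow's own list type equality behaves the same way is an assumption, although it is consistent with the errors reported here.

```rust
/// Toy stand-in for an Arrow field: a name, a type, and nullability.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Toy stand-in for Arrow's `DataType`, limited to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    List(Box<Field>),
}

/// Build a list type whose inner field has the given name (hypothetical helper).
fn list_of(field_name: &str, element: DataType) -> DataType {
    DataType::List(Box::new(Field {
        name: field_name.to_string(),
        data_type: element,
        nullable: true,
    }))
}
```

With these definitions, `list_of("element", DataType::Int64)` and `list_of("item", DataType::Int64)` compare as unequal even though both describe a list of nullable 64-bit integers.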

@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and ffi (Changes to the ffi crate) labels and removed the sqllogictest (SQL Logic Tests (.slt)) label Sep 11, 2025
The default branch picks `Int32` when all arguments are Null-typed, so
it applies to the no-argument case just as well.
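
One way to picture why a single default branch can cover both cases is the sketch below. It is an assumption-laden simplification: `element_type` is a hypothetical helper that picks the first non-Null argument type, whereas the real function presumably performs proper type coercion. The point is only that an empty argument list and an all-Null argument list both fall through to the same `Int32` default.

```rust
/// Toy stand-in for Arrow's `DataType`, limited to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    Int64,
}

/// Hypothetical element-type resolution: use the first non-Null argument
/// type if there is one, otherwise fall back to `Int32`.
fn element_type(arg_types: &[DataType]) -> DataType {
    arg_types
        .iter()
        .find(|t| **t != DataType::Null)
        .cloned()
        // Reached both when every argument is Null-typed and when there are
        // no arguments at all, because `find` returns `None` in both cases.
        .unwrap_or(DataType::Int32)
}
```

In this sketch `element_type(&[])` and `element_type(&[DataType::Null, DataType::Null])` take the same path, so no separate no-argument branch is needed.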
@findepi findepi force-pushed the findepi/check-func-return-type branch from 15319d6 to dbeb79d Compare September 11, 2025 14:48
@github-actions github-actions bot added the spark label Sep 11, 2025
@findepi findepi force-pushed the findepi/check-func-return-type branch from dbeb79d to 25078a4 Compare September 11, 2025 14:49
@findepi findepi marked this pull request as ready for review September 11, 2025 14:53
@alamb alamb changed the title from "Check function return value's type" to "Add assertion that ScalarUDFImpl implementation is consistent with declared return type" Sep 12, 2025
Contributor

@alamb alamb left a comment

Thanks @findepi! Given that we found several other bugs with this assertion, I think the value is clear 👍

I had some small style suggestions, but nothing that I think is required.

Contributor

@adriangb adriangb left a comment

Makes sense to me!

@findepi findepi merged commit a431bf7 into apache:main Sep 12, 2025
28 checks passed
@findepi findepi deleted the findepi/check-func-return-type branch September 12, 2025 14:31
github-merge-queue bot pushed a commit that referenced this pull request Nov 11, 2025
## Which issue does this PR close?

- Closes #18597

## Rationale for this change

A check was recently added to `invoke_with_args` (#17515) that compares
the data type of the result with the return type the UDF declared.
Because the fast path does not attach the timezone, the assertion added
in that PR fails.

## What changes are included in this PR?

Include timezone information in the fast path.
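
The commit message does not name the affected function, so the following is only a hedged sketch of the kind of mismatch it describes. All names (`DataType::TimestampNanos`, `Scalar`, `fast_path_without_tz`, `fast_path_with_tz`) are hypothetical, and the timestamp type is a simplified stand-in for Arrow's. The idea is that a shortcut that builds its result without the timezone produces a type different from the declared one, and carrying the timezone through makes the two agree.

```rust
/// Toy timestamp type with an optional timezone (simplified stand-in).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    TimestampNanos(Option<String>),
}

/// Toy scalar result carrying its own type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Scalar {
    nanos: i64,
    data_type: DataType,
}

/// Hypothetical fast path that forgets the timezone: the resulting type is a
/// timezone-less timestamp regardless of what the caller asked for.
fn fast_path_without_tz(nanos: i64, _tz: Option<&str>) -> Scalar {
    Scalar { nanos, data_type: DataType::TimestampNanos(None) }
}

/// Hypothetical corrected fast path: the timezone is carried into the
/// result type, so it matches a declared type with the same timezone.
fn fast_path_with_tz(nanos: i64, tz: Option<&str>) -> Scalar {
    Scalar { nanos, data_type: DataType::TimestampNanos(tz.map(String::from)) }
}
```

If the declared return type is `DataType::TimestampNanos(Some("+00:00".to_string()))`, only the second function returns a value whose type equals it, which is the condition the assertion from #17515 checks.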

## Are these changes tested?

Yes, added a unit test

## Are there any user-facing changes?

No

hareshkh added a commit to hareshkh/datafusion that referenced this pull request Nov 11, 2025
xudong963 pushed a commit that referenced this pull request Nov 12, 2025
## Which issue does this PR close?

- Closes #18597

## Rationale for this change

A check was recently added to `invoke_with_args` (#17515) that compares
the data type of the result with the return type the UDF declared.
Because the fast path does not attach the timezone, the assertion added
in that PR fails.

## What changes are included in this PR?

Include timezone information in the fast path.

## Are these changes tested?

Yes, added a unit test and SLT test

## Are there any user-facing changes?

No

Labels

ffi (Changes to the ffi crate), logical-expr (Logical plan and expressions), spark

3 participants