Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inconsistencies in type coercion logic #3419

Open
Tracked by #2355
andygrove opened this issue Sep 9, 2022 · 6 comments
Open
Tracked by #2355

Fix inconsistencies in type coercion logic #3419

andygrove opened this issue Sep 9, 2022 · 6 comments
Labels
bug Something isn't working optimizer Optimizer rules

Comments

@andygrove
Copy link
Member

andygrove commented Sep 9, 2022

Describe the bug
The TypeCoercion rule incorrectly uses coerce_types (which should probably be renamed to get_binary_op_type because it determines the output type of a binary expression) to determine the types to case the lhs and rhs expressions.

I think we need a new and similar method for determining the common type to case inputs to, for binary expressions.

Maybe we need to introduce signatures for binary ops so this can be consolidated more, similar to how we handle UDF signatrures.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

@andygrove andygrove added bug Something isn't working optimizer Optimizer rules labels Sep 9, 2022
@andygrove andygrove self-assigned this Sep 9, 2022
@liukun4515
Copy link
Contributor

Do we need make align the type coercion with arrow?
The cast system should not be same with arrow, because the datafusion is the SQL system and has some special semantics than the arrow lib
@andygrove

@andygrove
Copy link
Member Author

I agree that there will be differences. See the conversation at #3407 (comment) for one example where we have to work around a difference and this example is possibly due to a bug in DataFusions logic

@andygrove andygrove changed the title DataFusion coercion logic is different from Arrow's Fix inconsistencies in type coercion logic Sep 9, 2022
@andygrove
Copy link
Member Author

I updated the title and description now that I understand this more.

@alamb
Copy link
Contributor

alamb commented Oct 3, 2022

Maybe we could return a struct like

enum NeededCoercion {
  /// cast inputs to these datatypes
  coerce_inputs: Vec<Datatype>,
  /// the resulting output will be this datatype
  output_type: DataType
}

Or something 🤔

@HaoYang670
Copy link
Contributor

HaoYang670 commented Nov 10, 2022

Hi, could someone help to explain what the coerce_types does. Does it coerce the lhs and rhs types, or calculate the return type of the operator?

@alamb
Copy link
Contributor

alamb commented Nov 10, 2022

Hi, could someone help to explain what the coerce_types does. Does it coerce the lhs and rhs types, or calculate the return type of the operator?

I think this is a reasonable description of coercion:

https://github.com/apache/arrow-datafusion/blob/9c24a79118c0ae0aee480bf039385a06d240b499/datafusion/expr/src/type_coercion.rs#L18-L32

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue Jun 27, 2023
tustvold added a commit that referenced this issue Jun 28, 2023
* Cleanup type coercion (#3419)

* Further fixes

* Tweak doc

* Review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working optimizer Optimizer rules
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants