Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support FixedSizeList type coercion #8902

Merged
merged 24 commits into from
Feb 1, 2024

Conversation

Weijun-H
Copy link
Member

@Weijun-H Weijun-H commented Jan 18, 2024

Which issue does this PR close?

Parts #6560
Parts #8249

Rationale for this change

When the argument has a FixedSizeList, type coercion is applied to the array function. This pull request only affects the function that uses two arguments, which are array_element, array_has, array_positions, array_prepend, array_remove, array_remove_all , and array_has.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Jan 18, 2024
@Weijun-H Weijun-H changed the title Minor: Add support for parsing FixedSizeList type Minor: support FixedSizeList type coercion Jan 18, 2024
@Weijun-H
Copy link
Member Author

Weijun-H commented Jan 18, 2024

Stalled by #8344

}
BuiltinScalarFunction::ArrayPrepend => {
Signature::element_and_array(self.volatility())
}
BuiltinScalarFunction::ArrayRepeat => Signature::any(2, self.volatility()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the functions including array_repeat, array_union, array_intersect, and array_except are also two-argument, Signature::array_and_element could not handle NULL well for them. The following PR could continue to work on this issue.

@Weijun-H Weijun-H marked this pull request as ready for review January 21, 2024 02:51
BuiltinScalarFunction::ArrayExcept => Signature::any(2, self.volatility()),
BuiltinScalarFunction::Flatten => Signature::any(1, self.volatility()),
BuiltinScalarFunction::ArrayHasAll
| BuiltinScalarFunction::ArrayHasAny
| BuiltinScalarFunction::ArrayHas => Signature::any(2, self.volatility()),
| BuiltinScalarFunction::ArrayHas => {
Signature::array_and_element(self.volatility())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the example that works after this change.
I did not find one in array.slt

One of the example is array_has(list, null)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_has_any and array_has_any cannot use array_and_element, because their signature is array and array.

@@ -311,6 +329,8 @@ fn coerced_from<'a>(
Utf8 | LargeUtf8 => Some(type_into.clone()),
Null if can_cast_types(type_from, type_into) => Some(type_into.clone()),

List(_) if matches!(type_from, FixedSizeList(_, _)) => Some(type_into.clone()),
Copy link
Contributor

@jayzhan211 jayzhan211 Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be easier if we run Fixedsizelist to List like coerce_arguments_for_fun before this signature type coercion?

Copy link
Member Author

@Weijun-H Weijun-H Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unable to decide between coerce_arguments_for_fun and coerce_arguments_for_signature. Are there specific benefits for each?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is coerced based on function, another is based on signature. If we target to convert from fixedsizelist to list, it seems coerce_arguments_for_fun is more suitable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we can have fixed size list to list conversion in one place.
Currently, it converts both in here and coerce_arguments_for_fun, right? Is it possible to have it once?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move coerce_arguments_for_fun before signature? Convert FixedSizeList to List before signature coercion. Then we don't need to deal with FixedSizeList

Copy link
Member Author

@Weijun-H Weijun-H Jan 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can.

In ExprSchemale(https://github.com/apache/arrow-datafusion/blob/a7cdf605a1e23608e51889988c239613c80fb671/datafusion/expr/src/expr_schema.rs#L91-L105)
It forces the function return type consistent, but I am not sure of its purpose. Maybe @2010YOUY01 could explain it more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we dont need this anymore, input types should be handled later on via coerce_arguments_for_signature. They are doing the same things

Copy link
Contributor

@jayzhan211 jayzhan211 Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weijun-H Weijun-H marked this pull request as draft January 24, 2024 04:19
@github-actions github-actions bot added the optimizer Optimizer rules label Jan 24, 2024
@Weijun-H Weijun-H marked this pull request as ready for review January 24, 2024 11:48
Comment on lines +119 to +120
/// Specifies Signatures for array functions
ArraySignature(ArrayFunctionSignature),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future extension

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good change, though a change of the API

// Cast Fixedsizelist to List for array functions
if *fun == BuiltinScalarFunction::MakeArray {
// coerce the fixed size list to list for all array fucntions
if fun.name().contains("array") {
Copy link
Contributor

@jayzhan211 jayzhan211 Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two problems that I think we should stick to the previous checking.

  1. name() takes the first aliases of the function, so if there we forgot to have first aliases starts with array, it fails.
  2. We just need to care about make_array only, then other functions should get non-fixedsizelsit as input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only need to care about make_array? If we put coerce_arguments_for_fun in front of coerce_arguments_for_signature, it should cast all FixedSizeList to List regarding array functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because any kind of array functions includes make_array(). Like array_append(cast(fixedsizelist, list), element). we dont need to check again with array_append.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it is difficult for FixedSizeList type coercion. My idea is to do these staff in coerce_arguments_for_signature based on different ArraySignature(ArrayFunctionSignature), which will benefit future maintenance and development. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I know the reason, it seems we still need to handle fixedsizelist for other array functions because we can pass Fixedsizelist directly to them with arrow_cast.

In this case, I agree we should handle them in coerce_arguments_for_signature. And, coerce_arguments_for_fun can be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep coerce_arguments_for_fun until all FixedSizeList type coercion work is done.

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Weijun-H
Copy link
Member Author

@jayzhan211 Thank you for your review

@alamb
Copy link
Contributor

alamb commented Jan 30, 2024

Thank you @Weijun-H and @jayzhan211 -- I plan to review this tomorrow if possible. I know it has been outstanding for quite a while

@alamb alamb changed the title Minor: support FixedSizeList type coercion Support FixedSizeList type coercion Jan 31, 2024
@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 31, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Weijun-H -- really nice

Also, thank you @jayzhan211 for the review. It was wonderful to review this PR after you and @Weijun-H have worked on it and find the structure in such good shape

Comment on lines +119 to +120
/// Specifies Signatures for array functions
ArraySignature(ArrayFunctionSignature),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good change, though a change of the API

@@ -263,6 +283,33 @@ impl Signature {
volatility,
}
}
/// Specialized Signature for ArrayAppend and similar functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup

datafusion/common/src/utils.rs Show resolved Hide resolved
@@ -77,6 +77,19 @@ AS
FROM arrays
;

#TODO: create FixedSizeList with NULL column
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we maybe track this TODO with a separate ticket?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked by #9094

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb alamb merged commit 038763c into apache:main Feb 1, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 1, 2024

Thanks again @Weijun-H !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants