-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support FixedSizeList
Type Coercion
#9108
Conversation
2556f04
to
c66d8dd
Compare
c66d8dd
to
27e665d
Compare
68f8d21
to
2f477b2
Compare
query error | ||
select array_ndims(1) | ||
|
||
query error | ||
select array_ndims(null) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow PostgreSQL
b621437
to
03f44f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Weijun-H -- this is looking good. I have a question about the inner functions and the need for allow_null_coercion
datafusion/expr/src/signature.rs
Outdated
@@ -117,7 +122,8 @@ pub enum TypeSignature { | |||
/// is `OneOf(vec![Any(0), VariadicAny])`. | |||
OneOf(Vec<TypeSignature>), | |||
/// Specifies Signatures for array functions | |||
ArraySignature(ArrayFunctionSignature), | |||
/// Boolean value specifies whether null type coercion is allowed | |||
ArraySignature(ArrayFunctionSignature, bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand this comment to explain what null type coercion is and when it would make sense notn to be allowed?
I don't understand what it is used for 🤔 it seems to me like null coercion should always be allowed
datafusion/expr/src/signature.rs
Outdated
pub fn get_type_signature( | ||
&self, | ||
current_types: &[DataType], | ||
allow_null_coercion: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are passing allow_null_coercion here, maybe it would make sense to have allow_null_coercion to be part of each array enum? Perhaps something like
impl ArrayFunctionSignature {
ArrayAndElement(allow_null_coercion),
...
Array(allow_null_coercion)
}
But again I am not sure about the need for allow_null_coercion 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow_null_coercion
determines if the signature can allow a null
type. For example, empty(NULL)
accepts NULL
while array_dims(Null)
will return error
@@ -80,73 +78,6 @@ fn get_valid_types( | |||
signature: &TypeSignature, | |||
current_types: &[DataType], | |||
) -> Result<Vec<Vec<DataType>>> { | |||
fn array_append_or_prepend_valid_types( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see -- so one part of this PR is extracting this function into ArrayFunctionSignature ?
datafusion/expr/src/signature.rs
Outdated
current_types: &[DataType], | ||
allow_null_coercion: bool, | ||
) -> Result<Vec<Vec<DataType>>> { | ||
fn array_append_or_prepend_valid_types( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason to create these inner functions, rather than actual functions (or maybe just inline the body of the functions into the match
statement below)?
I find the extra level of indenting somewhat confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only for the match
statement below
c9f0b9b
to
9b22063
Compare
datafusion/expr/src/signature.rs
Outdated
}; | ||
|
||
// We follow Postgres on `array_append(Null, T)`, which is not valid. | ||
if array_type.eq(&DataType::Null) && !allow_null_coercion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need allow_null_coercion
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter is to determine if the signature could accept NULL
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any array function's signature is array_and_element(true, )
? If not, it is better not to introduce allow_null_coercion
for array_and_element
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_has
, array_positions
, array_remove
and array_remove_all
should use it.
D SELECT list_contains(null, 1);
┌────────────────────────┐
│ list_contains(NULL, 1) │
│ int32 │
├────────────────────────┤
│ │
└────────────────────────┘
D SELECT list_position(null, 1);
┌────────────────────────┐
│ list_position(NULL, 1) │
│ int32 │
├────────────────────────┤
│ │
└────────────────────────┘
**Schema (PostgreSQL v15)**
---
**Query #1**
SELECT array_remove(NULL, 2);
| array_remove |
| ------------ |
| |
This comment was marked as outdated.
This comment was marked as outdated.
datafusion/expr/src/signature.rs
Outdated
Ok(vec![vec![array_type]]) | ||
} | ||
DataType::Null => Ok(vec![vec![array_type.clone()]]), | ||
_ => Ok(vec![vec![DataType::List(Arc::new(Field::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return error if not List
f0f9f7a
to
a9623fb
Compare
@@ -433,6 +435,7 @@ pub fn array_element(args: &[ArrayRef]) -> Result<ArrayRef> { | |||
let indexes = as_int64_array(&args[1])?; | |||
general_array_element::<i64>(array, indexes) | |||
} | |||
DataType::Null => Ok(args[0].clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to handle null here (any other array functions)? Is it like pre-checking? If it is, note that array_element(null, )
should be an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need functions like pre-checking
, or validation
for checking the input of the Array function when we move to UDF-based implementation 🤔 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think handling null here makes it confusing where it is nulls handled, is it handled incoercion
step coerce signature
or here (array_expressions). The test in array.slt only process once. We can only know it is handled correctly, but unsure which one does the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think handling null here makes it confusing where it is nulls handled, is it handled in
coercion
stepcoerce signature
or here (array_expressions). The test in array.slt only process once. We can only know it is handled correctly, but unsure which one does the job.
This is what I am considering now, in coercion
step could filter if the function accepts the NULL
type while in array_expressions
it determines what the function should return @jayzhan211 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in coercion step could filter if the function accepts the NULL type while in array_expressions it determines what the function should return
Does it mean we still need to do the null handling in array_expression
? Take array_append
for example, if we find null, we convert it to i64(None) and process it later on.
ideally, nulls should be handled in type coercion. And, array_expression
can be simple that just focuses on processing the result and not handling type or signature-related issues.
If doing the checking again in array_expression
is just because it is a public function, I think we should come out with a way that the checking can be somewhat reused in both places. Before that, maybe we just leave the checking in coercion
step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in coercion step could filter if the function accepts the NULL type while in array_expressions it determines what the function should return
Does it mean we still need to do the null handling in
array_expression
? Takearray_append
for example, if we find null, we convert it to i64(None) and process it later on.ideally, nulls should be handled in type coercion. And,
array_expression
can be simple that just focuses on processing the result and not handling type or signature-related issues.
It would be ideal if we could handle null in one place. However, array functions have different results for NULL
. For instance, array_append(null, 1)
returns an error, whereas array_has(null, 1)
returns null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that in postgres, array_append(null, 1)
return {1}, array_append(null, ARRAY[1])
is error. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can ignore nulls as the first argument since we are not coercing them to any other type. Unlike, array_append(arr, null) as the second argument, in this case, we will convert to other primitives.
array()
for fixed list to list type coercion only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe our primary focus within TypeCoercionRewriter should be strictly on type coercion. If encountering a null value that isn't being cast to another type, it should be disregarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe our primary focus within TypeCoercionRewriter should be strictly on type coercion. If encountering a null value that isn't being cast to another type, it should be disregarded.
I cannot agree more with that. I should focus on the type coercion, and leave NULL
aside now. So I will remove allow_null_coercion
and don't check null
now.
datafusion/expr/src/signature.rs
Outdated
| DataType::LargeList(_) | ||
| DataType::FixedSizeList(_, _) => { | ||
let array_type = coerced_fixed_size_list_to_list(array_type); | ||
Ok(vec![vec![array_type, DataType::Int64]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it can be further simplied if there is a function that process coerced_fixed_size_list_to_list
and return array_type
. Then can be used both in array_and_index
and array
datafusion/expr/src/signature.rs
Outdated
/// `current_types` - The data types of the arguments | ||
/// `allow_null_coercion` - Whether null type coercion is allowed | ||
/// Returns the valid types for the function signature | ||
pub fn get_type_signature( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't think we need to place all signature functions into get_type_signature
.
Instead, having separate signature functions. And, collect what we need in the matching function.
This way the function can be reused easily. For example,
fn array() return array_type.
match {
signature::array => { vec![vec![array()]] }
signature::array_and_index => { vec![vec![array(), i64]] }
}
d2c1182
to
a7f121e
Compare
datafusion/expr/src/signature.rs
Outdated
_ => Ok(vec![vec![]]), | ||
} | ||
} | ||
fn array_and_index(current_types: &[DataType]) -> Result<Vec<Vec<DataType>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn array_and_index(current_types: &[DataType]) -> Result<Vec<Vec<DataType>>> { | |
fn array(array_type: &DataType) -> Option<DataType> { | |
match array_type { | |
DataType::List(_) | |
| DataType::LargeList(_) | |
| DataType::FixedSizeList(_, _) => { | |
let array_type = coerced_fixed_size_list_to_list(array_type); | |
Some(array_type) | |
} | |
_ => None, | |
} | |
} | |
match self { | |
ArrayFunctionSignature::ArrayAndElement => { | |
array_append_or_prepend_valid_types(current_types, true) | |
} | |
ArrayFunctionSignature::ElementAndArray => { | |
array_append_or_prepend_valid_types(current_types, false) | |
} | |
ArrayFunctionSignature::ArrayAndIndex => { | |
if current_types.len() != 2 { | |
return Ok(vec![vec![]]); | |
} | |
let array_type = array(¤t_types[0]); | |
if let Some(array_type) = array_type { | |
Ok(vec![vec![array_type, DataType::Int64]]) | |
} else { | |
Ok(vec![]) | |
} | |
} | |
ArrayFunctionSignature::ArrayAndElementAndOptionalIndex => { | |
array_element_and_optional_index(current_types) | |
} | |
ArrayFunctionSignature::Array => { | |
if current_types.len() != 1 { | |
return Ok(vec![vec![]]); | |
} | |
let array_type = array(¤t_types[0]); | |
if let Some(array_type) = array_type { | |
Ok(vec![vec![array_type]]) | |
} else { | |
Ok(vec![]) | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to return the data type for array
instead of long but useless Result<Vec<Vec<DataType>>>
datafusion/expr/src/signature.rs
Outdated
} | ||
|
||
impl std::fmt::Display for ArrayFunctionSignature { | ||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
match self { | ||
ArrayFunctionSignature::ArrayAndElement => { | ||
write!(f, "array, element") | ||
write!(f, "ArrayAndElement") | ||
} | ||
ArrayFunctionSignature::ArrayAndElementAndOptionalIndex => { | ||
write!(f, "array, element, [index]") |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
datafusion/expr/src/signature.rs
Outdated
} | ||
|
||
impl std::fmt::Display for ArrayFunctionSignature { | ||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
match self { | ||
ArrayFunctionSignature::ArrayAndElement => { | ||
write!(f, "array, element") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @Weijun-H
datafusion/expr/src/signature.rs
Outdated
} | ||
array(¤t_types[0]).map_or_else( | ||
|| Ok(vec![vec![]]), | ||
|array_type| Ok(vec![vec![array_type, DataType::Int64]]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@Weijun-H, thanks again! |
Which issue does this PR close?
Follow #8902
Rationale for this change
For future maintenance, extend
Array
which specialized Signature for ArrayEmpty and similar functionsWhat changes are included in this PR?
Signature::array
Are these changes tested?
Are there any user-facing changes?