Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Rename array() function to make_array(), extend array[] #3122

Closed
wants to merge 1 commit into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 13, 2022

Draft until:

Which issue does this PR close?

Closes #3115
Closes #3123

Rationale for this change

The core rationale is that this change is needed to support the next release of sqlparser-rs (which includes apache/datafusion-sqlparser-rs#566) which makes it impossible to use array as a function

More details here: apache/datafusion-sqlparser-rs#566 (comment)

The array() function is Spark syntax.

Postgres syntax is array[] and postgres actually treats the array keyword specially (it can not be used as a function)

alamb=# select array(1,2);
ERROR:  syntax error at or near "1"
LINE 1: select array(1,2);

What changes are included in this PR?

  1. Rename array() to make_array() (so it doesn't conflict with postgres)
  2. Update SQL planner to support array[] call make_array for array[] syntax
  3. Require all arguments to make_array and array[] be the same type (as arrays have a single type for all elements)
  4. Tests for same

Are there any user-facing changes?

  1. array() is now written make_array()
  2. array[] supports columns rather than just constants (e.g. array[column, column, ..] now works)
  3. make_array() will error if the types are not exactly the same

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner labels Aug 13, 2022
@alamb alamb force-pushed the alamb/no_function_named_array branch from 5dd736e to a882e34 Compare August 15, 2022 22:34
@alamb alamb force-pushed the alamb/no_function_named_array branch from a882e34 to 65ab517 Compare August 15, 2022 22:42
/// Create an array of FixedSizeList from a set of individual Arrays
/// where each element in the output FixedSizeList is the result of
/// concatenating the corresponding values in the input Arrays
macro_rules! make_fixed_size_list {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to be more descriptive of what it does (there are too may arrays in DataFusion already ;)

"| [aaa, 3] |",
"+--------------------------------------+",
"+----------+",
"| array |",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using array for the column name is consistent with postgres

@@ -111,8 +111,8 @@ async fn query_concat() -> Result<()> {
Ok(())
}

#[tokio::test]
async fn query_array() -> Result<()> {
// Return a session context with table "test" registered with 2 columns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file illustrate the change in behavior that this PR makes

#[tokio::test]
async fn query_array() {
let ctx = array_context();
let sql = "SELECT array[c1, cast(c2 as varchar)] FROM test";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this PR, this query would error (only constants were supported in array[] syntax. cc @ovr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

async fn query_make_array_scalar() {
let ctx = SessionContext::new();

let sql = "SELECT make_array(1, 2, 3);";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the equivalent of SELECT array(1,2, 3) on master

array_expressions::SUPPORTED_ARRAY_TYPES.to_vec(),
fun.volatility(),
),
BuiltinScalarFunction::MakeArray => Signature::variadic_equal(fun.volatility()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires all arguments to make_array be the same type

/// Create an array of FixedSizeList from a set of individual Arrays
/// where each element in the output FixedSizeList is the result of
/// concatenating the corresponding values in the input Arrays
macro_rules! make_fixed_size_list {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply renamed this macro to something that shows more of what it is doing

}
let fun = BuiltinScalarFunction::MakeArray;
// follow postgres convention and name result "array"
Ok(Expr::ScalarFunction { fun, args }.alias("array"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the rewrite in the planner to write array[] syntax to a make_array function call

// HashSet doesn't guarantee order
assert_contains!(
err.to_string(),
r#"Arrays with different types are not supported: "#
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is moved to function.rs

@alamb alamb force-pushed the alamb/no_function_named_array branch from 65ab517 to 23845c8 Compare August 16, 2022 10:17
@alamb alamb mentioned this pull request Aug 18, 2022
5 tasks
@andygrove
Copy link
Member

@alamb I believe this is the next PR in line to be merged into sqlparser-0.21

@andygrove
Copy link
Member

@alamb I believe this is the next PR in line to be merged into sqlparser-0.21

Never mind. These changes are not required in order to upgrade.

@alamb
Copy link
Contributor Author

alamb commented Aug 18, 2022

Update on this PR: I think I may have gotten too ambitious in this PR: array() somewhat confusingly returns a FixedSizedList and array[] returns a List.

Maybe we can just settle for updating the name of the array function to make_array 🤔 Will sleep on it

@andygrove
Copy link
Member

Update on this PR: I think I may have gotten too ambitious in this PR: array() somewhat confusingly returns a FixedSizedList and array[] returns a List.

Maybe we can just settle for updating the name of the array function to make_array thinking Will sleep on it

I went ahead and created a PR just to do the rename: #3199

@alamb
Copy link
Contributor Author

alamb commented Aug 19, 2022

Thanks @andygrove -- I am going to abandon this PR for now. I learned a lot but I don't have time to figure out the implementation details of ListArrays more ;)

@alamb alamb closed this Aug 19, 2022
@izveigor izveigor mentioned this pull request Jun 6, 2023
@alamb alamb deleted the alamb/no_function_named_array branch August 8, 2023 20:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql SQL Planner
Projects
None yet
3 participants