Contributor

@r1b r1b commented Nov 2, 2025

Which issue does this PR close?

Rationale for this change

See linked issue above

What changes are included in this PR?

  • Add validation when planning CreateFunction
    • Enforce consistent parameter style (positional or named)
    • Non-default params cannot follow default params
  • CreateFunction parameter names are now preserved from the parse tree
  • If we encounter a named parameter when constructing a Placeholder
    • We try to rewrite this to a positional parameter from the available param types
    • If no matching param type is found, report an error
  • Update ScalarFunctionWrapper to handle defaults
    • Preserve the parsed defaults
    • Generate all valid signatures for all possible combinations of arguments
    • Fall back to default expr when no matching argument is provided
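
As an illustration of the last group of bullets (not the PR's actual code), here is a minimal standalone sketch of how signatures could be enumerated and defaults filled in. It assumes defaults are trailing, which the new validation guarantees. `UdfParam`, `valid_signatures`, and `fill_defaults` are hypothetical names, and types and expressions are modeled as plain strings rather than DataFusion types.

```rust
/// Hypothetical stand-in for a SQL UDF parameter: a type name plus an
/// optional default expression (both kept as strings for illustration).
#[derive(Debug, Clone, PartialEq)]
struct UdfParam {
    data_type: String,
    default: Option<String>,
}

/// One accepted argument-type list per arity, from "only the required
/// parameters" up to "every parameter supplied".
/// Assumes defaults are trailing (non-default params never follow defaults).
fn valid_signatures(params: &[UdfParam]) -> Vec<Vec<String>> {
    let required = params.iter().take_while(|p| p.default.is_none()).count();
    (required..=params.len())
        .map(|arity| {
            params[..arity]
                .iter()
                .map(|p| p.data_type.clone())
                .collect::<Vec<String>>()
        })
        .collect()
}

/// Append the default expression for every parameter the caller omitted.
fn fill_defaults(params: &[UdfParam], args: &[String]) -> Result<Vec<String>, String> {
    if args.len() > params.len() {
        return Err(format!(
            "expected at most {} arguments, got {}",
            params.len(),
            args.len()
        ));
    }
    let mut out = args.to_vec();
    for (i, p) in params.iter().enumerate().skip(args.len()) {
        match &p.default {
            Some(expr) => out.push(expr.clone()),
            None => return Err(format!("argument {} has no default", i + 1)),
        }
    }
    Ok(out)
}
```

With one required parameter followed by two defaulted ones, this yields three signatures (arity 1, 2, and 3), and a call with a single argument is completed from the two stored defaults.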

Are these changes tested?

Yes, see added / adjusted unit tests

Are there any user-facing changes?

Yes. If the approach here is acceptable, we should update the SQL UDF examples in datafusion-examples/examples/function_factory.rs (TODO).

Also note that, because the planner cannot tell the PREPARE and CREATE FUNCTION parameter contexts apart, one error message is now less specific. Old message:

Invalid placeholder, not a number: $foo

This error can now be triggered either by using named params in a prepared statement or by referencing an undefined named param in a SQL UDF. New message:

Unknown placeholder: $foo

Finally, there are two additional user-facing errors when planning CreateFunction:

  • When named / positional parameter styles are mixed
  • When non-default arguments follow default arguments
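
The two checks can be sketched in isolation as follows. This is a simplified illustration, not the planner code: `ParamSpec` and `validate_params` are hypothetical names, and the wording of the second error message is an assumption (only the first message appears in the diff).

```rust
/// Hypothetical, simplified view of a CREATE FUNCTION parameter.
#[derive(Debug)]
struct ParamSpec {
    /// `None` models a positional (unnamed) parameter.
    name: Option<String>,
    has_default: bool,
}

fn validate_params(params: &[ParamSpec]) -> Result<(), String> {
    // Check 1: parameters must be all named or all positional.
    let named = params.iter().filter(|p| p.name.is_some()).count();
    if named != 0 && named != params.len() {
        return Err(
            "All function arguments must use either named or positional style.".to_string(),
        );
    }

    // Check 2: once a default has been seen, every later parameter needs one.
    // The message text below is an assumption, not taken from the PR.
    let mut seen_default = false;
    for p in params {
        if p.has_default {
            seen_default = true;
        } else if seen_default {
            return Err("Non-default arguments cannot follow default arguments.".to_string());
        }
    }
    Ok(())
}
```

Under this sketch, a parameter list such as (a, b with default) passes, while (a with default, b) or a mix of named and unnamed parameters is rejected.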

@github-actions github-actions bot added the sql (SQL Planner), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Nov 2, 2025
Comment on lines +1263 to +1274
// Validate parameter style
if let Some(ref fields) = arg_types {
    let count_positional =
        fields.iter().filter(|f| f.name() == "").count();
    if !(count_positional == 0 || count_positional == fields.len()) {
        return plan_err!("All function arguments must use either named or positional style.");
    }
}
Contributor Author

Note: I'm not sure this is actually necessary, but it made the changes easier to reason about. If we think relaxing this constraint is valuable, I can look into it.

Comment on lines +126 to +128
// FIXME: This branch is shared by params from PREPARE and CREATE FUNCTION, but
// only CREATE FUNCTION currently supports named params. For now, we rewrite
// these to positional params.
Contributor Author

Note: I explored doing this without rewriting to positional params, but I couldn't see a path forward without either:

  • Adding a way to distinguish between prepared statement vs SQL UDF context
  • Supporting named params in prepared statements (not currently supported in sqlparser AFAICT)
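
For illustration, the rewrite from a named placeholder to a positional one can be sketched like this. It is a standalone approximation under stated assumptions, not the code in this PR: `resolve_placeholder` is a hypothetical helper, and parameter names are passed as a plain slice instead of being read from the planner's param types.

```rust
/// Rewrite a placeholder such as `$b` to its 1-based positional form (`$2`),
/// using the declared parameter names. Placeholders that are already
/// positional (`$1`) are returned unchanged.
fn resolve_placeholder(id: &str, param_names: &[&str]) -> Result<String, String> {
    let name = id
        .strip_prefix('$')
        .ok_or_else(|| format!("Unknown placeholder: {id}"))?;

    // Already positional: nothing to rewrite.
    if name.parse::<usize>().is_ok() {
        return Ok(id.to_string());
    }

    // Named: look the name up among the declared parameters.
    match param_names.iter().position(|n| *n == name) {
        Some(idx) => Ok(format!("${}", idx + 1)),
        None => Err(format!("Unknown placeholder: {id}")),
    }
}
```

Because the same branch serves both PREPARE and CREATE FUNCTION, a named placeholder with no matching parameter can only produce the generic "Unknown placeholder" error in this sketch, which mirrors the reduced fidelity described in the PR description.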

Contributor Author

r1b commented Nov 2, 2025

Test failure seems like a flake:

<...>
1. query result mismatch:
[SQL] SELECT shuffle(['a', 'b', 'c', 'd', 'e', 'f']) != ['a', 'b', 'c', 'd', 'e', 'f'];;
[Diff] (-expected|+actual)
-   true
+   false
at /__w/datafusion/datafusion/datafusion/sqllogictest/test_files/spark/array/shuffle.slt:36
<...>

@Jefffrey Jefffrey changed the title from "feat: support named arguments, defaults in udfs" to "feat: support named variables & defaults for CREATE FUNCTION" Nov 5, 2025
@r1b r1b force-pushed the feat/udf-named-params-defaults branch from 93314e6 to 8ddaae0 Compare November 5, 2025 23:50
Contributor Author

r1b commented Nov 5, 2025

Thanks for the feedback @Jefffrey, updated

@r1b r1b requested a review from Jefffrey November 6, 2025 00:19
    assert!(expected.starts_with(&err.strip_backtrace()));
    Ok(())
}

Member

Please add a test case with a positional parameter that does not exist, e.g. $5 when there are fewer than 5 arguments.
I suspect it will fail with an index-out-of-bounds error at https://github.com/apache/datafusion/pull/18450/files#diff-647d2e08b4d044bf63b35f9e23092ba9673b80b1568e8f3abffd7f909552ea1aR999

You need to wrap it in a check similar to if placeholder_position < defaults.len() {...} and return an error in the else clause.
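
A minimal sketch of the suggested guard, written as a standalone function. Only `placeholder_position` and `defaults` come from the comment above; the function name, the string-based types, and the error wording are assumptions made for illustration.

```rust
/// Look up the default expression for a positional placeholder such as `$2`.
/// Returns an error instead of panicking when the position is out of range.
fn default_for_placeholder<'a>(
    id: &str,
    defaults: &'a [Option<String>],
) -> Result<Option<&'a str>, String> {
    // Parse the 1-based index that follows the `$` sigil.
    let index = id
        .strip_prefix('$')
        .and_then(|s| s.parse::<usize>().ok())
        .ok_or_else(|| format!("Invalid placeholder: {id}"))?;
    if index == 0 {
        return Err(format!("Invalid placeholder: {id} (positions start at $1)"));
    }
    let placeholder_position = index - 1;

    // The guard suggested in the review: index only when in bounds.
    if placeholder_position < defaults.len() {
        Ok(defaults[placeholder_position].as_deref())
    } else {
        Err(format!(
            "Placeholder {id} is out of range: function has {} parameter(s)",
            defaults.len()
        ))
    }
}
```

With two declared parameters, `$5` now returns an error rather than indexing past the end of `defaults`.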

Contributor Author

@r1b r1b Nov 6, 2025

Thanks, addressed in 0330adb.

This case also revealed that the DEFAULT syntax is broken for positional params. I switched to = syntax and added a test that illustrates the problem in a97ddb5.

Ref: https://github.com/apache/datafusion-sqlparser-rs/blob/308a7231bcbc5c1c8ab71fe38f17b1a21632a6c6/src/parser/mod.rs#L5536

EDIT: It seems that = is the "canonical" syntax in any case

Contributor Author

Proposed a fix for the DEFAULT syntax bug upstream: apache/datafusion-sqlparser-rs#2091

@Jefffrey Jefffrey added this pull request to the merge queue Nov 12, 2025
Merged via the queue into apache:main with commit 62f5cd6 Nov 12, 2025
28 checks passed
Development

Successfully merging this pull request may close these issues.

It would be great if we could use variable names instead of $1 and support default values in CREATE FUNCTION

3 participants