18450: feat: support named variables & defaults for CREATE FUNCTION#5
18450: feat: support named variables & defaults for CREATE FUNCTION#5martin-augment wants to merge 4 commits intomainfrom
CREATE FUNCTION#5Conversation
WalkthroughAdds support for default arguments in user-defined scalar functions with validation ensuring non-default arguments don't follow defaults and argument styles remain consistent. Expands placeholder handling to support both named and positional formats, updating error messages accordingly. Changes
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| "Function argument {} not provided, argument missing!", | ||
| placeholder.id | ||
| )?, | ||
| } |
There was a problem hiding this comment.
Bug: Index Bounds in Placeholder Defaults Safety Trap
Potential index out of bounds when accessing defaults[placeholder_position]. If a placeholder references a parameter position that exceeds the number of function parameters (e.g., $5 when only 2 parameters are defined), placeholder_position will be >= defaults.len(), causing a panic. The code should check placeholder_position < defaults.len() before accessing the array.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Cursor AI reviewer is correct that a index out of bounds error will be raised when the the SQL DDL statement uses a positional placeholder, like $5, when there are less than 5 function parameters. The finding prevents a runtime panic and abort of the application.
Code Review: Named Parameters and Default Arguments for CREATE FUNCTIONThank you for this PR! This is a substantial feature addition that adds named parameters and default arguments to CREATE FUNCTION. Strengths
Potential Bugs1. Bounds Check Missing in replacement FunctionLocation: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:999 If placeholder_position >= defaults.len(), this will panic with an index out of bounds error. 2. Nullary Signature Edge CaseLocation: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:1041-1042 The condition 'if n == 0' inside the loop starting at 'pos' can never be true when pos > 0. Design ConcernsNamed Parameter Rewriting is FragileLocation: datafusion/sql/src/expr/value.rs:126-138 The FIXME comment indicates this is a temporary solution. Rewriting named placeholders to positional ones could cause issues with error messages. SummaryOverall Assessment: Solid feature implementation with good test coverage. Main concerns:
Action Items
Great work on this feature! |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs(7 hunks)datafusion/sql/src/expr/value.rs(2 hunks)datafusion/sql/src/statement.rs(2 hunks)datafusion/sql/tests/cases/params.rs(1 hunks)datafusion/sqllogictest/test_files/prepare.slt(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
datafusion/sql/src/expr/value.rs (1)
datafusion/expr/src/expr.rs (1)
new_with_field(1171-1173)
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs (3)
datafusion/expr/src/expr_fn.rs (4)
signature(484-486)signature(598-600)signature(691-693)placeholder(125-130)datafusion-examples/examples/function_factory.rs (1)
replacement(161-183)datafusion/expr-common/src/signature.rs (2)
one_of(1162-1168)exact(1118-1124)
datafusion/sql/src/statement.rs (2)
datafusion/ffi/src/udaf/mod.rs (2)
args(468-474)arg_types(310-314)datafusion/sql/src/relation/mod.rs (1)
args(165-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: claude-review
- GitHub Check: cargo check datafusion features
- GitHub Check: verify benchmark results (amd64)
- GitHub Check: cargo check datafusion-common features
- GitHub Check: cargo check datafusion-substrait features
- GitHub Check: cargo test (macos-aarch64)
- GitHub Check: cargo test (macos-aarch64)
| Transformed::yes(args[placeholder_position].clone()) | ||
| } else { | ||
| exec_err!( | ||
| "Function argument {} not provided, argument missing!", | ||
| placeholder.id | ||
| )? | ||
| match defaults[placeholder_position] { | ||
| Some(ref default) => Transformed::yes(default.clone()), | ||
| None => exec_err!( | ||
| "Function argument {} not provided, argument missing!", | ||
| placeholder.id | ||
| )?, | ||
| } |
There was a problem hiding this comment.
Guard defaults lookup to avoid panic on invalid placeholders.
If the SQL body references $N where N exceeds the declared parameter count, placeholder_position indexes past defaults, causing an immediate panic instead of a user-facing error. Previously this returned a clean “argument missing” error; now it will crash. Please restore graceful handling by checking the bounds before indexing.
- if placeholder_position < args.len() {
- Transformed::yes(args[placeholder_position].clone())
- } else {
- match defaults[placeholder_position] {
- Some(ref default) => Transformed::yes(default.clone()),
- None => exec_err!(
- "Function argument {} not provided, argument missing!",
- placeholder.id
- )?,
- }
- }
+ if placeholder_position < args.len() {
+ Transformed::yes(args[placeholder_position].clone())
+ } else if placeholder_position < defaults.len() {
+ match defaults[placeholder_position] {
+ Some(ref default) => Transformed::yes(default.clone()),
+ None => exec_err!(
+ "Function argument {} not provided, argument missing!",
+ placeholder.id
+ )?,
+ }
+ } else {
+ exec_err!(
+ "Function argument {} referenced in the body exceeds the declared parameter list ({} arguments)",
+ placeholder.id,
+ defaults.len()
+ )?
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Transformed::yes(args[placeholder_position].clone()) | |
| } else { | |
| exec_err!( | |
| "Function argument {} not provided, argument missing!", | |
| placeholder.id | |
| )? | |
| match defaults[placeholder_position] { | |
| Some(ref default) => Transformed::yes(default.clone()), | |
| None => exec_err!( | |
| "Function argument {} not provided, argument missing!", | |
| placeholder.id | |
| )?, | |
| } | |
| Transformed::yes(args[placeholder_position].clone()) | |
| } else if placeholder_position < defaults.len() { | |
| match defaults[placeholder_position] { | |
| Some(ref default) => Transformed::yes(default.clone()), | |
| None => exec_err!( | |
| "Function argument {} not provided, argument missing!", | |
| placeholder.id | |
| )?, | |
| } | |
| } else { | |
| exec_err!( | |
| "Function argument {} referenced in the body exceeds the declared parameter list ({} arguments)", | |
| placeholder.id, | |
| defaults.len() | |
| )? | |
| } |
🤖 Prompt for AI Agents
In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs around
lines 997 to 1005, the code indexes defaults[placeholder_position] without
checking bounds which can panic when a SQL body references $N beyond the
declared parameter count; add a bounds check before accessing defaults: if
placeholder_position is within defaults.len() then use
defaults[placeholder_position] as before, otherwise return the same
exec_err!("Function argument {} not provided, argument missing!",
placeholder.id)? to preserve the previous graceful error path instead of
allowing a panic.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct that a index out of bounds error will be raised when the the SQL DDL statement uses a positional placeholder, like $5, when there are less than 5 function parameters. The finding prevents a runtime panic and abort of the application.
There was a problem hiding this comment.
@martin-augment, thank you for confirming this finding! You're absolutely right that this would cause a runtime panic when a SQL function body references a placeholder beyond the declared parameter count (e.g., $5 when only 3 parameters are defined). The bounds check is critical to ensure users get a clear error message instead of crashing the application.
The suggested fix properly handles all three cases:
- Provided argument: Use the actual argument value
- Missing argument with default: Use the default expression
- Missing argument beyond parameter list: Return a clear error message
This is especially important for SQL UDFs where the function body is user-provided SQL text and typos or logic errors could easily reference non-existent parameters.
✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:45.993Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
value:useful; category:bug; feedback: The Claude AI reviewer is correct that a |
18450: To review by AI
Note
Adds named parameters and default arguments to
CREATE FUNCTION, enforces argument style/order validation, enhances placeholder parsing (including named placeholders), and updates UDF signatures and tests.$a,$b).Signature::one_ofwith varying arity based on defaults; store defaults for substitution.simplify(); error if required arg missing.Unknown placeholder.CREATE FUNCTIONand arity handling.Written by Cursor Bugbot for commit 8ddaae0. This will update automatically on new commits. Configure here.