
18450: feat: support named variables & defaults for CREATE FUNCTION#5

Open
martin-augment wants to merge 4 commits into main from pr-18450-2025-11-06-09-51-03

@martin-augment
Owner

@martin-augment martin-augment commented Nov 6, 2025

18450: To review by AI


Note

Adds named parameters and default arguments to CREATE FUNCTION, enforces argument style/order validation, enhances placeholder parsing (including named placeholders), and updates UDF signatures and tests.

  • CREATE FUNCTION / UDF:
    • Support named arguments in function definitions and bodies (e.g., $a, $b).
    • Support default argument values; validate that non-defaults don’t follow defaults.
    • Disallow mixing named and positional styles; validate argument style consistency.
    • Generate Signature::one_of with varying arity based on defaults; store defaults for substitution.
    • Substitute missing args with defaults during simplify(); error if required arg missing.
  • SQL parsing/placeholders:
    • Accept named placeholders; map to positional when parameter types are known; improved error: Unknown placeholder.
    • Include argument names in parameter fields for function bodies.
  • Tests:
    • Add tests for named/default arguments in CREATE FUNCTION and arity handling.
    • Update PREPARE/placeholder error expectations in unit and sqllogictest files.

Written by Cursor Bugbot for commit 8ddaae0. This will update automatically on new commits.
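The two validation rules in the summary (no non-default argument after a default one, and no mixing of named and positional styles) can be sketched in isolation. This is a minimal sketch, not the PR's code: `Param` and `validate_params` are hypothetical names, and the error strings are illustrative rather than DataFusion's actual messages.

```rust
/// Hypothetical, simplified model of one CREATE FUNCTION parameter.
/// Only the two properties the validation rules inspect are kept.
#[derive(Debug, Clone)]
pub struct Param {
    pub name: Option<String>,
    pub default: Option<String>,
}

/// Checks two rules:
/// 1. once a parameter has a default, every later parameter must have one;
/// 2. parameters are either all named or all positional.
pub fn validate_params(params: &[Param]) -> Result<(), String> {
    // Rule 1: a non-default parameter may not follow a default one.
    let mut seen_default = false;
    for (i, p) in params.iter().enumerate() {
        match (&p.default, seen_default) {
            (Some(_), _) => seen_default = true,
            (None, true) => {
                return Err(format!(
                    "Non-default argument at position {} follows a default argument",
                    i + 1
                ));
            }
            (None, false) => {}
        }
    }

    // Rule 2: reject a mix of named and unnamed parameters.
    let named = params.iter().filter(|p| p.name.is_some()).count();
    if named != 0 && named != params.len() {
        return Err(
            "All function arguments must use either named or positional style".to_string(),
        );
    }
    Ok(())
}
```

Checking the default ordering in a single left-to-right pass keeps the rule independent of how many defaults there are; an empty parameter list passes both rules.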

@coderabbitai

coderabbitai bot commented Nov 6, 2025

Walkthrough

Adds support for default arguments in user-defined scalar functions with validation ensuring non-default arguments don't follow defaults and argument styles remain consistent. Expands placeholder handling to support both named and positional formats, updating error messages accordingly.

Changes

Cohort / File(s) Summary
UDF Default Arguments Support
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs
Introduces defaults: Vec<Option<Expr>> field to ScalarFunctionWrapper. Updates replacement method signature to accept defaults parameter. Modifies TryFrom<CreateFunction> to compute defaults and dynamic signatures. Adds tests for named arguments, default arguments, and mixed argument styles.
Placeholder Support Expansion
datafusion/sql/src/expr/value.rs
Extends create_placeholder_expr to support both named (e.g., $foo) and positional (e.g., $1) placeholders. When encountering invalid numeric placeholders, attempts named placeholder resolution. Updates error handling and documentation.
Function Definition Validation
datafusion/sql/src/statement.rs
Enforces that non-default function arguments cannot appear after default arguments. Disallows mixed positional and named argument styles in function definitions. Refactors argument-type mapping to preserve parameter names in Field construction.
Test Error Message Updates
datafusion/sql/tests/cases/params.rs, datafusion/sqllogictest/test_files/prepare.slt
Updates expected error message from "Invalid placeholder, not a number: $foo" to "Unknown placeholder: $foo" in test expectations.
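A minimal sketch of the placeholder resolution described for value.rs, assuming placeholders are 1-based (`$1` is the first parameter) and that named placeholders are matched against the declared parameter names. `resolve_placeholder` and its signature are hypothetical; only the `Unknown placeholder: $foo` wording comes from the PR. The sketch ignores the parameter types that the PR uses to decide when a named placeholder can be mapped to a position.

```rust
/// Resolves a placeholder such as `$1` or `$foo` to a zero-based parameter
/// index. `param_names` is a hypothetical list of declared parameter names
/// (`None` for an unnamed parameter).
pub fn resolve_placeholder(id: &str, param_names: &[Option<&str>]) -> Result<usize, String> {
    let body = id
        .strip_prefix('$')
        .ok_or_else(|| format!("Invalid placeholder, missing '$': {id}"))?;

    // Positional form: `$1` is the first parameter, so subtract one.
    if let Ok(n) = body.parse::<usize>() {
        return n
            .checked_sub(1)
            .ok_or_else(|| format!("Invalid placeholder, zero is not a valid index: {id}"));
    }

    // Named form: look the name up among the declared parameters.
    param_names
        .iter()
        .position(|name| *name == Some(body))
        .ok_or_else(|| format!("Unknown placeholder: {id}"))
}
```

Trying the numeric parse first and falling back to a name lookup mirrors the described behavior, where a placeholder that is "not a number" is no longer an error by itself but only when no parameter carries that name.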

"Function argument {} not provided, argument missing!",
placeholder.id
)?,
}

Bug: Index Bounds in Placeholder Defaults Safety Trap

Potential index out of bounds when accessing defaults[placeholder_position]. If a placeholder references a parameter position that exceeds the number of function parameters (e.g., $5 when only 2 parameters are defined), placeholder_position will be >= defaults.len(), causing a panic. The code should check placeholder_position < defaults.len() before accessing the array.
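The failure mode can be reproduced in isolation. In this sketch the function names are hypothetical and `defaults` is modeled as `Vec<Option<String>>` rather than `Vec<Option<Expr>>`; it shows that direct indexing panics for `$5` (index 4) when only two parameters exist, while `slice::get` returns `None` for the same position.

```rust
use std::panic;

/// Mirrors the unguarded lookup: indexes `defaults` directly, so an
/// out-of-range position panics instead of returning an error.
pub fn unguarded_default(defaults: &[Option<String>], placeholder_position: usize) -> Option<String> {
    defaults[placeholder_position].clone()
}

/// Same lookup through `slice::get`: an out-of-range position yields the
/// outer `None` instead of a panic, so the caller can report an error.
pub fn guarded_default(
    defaults: &[Option<String>],
    placeholder_position: usize,
) -> Option<Option<String>> {
    defaults.get(placeholder_position).cloned()
}

/// Returns true if the unguarded lookup panics for the given position.
pub fn unguarded_panics(defaults: &[Option<String>], placeholder_position: usize) -> bool {
    panic::catch_unwind(move || unguarded_default(defaults, placeholder_position)).is_err()
}
```

The nested `Option` in `guarded_default` keeps "parameter not declared" (outer `None`) distinct from "declared without a default" (`Some(None)`).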

Owner Author


value:useful; category:bug; feedback: The Cursor AI reviewer is correct that an index-out-of-bounds panic occurs when the SQL DDL statement uses a positional placeholder such as $5 but fewer than 5 function parameters are declared. The finding prevents a runtime panic and abort of the application.

@claude

claude bot commented Nov 6, 2025

Code Review: Named Parameters and Default Arguments for CREATE FUNCTION

Thank you for this PR! This is a substantial feature addition that adds named parameters and default arguments to CREATE FUNCTION.

Strengths

  1. Comprehensive Test Coverage: Multiple test cases covering named arguments, default arguments with various arities, and error cases.

  2. Good Error Validation: Proper validation for non-default arguments placement, consistent argument style, and unknown placeholders.

  3. Backward Compatibility: Named args and defaults are optional.

Potential Bugs

1. Bounds Check Missing in replacement Function

Location: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:999

If placeholder_position >= defaults.len(), this will panic with an index out of bounds error.

2. Nullary Signature Edge Case

Location: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:1041-1042

The condition 'if n == 0' inside the loop starting at 'pos' can never be true when pos > 0.
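Assuming the signature loop at lines 1041-1042 iterates arities from `pos` (the index of the first defaulted parameter) up to the parameter count, as the comment implies, the `n == 0` branch is reachable only when `pos == 0`, that is, when every parameter has a default or the function declares none. The sketch below uses a hypothetical `SigVariant` enum in place of DataFusion's `TypeSignature` and plain type-name strings in place of Arrow `DataType`s.

```rust
/// Hypothetical stand-in for DataFusion's type-signature variants; only the
/// shape matters here, not the real `TypeSignature` API.
#[derive(Debug, PartialEq)]
pub enum SigVariant {
    Nullary,
    Exact(Vec<String>),
}

/// Builds one accepted signature per callable arity. `arg_types` lists the
/// declared parameter types; `has_default[i]` says whether parameter i has a
/// default. Callers may omit any suffix of defaulted parameters.
pub fn signatures(arg_types: &[&str], has_default: &[bool]) -> Vec<SigVariant> {
    // Index of the first parameter with a default; validation guarantees
    // that every later parameter also has one.
    let pos = has_default
        .iter()
        .position(|d| *d)
        .unwrap_or(arg_types.len());

    (pos..=arg_types.len())
        .map(|n| {
            if n == 0 {
                // Only reachable when pos == 0: all parameters are optional.
                SigVariant::Nullary
            } else {
                SigVariant::Exact(arg_types[..n].iter().map(|t| t.to_string()).collect())
            }
        })
        .collect()
}
```

Under this assumption the branch is not dead code; it is what lets a function whose parameters all have defaults be called with no arguments.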

Design Concerns

Named Parameter Rewriting is Fragile

Location: datafusion/sql/src/expr/value.rs:126-138

The FIXME comment indicates this is a temporary solution. Rewriting named placeholders to positional ones could cause issues with error messages.

Summary

Overall Assessment: Solid feature implementation with good test coverage.

Main concerns:

  1. Critical: Fix bounds checking in replacement function
  2. Important: Clarify/fix Nullary signature generation logic
  3. Nice-to-have: Address technical debt around named parameter rewriting

Action Items

  • Fix bounds checking in replacement function (user_defined_scalar_functions.rs:999)
  • Review Nullary signature generation logic (user_defined_scalar_functions.rs:1041-1042)
  • Add documentation comments for the defaults field
  • Consider adding edge case tests

Great work on this feature!


@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25b1fcd and 8ddaae0.

📒 Files selected for processing (5)
  • datafusion/core/tests/user_defined/user_defined_scalar_functions.rs (7 hunks)
  • datafusion/sql/src/expr/value.rs (2 hunks)
  • datafusion/sql/src/statement.rs (2 hunks)
  • datafusion/sql/tests/cases/params.rs (1 hunks)
  • datafusion/sqllogictest/test_files/prepare.slt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
datafusion/sql/src/expr/value.rs (1)
datafusion/expr/src/expr.rs (1)
  • new_with_field (1171-1173)
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs (3)
datafusion/expr/src/expr_fn.rs (4)
  • signature (484-486)
  • signature (598-600)
  • signature (691-693)
  • placeholder (125-130)
datafusion-examples/examples/function_factory.rs (1)
  • replacement (161-183)
datafusion/expr-common/src/signature.rs (2)
  • one_of (1162-1168)
  • exact (1118-1124)
datafusion/sql/src/statement.rs (2)
datafusion/ffi/src/udaf/mod.rs (2)
  • args (468-474)
  • arg_types (310-314)
datafusion/sql/src/relation/mod.rs (1)
  • args (165-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: claude-review
  • GitHub Check: cargo check datafusion features
  • GitHub Check: verify benchmark results (amd64)
  • GitHub Check: cargo check datafusion-common features
  • GitHub Check: cargo check datafusion-substrait features
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos-aarch64)

Comment on lines 997 to +1005
                         Transformed::yes(args[placeholder_position].clone())
                     } else {
-                        exec_err!(
-                            "Function argument {} not provided, argument missing!",
-                            placeholder.id
-                        )?
+                        match defaults[placeholder_position] {
+                            Some(ref default) => Transformed::yes(default.clone()),
+                            None => exec_err!(
+                                "Function argument {} not provided, argument missing!",
+                                placeholder.id
+                            )?,
+                        }

@coderabbitai coderabbitai bot Nov 6, 2025


⚠️ Potential issue | 🔴 Critical

Guard defaults lookup to avoid panic on invalid placeholders.

If the SQL body references $N where N exceeds the declared parameter count, placeholder_position indexes past defaults, causing an immediate panic instead of a user-facing error. Previously this returned a clean “argument missing” error; now it will crash. Please restore graceful handling by checking the bounds before indexing.

-                    if placeholder_position < args.len() {
-                        Transformed::yes(args[placeholder_position].clone())
-                    } else {
-                        match defaults[placeholder_position] {
-                            Some(ref default) => Transformed::yes(default.clone()),
-                            None => exec_err!(
-                                "Function argument {} not provided, argument missing!",
-                                placeholder.id
-                            )?,
-                        }
-                    }
+                    if placeholder_position < args.len() {
+                        Transformed::yes(args[placeholder_position].clone())
+                    } else if placeholder_position < defaults.len() {
+                        match defaults[placeholder_position] {
+                            Some(ref default) => Transformed::yes(default.clone()),
+                            None => exec_err!(
+                                "Function argument {} not provided, argument missing!",
+                                placeholder.id
+                            )?,
+                        }
+                    } else {
+                        exec_err!(
+                            "Function argument {} referenced in the body exceeds the declared parameter list ({} arguments)",
+                            placeholder.id,
+                            defaults.len()
+                        )?
+                    }
🤖 Prompt for AI Agents
In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs around
lines 997 to 1005, the code indexes defaults[placeholder_position] without
checking bounds which can panic when a SQL body references $N beyond the
declared parameter count; add a bounds check before accessing defaults: if
placeholder_position is within defaults.len() then use
defaults[placeholder_position] as before, otherwise return the same
exec_err!("Function argument {} not provided, argument missing!",
placeholder.id)? to preserve the previous graceful error path instead of
allowing a panic.

Owner Author


value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct that an index-out-of-bounds panic occurs when the SQL DDL statement uses a positional placeholder such as $5 but fewer than 5 function parameters are declared. The finding prevents a runtime panic and abort of the application.


@martin-augment, thank you for confirming this finding! You're absolutely right that this would cause a runtime panic when a SQL function body references a placeholder beyond the declared parameter count (e.g., $5 when only 3 parameters are defined). The bounds check is critical to ensure users get a clear error message instead of crashing the application.

The suggested fix properly handles all three cases:

  1. Provided argument: Use the actual argument value
  2. Missing argument with default: Use the default expression
  3. Missing argument beyond parameter list: Return a clear error message

This is especially important for SQL UDFs where the function body is user-provided SQL text and typos or logic errors could easily reference non-existent parameters.
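The three cases listed above map naturally onto `slice::get`, which avoids both the panic and the manual `< len()` comparisons. This is a sketch under simplifying assumptions: `Expr` here is a hypothetical two-variant enum rather than DataFusion's `Expr`, and `resolve_argument` returns `Result<Expr, String>` instead of `Transformed<Expr>` with a `DataFusionError`. The error strings are the ones from the suggested change.

```rust
/// Hypothetical, simplified stand-in for a logical expression.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Literal(i64),
    Column(String),
}

/// Resolves the zero-based `placeholder_position` for placeholder `id`
/// (e.g. "$2"), using the call's arguments first, then declared defaults.
pub fn resolve_argument(
    id: &str,
    placeholder_position: usize,
    args: &[Expr],
    defaults: &[Option<Expr>],
) -> Result<Expr, String> {
    // Case 1: the caller supplied this argument.
    if let Some(arg) = args.get(placeholder_position) {
        return Ok(arg.clone());
    }
    match defaults.get(placeholder_position) {
        // Case 2: omitted, but the parameter declares a default.
        Some(Some(default)) => Ok(default.clone()),
        // Declared without a default: a required argument is missing.
        Some(None) => Err(format!(
            "Function argument {id} not provided, argument missing!"
        )),
        // Case 3: the body references a parameter that was never declared.
        None => Err(format!(
            "Function argument {id} referenced in the body exceeds the declared parameter list ({} arguments)",
            defaults.len()
        )),
    }
}
```

Separating "declared but no default" from "never declared" keeps the original "argument missing" message for genuinely omitted arguments while giving out-of-range references a distinct error.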


✏️ Learnings added
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:45.993Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.


@martin-augment
Owner Author

1. Bounds Check Missing in replacement Function

Location: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:999

If placeholder_position >= defaults.len(), this will panic with an index out of bounds error.

value:useful; category:bug; feedback: The Claude AI reviewer is correct that an index-out-of-bounds panic occurs when the SQL DDL statement uses a positional placeholder such as $5 but fewer than 5 function parameters are declared. The finding prevents a runtime panic and abort of the application.
