-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CHORE]: move utf8 functions from daft-dsl to daft-functions #3101
base: main
Are you sure you want to change the base?
Conversation
src/daft-sql/src/modules/utf8.rs
Outdated
}; | ||
} | ||
|
||
macro_rules! utf8_function_three_arguments { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three macros are ugly, I haven't been able to find better solutions. I would appreciate any assistance or suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we created a few helper functions:
fn utf8_unary(
func: impl Fn(ExprRef) -> ExprRef,
sql_name: &str,
arg_name: &str,
inputs: &[sqlparser::ast::FunctionArg],
planner: &crate::planner::SQLPlanner,
) -> SQLPlannerResult<ExprRef> {
match inputs {
[input] => {
let input = planner.plan_function_arg(input)?;
Ok(func(input))
}
_ => invalid_operation_err!(
"invalid arguments for {sql_name}. Expected {sql_name}({arg_name})",
),
}
}
fn utf8_binary(...)
fn utf3_ternary(...)
that'd allow us to simplify the macros into a single one pretty easily. It also pulls the bulk of the code out of macro which will cut down on code generation during compiling.
with the helper functions, you can create a single macro that matches on all 3 patterns
macro_rules! utf8_function {
($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name:expr) => {
// ...
impl SQLFunction for $name {
fn to_expr(..) -> SQLPlannerResult<ExprRef> {
utf8_unary($func, $sql_name, $arg_name, inputs, planner)
}
// ...
}
};
($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name_1:expr, $arg_name_2:expr) => {
// ...
impl SQLFunction for $name {
fn to_expr(..) -> SQLPlannerResult<ExprRef> {
utf8_binary($func, $sql_name, $arg_name_1, $arg_name_2, inputs, planner)
}
// ...
}
};
($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name_1:expr, $arg_name_2:expr, $arg_name_3:expr) => {
// ...
impl SQLFunction for $name {
fn to_expr(..) -> SQLPlannerResult<ExprRef> {
utf8_ternary($func, $sql_name, $arg_name_1, $arg_name_2, $arg_name_3, inputs, planner)
}
// ...
}
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I have updated it based on your suggestions.
CodSpeed Performance ReportMerging #3101 will degrade performances by 21.32%Comparing Summary
Benchmarks breakdown
|
parent.add_fn("regexp_match", Match); | ||
parent.add_fn("regexp_extract", Extract(0)); | ||
parent.add_fn("regexp_extract_all", ExtractAll(0)); | ||
parent.add_fn("regexp_replace", Replace(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@universalmind303 I think this should be a bug. The regexp_replace
set the regexp
to True, while the implementation is not.
@universalmind303 could you help to review this again? Thanks a lot. |
sorry for the delay. Ill take a look at it today! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ConeyLiu Looks good! Thanks for this! Once the conflicts are resolved, I can go ahead and merge this in!
Thanks @universalmind303 for your time. I have rebased it. |
@@ -1601,7 +1601,7 @@ def limit(self, num: int) -> "DataFrame": | |||
│ --- │ | |||
│ Int64 │ | |||
╞═══════╡ | |||
│ 1 │ f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small fix since this breaks the doc tests.
This refers to #2854 moving the UTF-8 functions from daft-dsl to daft-functions