Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CHORE]: move utf8 functions from daft-dsl to daft-functions #3101

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

ConeyLiu
Copy link
Contributor

This refers to #2854 moving the UTF-8 functions from daft-dsl to daft-functions

@github-actions github-actions bot added the chore label Oct 22, 2024
};
}

macro_rules! utf8_function_three_arguments {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three macros are ugly, I haven't been able to find better solutions. I would appreciate any assistance or suggestions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we created a few helper functions:

fn utf8_unary(
    func: impl Fn(ExprRef) -> ExprRef,
    sql_name: &str,
    arg_name: &str,
    inputs: &[sqlparser::ast::FunctionArg],
    planner: &crate::planner::SQLPlanner,
) -> SQLPlannerResult<ExprRef> {
    match inputs {
        [input] => {
            let input = planner.plan_function_arg(input)?;
            Ok(func(input))
        }
        _ => invalid_operation_err!(
            "invalid arguments for {sql_name}. Expected {sql_name}({arg_name})",
        ),
    }
}

fn utf8_binary(...) 
fn utf3_ternary(...)

that'd allow us to simplify the macros into a single one pretty easily. It also pulls the bulk of the code out of macro which will cut down on code generation during compiling.

with the helper functions, you can create a single macro that matches on all 3 patterns

macro_rules! utf8_function {
    ($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name:expr) => {
        // ...
        impl SQLFunction for $name {
            fn to_expr(..) -> SQLPlannerResult<ExprRef> {
                utf8_unary($func, $sql_name, $arg_name, inputs, planner)
            }
            // ...
        }
    };
    ($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name_1:expr, $arg_name_2:expr) => {
        // ...
        impl SQLFunction for $name {
            fn to_expr(..) -> SQLPlannerResult<ExprRef> {
                utf8_binary($func, $sql_name, $arg_name_1, $arg_name_2, inputs, planner)
            }
            // ...
        }
    };
    ($name:ident, $sql_name:expr, $func:expr, $doc:expr, $arg_name_1:expr, $arg_name_2:expr, $arg_name_3:expr) => {
        // ...
        impl SQLFunction for $name {
            fn to_expr(..) -> SQLPlannerResult<ExprRef> {
                utf8_ternary($func, $sql_name, $arg_name_1, $arg_name_2, $arg_name_3, inputs, planner)
            }
            // ...
        }
    };
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I have updated it based on your suggestions.

Copy link

codspeed-hq bot commented Oct 22, 2024

CodSpeed Performance Report

Merging #3101 will degrade performances by 21.32%

Comparing ConeyLiu:move-utf8 (030483f) with main (5c00dbc)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main ConeyLiu:move-utf8 Change
test_show[100 Small Files] 40 ms 50.9 ms -21.32%

parent.add_fn("regexp_match", Match);
parent.add_fn("regexp_extract", Extract(0));
parent.add_fn("regexp_extract_all", ExtractAll(0));
parent.add_fn("regexp_replace", Replace(true));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@universalmind303 I think this should be a bug. The regexp_replace set the regexp to True, while the implementation is not.

@ConeyLiu
Copy link
Contributor Author

@universalmind303 could you help to review this again? Thanks a lot.

@universalmind303
Copy link
Collaborator

@universalmind303 could you help to review this again? Thanks a lot.

sorry for the delay. Ill take a look at it today!

Copy link
Collaborator

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ConeyLiu Looks good! Thanks for this! Once the conflicts are resolved, I can go ahead and merge this in!

@ConeyLiu
Copy link
Contributor Author

Thanks @universalmind303 for your time. I have rebased it.

@@ -1601,7 +1601,7 @@ def limit(self, num: int) -> "DataFrame":
│ --- │
│ Int64 │
╞═══════╡
│ 1 │ f
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small fix since this breaks the doc tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants