Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust): Too-strict SQL UDF schema validation #20202

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Shoeboxam
Copy link

@Shoeboxam Shoeboxam commented Dec 6, 2024

Closes #15155

Over-eager schema validation is unfortunately rendering user-defined SQL functions unusable. Not sure why registered UDFs need to hardcode/pre-specify the names and dtypes of prospective arguments-- the plugin can handle invalid calling patterns much more flexibly. Reducing the strictness also allows SQL plugins to be variadic, just like Python plugins are.

I wrote this PR with a different and simpler approach because the original PR from @uhlarmarek has stalled: #15159

The SQL functionality seems useful, but we can't move forward with it until we can register UDFs. Thanks for the great work as always!

@Shoeboxam Shoeboxam changed the title fix(rust): too-strict user-defined SQL UDF schema validation fix(rust): Too-strict user-defined SQL UDF schema validation Dec 6, 2024
@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars and removed title needs formatting labels Dec 6, 2024
Copy link

codecov bot commented Dec 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.10%. Comparing base (9e32651) to head (d277416).
Report is 67 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20202      +/-   ##
==========================================
- Coverage   79.63%   79.10%   -0.53%     
==========================================
  Files        1564     1572       +8     
  Lines      217804   219931    +2127     
  Branches     2477     2465      -12     
==========================================
+ Hits       173439   173985     +546     
- Misses      43796    45378    +1582     
+ Partials      569      568       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Shoeboxam Shoeboxam changed the title fix(rust): Too-strict user-defined SQL UDF schema validation fix(rust): Too-strict SQL UDF schema validation Dec 7, 2024
@ritchie46
Copy link
Member

What does this fix? This only removes checks. What is the issue at hand?

@Shoeboxam
Copy link
Author

Shoeboxam commented Dec 7, 2024

Imagine I want to define a UDF noise that adds noise from a normal distribution to its input. It would be natural to use in the following query:

SELECT noise(sum(a)), noise(sum(b)) FROM data

This is not possible with the current implementation, because noise can either be registered as a plugin that takes in a column named 'a' or as a plugin that takes in a column named 'b'. I have a collection of plugins for the Python and Rust APIs that work with arbitrary input names, input data types, and even number of arguments, but their common usage doesn't pass these SQL schema validation checks.

I don't see a reason to have these checks, but I also don't have a complete understanding of the Polars codebase, so there may be ramifications I'm missing.

@Shoeboxam
Copy link
Author

Shoeboxam commented Dec 7, 2024

Option 2: change the FunctionRegistry trait from

fn get_udf(&self, name: &str) -> PolarsResult<Option<UserDefinedFunction>>;

to:

    //                          ++++++++++++++++++++
    fn get_udf(&self, name: &str, fields: Vec<Field>) -> PolarsResult<Option<UserDefinedFunction>>;

So that the implementation could tailor the returned UserDefinedFunction to suit the expected schema.

I think fn register in FunctionRegistry could also be removed, as it's not used by Polars and the user defines and can build their registry type themselves.

If you'd rather, I could update the PR in this direction.

@Shoeboxam
Copy link
Author

Shoeboxam commented Dec 7, 2024

After thinking about it some more, I already bake schema validation logic into GetOutput, so keeping the existing validation (option 2) would be redundant. Unless GetOutput is really only ever called at runtime, I think the best approach would be to remove UserDefinedFunction.input_fields completely, and replace call with call_unchecked.

@Shoeboxam
Copy link
Author

@ritchie46: Updated to remove UserDefinedFunction.input_fields completely, based on above reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

option to call udf without schema validation
2 participants