Implement ScalarUDF in terms of ScalarUDFImpl trait #8713

Merged (2 commits) on Jan 8, 2024


@alamb (Contributor) commented Jan 1, 2024

Which issue does this PR close?

Closes #8712

Rationale for this change

ScalarUDF is a struct of function pointers for historical reasons. Now that ScalarUDFImpl is available (after #8578), we can clean up ScalarUDF's internal implementation so that it is expressed directly in terms of that trait, which greatly simplifies the implementation.

What changes are included in this PR?

  1. Rewrite ScalarUDF to have a single Arc<dyn ScalarUDFImpl> field
  2. Add Debug as a supertrait bound of ScalarUDFImpl (`+ Debug`), which forces implementations to also derive (or implement) Debug. I think this is much better than the opaque FUNC that was printed previously.
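The two changes can be sketched with a self-contained, simplified model. The trait below is a hypothetical stand-in for ScalarUDFImpl with only `name` and `invoke` over `f64` slices (the real trait uses Arrow types and has more methods), and `AddOne` is an illustrative implementation. The point is the shape: one `Arc<dyn ScalarUDFImpl>` field, and a `Debug` supertrait that lets the wrapper simply `#[derive(Debug)]`.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified, hypothetical stand-in for DataFusion's `ScalarUDFImpl`.
/// The `Debug` supertrait means every implementation must also implement `Debug`.
pub trait ScalarUDFImpl: Debug + Send + Sync {
    fn name(&self) -> &str;
    fn invoke(&self, args: &[f64]) -> f64;
}

/// The wrapper holds exactly one trait object. Because `dyn ScalarUDFImpl`
/// is `Debug`, so is `Arc<dyn ScalarUDFImpl>`, and `derive(Debug)` works.
#[derive(Debug, Clone)]
pub struct ScalarUDF {
    inner: Arc<dyn ScalarUDFImpl>,
}

impl ScalarUDF {
    pub fn new_from_impl<F: ScalarUDFImpl + 'static>(fun: F) -> Self {
        Self { inner: Arc::new(fun) }
    }

    // Pass-through methods: the public API just delegates to the trait object.
    pub fn name(&self) -> &str {
        self.inner.name()
    }

    pub fn invoke(&self, args: &[f64]) -> f64 {
        self.inner.invoke(args)
    }
}

/// Hypothetical implementation; removing `#[derive(Debug)]` makes the
/// `impl ScalarUDFImpl for AddOne` below fail to compile.
#[derive(Debug)]
struct AddOne;

impl ScalarUDFImpl for AddOne {
    fn name(&self) -> &str {
        "add_one"
    }
    fn invoke(&self, args: &[f64]) -> f64 {
        args[0] + 1.0
    }
}
```

With this shape, printing a `ScalarUDF` built from `AddOne` shows the implementation's own `Debug` output rather than a placeholder.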

Are these changes tested?

yes

Are there any user-facing changes?

There is a new requirement that ScalarUDFImpl implementations derive Debug, but I think that is good hygiene anyway.

@alamb added the "api change" label (Changes the API exposed to users of the crate) Jan 1, 2024
@@ -1948,6 +1948,7 @@ mod test {
);

// UDF
+#[derive(Debug)]
@alamb (Contributor Author) commented Jan 1, 2024:

This is an example of the API change: any new impl of ScalarUDFImpl must also derive Debug. Note that ScalarUDFImpl was introduced in #8578 and has not yet been released, so this is not a breaking change for any released version.

/// 1. For simple (less performant) use cases, use [`create_udf`] and [`simple_udf.rs`].
///
/// 2. For advanced use cases, use [`ScalarUDFImpl`] and [`advanced_udf.rs`].
///
/// # API Note
@alamb (Contributor Author) commented Jan 1, 2024:

As I went through this PR, I noticed that ScalarUDF is now basically a pass-through wrapper around ScalarUDFImpl. If we didn't need to maintain backwards compatibility, we could probably remove the ScalarUDF struct and make it a trait, but I think that would be very disruptive to all existing users of DataFusion, so we should avoid doing so unless absolutely necessary.

@github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Physical Expressions), and optimizer (Optimizer rules) labels Jan 1, 2024
@@ -292,3 +271,105 @@ pub trait ScalarUDFImpl {
&[]
}
}

/// ScalarUDF that adds an alias to the underlying function. It is better to
@alamb (Contributor Author) commented:

This is somewhat boilerplate, but it is a straightforward example of using trait objects to extend functionality.
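The wrapper pattern can be sketched without DataFusion as a decorator: a struct that implements the trait by forwarding every method to an inner trait object, overriding only `aliases`. The trait is a simplified, hypothetical stand-in (`f64` slices instead of Arrow arrays), and `AliasedUdf` and `Double` are illustrative names. Unlike this sketch, the PR's AliasedScalarUDFImpl wraps a ScalarUDF (see the `AliasedScalarUDFImpl::new(self, aliases)` call in `with_aliases`) rather than a bare trait object.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified, hypothetical stand-in for `ScalarUDFImpl`.
pub trait ScalarUDFImpl: Debug + Send + Sync {
    fn name(&self) -> &str;
    /// Default: no aliases, mirroring the `&[]` default shown in the diff.
    fn aliases(&self) -> &[String] {
        &[]
    }
    fn invoke(&self, args: &[f64]) -> f64;
}

/// Decorator that adds aliases and forwards everything else to `inner`.
#[derive(Debug)]
pub struct AliasedUdf {
    inner: Arc<dyn ScalarUDFImpl>,
    aliases: Vec<String>,
}

impl AliasedUdf {
    pub fn new(
        inner: Arc<dyn ScalarUDFImpl>,
        new_aliases: impl IntoIterator<Item = &'static str>,
    ) -> Self {
        // Keep any aliases the inner implementation already reports.
        let mut aliases = inner.aliases().to_vec();
        aliases.extend(new_aliases.into_iter().map(String::from));
        Self { inner, aliases }
    }
}

impl ScalarUDFImpl for AliasedUdf {
    fn name(&self) -> &str {
        self.inner.name()
    }
    fn aliases(&self) -> &[String] {
        &self.aliases
    }
    fn invoke(&self, args: &[f64]) -> f64 {
        self.inner.invoke(args)
    }
}

/// Example inner implementation with no aliases of its own.
#[derive(Debug)]
pub struct Double;

impl ScalarUDFImpl for Double {
    fn name(&self) -> &str {
        "double"
    }
    fn invoke(&self, args: &[f64]) -> f64 {
        args[0] * 2.0
    }
}
```

Because the decorator is itself a `ScalarUDFImpl`, it can be wrapped again, and the aliases accumulate.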

/// [`create_udf`]: crate::expr_fn::create_udf
/// [`simple_udf.rs`]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/simple_udf.rs
/// [`advanced_udf.rs`]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/advanced_udf.rs
-#[derive(Clone)]
+#[derive(Debug, Clone)]
pub struct ScalarUDF {
@alamb (Contributor Author) commented:

The API for ScalarUDF does not change at all; only its internal implementation does.

{
// TODO change the internal implementation to use the trait object

///
/// If you implement [`ScalarUDFImpl`] directly you should return aliases directly.
pub fn with_aliases(self, aliases: impl IntoIterator<Item = &'static str>) -> Self {
Self::new_from_impl(AliasedScalarUDFImpl::new(self, aliases))
@alamb (Contributor Author) commented:

This is somewhat of a hack (wrapping the scalar UDF in another implementation). It may make more sense to simply remove with_aliases; however, since it was released in DataFusion 34.0.0 (https://docs.rs/datafusion/34.0.0/datafusion/physical_plan/udf/struct.ScalarUDF.html), removing it would be a breaking API change.

We could deprecate the API 🤔
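If the deprecation route were taken, Rust's built-in `#[deprecated]` attribute keeps existing callers compiling while warning them at each call site. A minimal sketch on a hypothetical, simplified `Udf` type (not DataFusion's ScalarUDF); the note text is invented for illustration and is not DataFusion's wording.

```rust
/// Simplified, hypothetical stand-in for `ScalarUDF`; only aliases are modeled.
#[derive(Debug, Default)]
pub struct Udf {
    aliases: Vec<String>,
}

impl Udf {
    /// Still works, but every caller now gets a compiler warning that
    /// points at the suggested replacement (illustrative note text).
    #[deprecated(note = "return aliases from `ScalarUDFImpl::aliases` instead")]
    pub fn with_aliases(mut self, aliases: impl IntoIterator<Item = &'static str>) -> Self {
        self.aliases.extend(aliases.into_iter().map(String::from));
        self
    }

    pub fn aliases(&self) -> &[String] {
        &self.aliases
    }
}
```

Callers that cannot migrate yet can silence the warning locally with `#[allow(deprecated)]`, which keeps the change non-breaking.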

}
}

/// Implementation of [`ScalarUDFImpl`] that wraps the function style pointers of the older API
@alamb (Contributor Author) commented:

This is basically the old implementation of ScalarUDF moved into its own struct.
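The adapter idea can be sketched as follows: the old function-pointer fields become the fields of a struct that implements the trait. Closures do not implement `Debug`, so such a struct needs a hand-written `Debug` impl that leaves the function out, which is the kind of opaque output the new `Debug` bound avoids for ordinary implementations. The names `FunctionPointerUdf` and `ScalarFn` and the `f64`-based signatures are hypothetical simplifications, not the PR's code.

```rust
use std::fmt;
use std::sync::Arc;

/// Simplified, hypothetical stand-in for `ScalarUDFImpl`.
pub trait ScalarUDFImpl: fmt::Debug + Send + Sync {
    fn name(&self) -> &str;
    fn invoke(&self, args: &[f64]) -> f64;
}

/// Hypothetical shared function pointer type (closures are not `Debug`).
pub type ScalarFn = Arc<dyn Fn(&[f64]) -> f64 + Send + Sync>;

/// Keeps the old "struct of function pointers" shape, exposed through the trait.
pub struct FunctionPointerUdf {
    name: String,
    fun: ScalarFn,
}

impl FunctionPointerUdf {
    pub fn new(name: &str, fun: ScalarFn) -> Self {
        Self { name: name.to_string(), fun }
    }
}

// `fun` cannot be printed, so the hand-written impl shows only the name
// and marks the remaining fields as omitted (`..`).
impl fmt::Debug for FunctionPointerUdf {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("FunctionPointerUdf")
            .field("name", &self.name)
            .finish_non_exhaustive()
    }
}

impl ScalarUDFImpl for FunctionPointerUdf {
    fn name(&self) -> &str {
        &self.name
    }
    fn invoke(&self, args: &[f64]) -> f64 {
        // Forward to the stored function pointer.
        (self.fun)(args)
    }
}
```

This lets code written against the older function-pointer API keep working while everything downstream only sees the trait.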

@@ -36,7 +36,7 @@ pub fn create_physical_expr(

Ok(Arc::new(ScalarFunctionExpr::new(
fun.name(),
-fun.fun().clone(),
+fun.fun(),
@alamb (Contributor Author) commented:

This is a drive-by cleanup I noticed while working on the code.
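One plausible reading of this cleanup, assuming `fun()` now returns an owned `Arc` closure rather than a reference: the method builds a fresh closure that captures a clone of the inner trait object, so the caller's extra `.clone()` is redundant. The sketch uses simplified, hypothetical types (`f64` slices instead of Arrow arrays), and `Negate` is an illustrative implementation.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical, simplified signature for a shared scalar function.
pub type ScalarFunctionImplementation = Arc<dyn Fn(&[f64]) -> f64 + Send + Sync>;

/// Simplified, hypothetical stand-in for `ScalarUDFImpl`.
pub trait ScalarUDFImpl: Debug + Send + Sync {
    fn invoke(&self, args: &[f64]) -> f64;
}

#[derive(Debug, Clone)]
pub struct ScalarUDF {
    inner: Arc<dyn ScalarUDFImpl>,
}

impl ScalarUDF {
    /// Returns an owned closure, so callers do not need `.clone()`.
    pub fn fun(&self) -> ScalarFunctionImplementation {
        // Cloning the `Arc` only bumps a reference count.
        let captured = Arc::clone(&self.inner);
        Arc::new(move |args: &[f64]| captured.invoke(args))
    }
}

#[derive(Debug)]
pub struct Negate;

impl ScalarUDFImpl for Negate {
    fn invoke(&self, args: &[f64]) -> f64 {
        -args[0]
    }
}
```

Each call to `fun()` yields an independent closure that shares the same underlying implementation.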

@alamb merged commit 746988a into apache:main Jan 8, 2024 (22 checks passed)
@alamb (Contributor Author) commented Jan 8, 2024:

Thank you for the review @crepererum

@alamb deleted the alamb/clean_scalar_udf branch Jan 8, 2024 16:19
@alamb mentioned this pull request Jan 2, 2024
Labels: api change (Changes the API exposed to users of the crate), logical-expr (Logical plan and expressions), optimizer (Optimizer rules), physical-expr (Physical Expressions)
Linked issue (closed by this pull request): Clean internal implementation of ScalarUDF to use ScalarUDFImpl (rather than the function pointers)

2 participants