Support Serde for ScalarUDF in Physical Expressions #9436
Hi @thinkharderdev
I started this PR for physical-expr and I have a question.
I'd like to hear your thoughts!
} else if let Some(expr) = expr.downcast_ref::<LikeExpr>() {
} else {
    let mut buf = Vec::new();
    // let _ = codec.try_encode_udf(, &mut buf);
I have a question: how can I get a ScalarUDF from a ScalarFunctionExpr?
Given its current structure, I think changing the fields in ScalarFunctionExpr may be one way to do it.
I also tried using create_udf, but it doesn't seem to work.
I'm assuming this is what you mean in your comment below, but I think we need to change
pub struct ScalarFunctionExpr {
    fun: ScalarFunctionImplementation,
to
pub struct ScalarFunctionExpr {
    fun: ScalarFunctionDefinition,
@alamb This would technically be a breaking change, since ScalarFunctionExpr has a public method
pub fn fun(&self) -> &ScalarFunctionImplementation {
    &self.fun
}
I think we could do this in a backwards-compatible way by creating a corresponding ScalarFunctionImplementation from the ScalarFunctionDefinition, or we could just change the public API. It seems to me this would not be a very disruptive breaking change: ScalarFunctionImplementation is just an opaque function pointer, so it is unlikely that application code uses it in a way that couldn't easily be adapted if the method started returning &ScalarFunctionDefinition instead.
Thanks for your review, @thinkharderdev!
Yeah, that change is what I had in mind. I also think some other fields in ScalarFunctionExpr, such as name and return_type, can be removed.
Yeah, I think creating a ScalarFunctionImplementation from a ScalarFunctionDefinition is one way. Given that BuiltinScalarFunction will be removed and we can get the inner of a ScalarUDF, which is a ScalarUDFImpl, we can do something like:
let captured = inner.clone();
Arc::new(move |args| captured.invoke(args))
I'd like to hear opinions from you and @alamb :)
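The closure-capture idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `UdfImpl` and `FunctionImplementation` are hypothetical stand-ins for `ScalarUDFImpl` and `ScalarFunctionImplementation`, and plain `i64` slices replace `ColumnarValue`, so none of this is DataFusion's actual API.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the `ScalarUDFImpl` trait (not the real API).
trait UdfImpl: Send + Sync {
    fn invoke(&self, args: &[i64]) -> Result<i64, String>;
}

// Hypothetical stand-in for `ScalarFunctionImplementation`:
// an opaque, shareable function pointer.
type FunctionImplementation = Arc<dyn Fn(&[i64]) -> Result<i64, String> + Send + Sync>;

// A toy UDF that sums its arguments.
struct AddAll;

impl UdfImpl for AddAll {
    fn invoke(&self, args: &[i64]) -> Result<i64, String> {
        Ok(args.iter().sum())
    }
}

// Build the function pointer by capturing a clone of the shared trait object
// and forwarding every call to its `invoke` method.
fn implementation_from_udf(inner: &Arc<dyn UdfImpl>) -> FunctionImplementation {
    let captured = Arc::clone(inner);
    Arc::new(move |args: &[i64]| captured.invoke(args))
}
```

Because the closure owns its own `Arc`, the returned function pointer stays valid even after the caller drops its handle to the UDF.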
};

/// Physical expression of a scalar function
pub struct ScalarFunctionExpr {
-    fun: ScalarFunctionImplementation,
+    fun: ScalarFunctionDefinition,
I think the fun here can be a ScalarFunctionDefinition, so that we can get the ScalarUDF from a ScalarFunctionExpr?
I also searched the code base and found that ScalarFunctionImplementation is only used for:
- BuiltinScalarFunction
- methods like create_udf, which I think will be deprecated in the future given the new ScalarUDFImpl way.
So I think ScalarFunctionImplementation will also be deleted some day. 🤔 WDYT?
Yeah, this should be ScalarFunctionDefinition here. I don't think create_udf will be deprecated, but we have ScalarUdfLegacyWrapper to wrap a ScalarFunctionImplementation and create a ScalarUDFImpl out of it. So the new ScalarFunctionDefinition would include anything that is represented as a ScalarFunctionImplementation.
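The wrapping direction described in this comment (function pointer in, trait object out) can be sketched as follows. All names here are hypothetical: `LegacyWrapper` only mimics the role attributed to `ScalarUdfLegacyWrapper`, `create_udf_sketch` is not the real `create_udf` signature, and `i64` slices stand in for `ColumnarValue`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for `ScalarFunctionImplementation`.
type FunctionImplementation = Arc<dyn Fn(&[i64]) -> Result<i64, String> + Send + Sync>;

// Hypothetical stand-in for the `ScalarUDFImpl` trait.
trait UdfImpl: Send + Sync {
    fn name(&self) -> &str;
    fn invoke(&self, args: &[i64]) -> Result<i64, String>;
}

// Adapter that turns a bare function pointer into a trait implementation,
// so closure-based UDFs fit into a trait-based definition.
struct LegacyWrapper {
    name: String,
    fun: FunctionImplementation,
}

impl UdfImpl for LegacyWrapper {
    fn name(&self) -> &str {
        &self.name
    }

    fn invoke(&self, args: &[i64]) -> Result<i64, String> {
        // Delegate to the wrapped function pointer.
        (self.fun)(args)
    }
}

// Hypothetical analogue of a `create_udf`-style helper.
fn create_udf_sketch(name: &str, fun: FunctionImplementation) -> Arc<dyn UdfImpl> {
    Arc::new(LegacyWrapper { name: name.to_string(), fun })
}

// Example function with the legacy signature: doubles and sums the inputs.
fn double_sum(args: &[i64]) -> Result<i64, String> {
    Ok(args.iter().map(|v| v * 2).sum())
}
```

With an adapter like this, a definition type that only holds trait objects can still represent every function that was previously expressed as a plain function pointer.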
I think I need some feedback on #9436 (comment) and #9436 (comment) to continue this PR. 🤔
I think it is fine to replace
|
One question: since there are still BuiltInFunctions, should ScalarUDF be ScalarFunctionDefinition?
Yeah, I think that's correct.
I'm waiting for #9537 to be merged.
Hi @thinkharderdev
I finished the PR for Serde in PhysicalExpr, though with a little delay.
Could you review it when you are available?
Thanks!
-    Arc::new(|_: &[ColumnarValue]| unimplemented!("not implemented")),
+    ScalarFunctionDefinition::UDF(Arc::new(ScalarUDF::new_from_impl(
+        DummyUDF::new(),
+    ))),
Since ScalarFunctionExpr needs a ScalarFunctionDefinition here, I created one for each. I don't know whether there's a better way to do it.
@@ -57,7 +58,7 @@ pub fn create_physical_expr(
     fun: &BuiltinScalarFunction,
     input_phy_exprs: &[Arc<dyn PhysicalExpr>],
     input_schema: &Schema,
-    execution_props: &ExecutionProps,
+    _execution_props: &ExecutionProps,
Here the ExecutionProps param can be removed, but I'd like to do that in a follow-up PR, because doing everything in one PR would make errors harder to track down.
weird, seems like it never needed this param
Yes, it doesn't need this param now. Previously it was needed for BuiltInFunction to create the physical function for some date functions such as to_date. Since these functions have been ported, the param is no longer needed.
let monotonicity = fun.monotonicity();

let fun_def = ScalarFunctionDefinition::BuiltIn(*fun);
Here I just pass the fun_def; create_physical_fun will be called when the PhysicalExpr is executed.
@@ -171,8 +171,17 @@ impl PhysicalExpr for ScalarFunctionExpr {
                 .collect::<Result<Vec<_>>>()?,
         };

         let fun_implementation = match self.fun {
             ScalarFunctionDefinition::BuiltIn(ref fun) => create_physical_fun(fun)?,
Here it calls create_physical_fun to get the FunctionImplementation.
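As a rough illustration of resolving the implementation at execution time rather than at construction time, here is a standard-library-only sketch. Every type and function name is hypothetical (`FunctionDefinition`, `create_physical_fun_sketch`, `FunctionExpr`, and so on), and `f64` slices stand in for `ColumnarValue`. It mirrors only the shape of the `match` in the diff, not DataFusion's real code.

```rust
use std::sync::Arc;

// Hypothetical stand-in for `ScalarFunctionImplementation`.
type FunctionImplementation = Arc<dyn Fn(&[f64]) -> Result<f64, String> + Send + Sync>;

// Hypothetical stand-in for `BuiltinScalarFunction`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinFunction {
    Abs,
    Sqrt,
}

// Hypothetical stand-in for the `ScalarUDFImpl` trait.
trait UdfImpl: Send + Sync {
    fn invoke(&self, args: &[f64]) -> Result<f64, String>;
}

// Hypothetical stand-in for `ScalarFunctionDefinition`.
enum FunctionDefinition {
    BuiltIn(BuiltinFunction),
    Udf(Arc<dyn UdfImpl>),
}

fn first_arg(args: &[f64]) -> Result<f64, String> {
    args.first().copied().ok_or_else(|| "expected one argument".to_string())
}

// Wrap a unary numeric operation as a function implementation.
fn unary(op: fn(f64) -> f64) -> FunctionImplementation {
    Arc::new(move |args: &[f64]| first_arg(args).map(op))
}

// Hypothetical analogue of `create_physical_fun`: map a built-in to its implementation.
fn create_physical_fun_sketch(fun: &BuiltinFunction) -> Result<FunctionImplementation, String> {
    Ok(match fun {
        BuiltinFunction::Abs => unary(f64::abs),
        BuiltinFunction::Sqrt => unary(f64::sqrt),
    })
}

// The expression stores only the definition, which is what makes it serializable.
struct FunctionExpr {
    fun: FunctionDefinition,
}

impl FunctionExpr {
    fn evaluate(&self, args: &[f64]) -> Result<f64, String> {
        match &self.fun {
            // Built-ins are turned into an implementation on each evaluation.
            FunctionDefinition::BuiltIn(fun) => {
                let implementation = create_physical_fun_sketch(fun)?;
                implementation(args)
            }
            // UDFs are invoked directly through the trait object.
            FunctionDefinition::Udf(udf) => udf.invoke(args),
        }
    }
}

// A toy UDF used to exercise the second branch.
struct Negate;

impl UdfImpl for Negate {
    fn invoke(&self, args: &[f64]) -> Result<f64, String> {
        first_arg(args).map(|v| -v)
    }
}
```

The trade-off in this sketch is a small lookup cost per evaluation for built-ins in exchange for an expression that no longer holds an opaque function pointer.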
-    let expr = parse_physical_expr(expr.as_ref(), registry, input_schema)?;
+    let codec = DefaultPhysicalExtensionCodec {};
+    let expr =
+        parse_physical_expr(expr.as_ref(), registry, input_schema, &codec)?;
I created DefaultPhysicalExtensionCodec for cases that do not need an ExtensionCodec.
let udf = match &e.fun_definition {
    Some(buf) => codec.try_decode_udf(&e.name, buf)?,
    None => registry.udf(e.name.as_str())?,
};
Here is the decode logic.
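The fallback in this decode step (use the codec when serialized bytes are present, otherwise look the function up by name) can be sketched without DataFusion. The types below are hypothetical: `Udf`, `ScalarUdfNode`, `ExtensionCodec`, `DefaultCodec`, and `ScaleCodec` only imitate the roles of the real types, a `HashMap` stands in for the function registry, and the assumption that a default codec rejects embedded definitions is mine rather than something the PR states.

```rust
use std::collections::HashMap;

// Hypothetical UDF with one piece of state that a name alone cannot recover.
#[derive(Debug, Clone, PartialEq)]
struct Udf {
    name: String,
    scale: i64,
}

// Hypothetical serialized node: a name plus optional codec-specific bytes.
struct ScalarUdfNode {
    name: String,
    fun_definition: Option<Vec<u8>>,
}

// Hypothetical stand-in for the extension codec's UDF decoding hook.
trait ExtensionCodec {
    fn try_decode_udf(&self, name: &str, buf: &[u8]) -> Result<Udf, String>;
}

// Assumed behavior: the default codec cannot decode embedded definitions.
struct DefaultCodec;

impl ExtensionCodec for DefaultCodec {
    fn try_decode_udf(&self, name: &str, _buf: &[u8]) -> Result<Udf, String> {
        Err(format!("default codec cannot decode UDF {}", name))
    }
}

// A custom codec that stores the scale as 8 little-endian bytes.
struct ScaleCodec;

impl ExtensionCodec for ScaleCodec {
    fn try_decode_udf(&self, name: &str, buf: &[u8]) -> Result<Udf, String> {
        if buf.len() != 8 {
            return Err(format!("expected 8 bytes for UDF {}, got {}", name, buf.len()));
        }
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(buf);
        Ok(Udf { name: name.to_string(), scale: i64::from_le_bytes(bytes) })
    }
}

// Mirror of the decode step: prefer embedded bytes, fall back to the registry.
fn decode_udf(
    node: &ScalarUdfNode,
    registry: &HashMap<String, Udf>,
    codec: &dyn ExtensionCodec,
) -> Result<Udf, String> {
    match &node.fun_definition {
        Some(buf) => codec.try_decode_udf(&node.name, buf),
        None => registry
            .get(&node.name)
            .cloned()
            .ok_or_else(|| format!("UDF {} not found in registry", node.name)),
    }
}
```

In this sketch, functions already registered on the receiving side need no payload at all, while functions carrying extra state travel with their own bytes.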
    value: Arc<dyn PhysicalExpr>,
    codec: &dyn PhysicalExtensionCodec,
) -> Result<protobuf::PhysicalExprNode, DataFusionError> {
    let expr = value.as_any();
I applied the same approach used for LogicalExpr.
@@ -665,6 +670,133 @@ fn roundtrip_scalar_udf() -> Result<()> {
     roundtrip_test_with_context(Arc::new(project), ctx)
 }

 #[test]
Here is the test case, mainly copied from the LogicalExpr Serde test.
I believe that CI fails because of some system glitch, so this PR is still ready for review.
nice work! I had a few nitpicks but otherwise I think this is ready
Yep, I re-ran the job and it worked
Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com>
Thanks @thinkharderdev for your review!
Thank you for your work on this!
Which issue does this PR close?
Closes #9428.
Rationale for this change
See the issue.
What changes are included in this PR?
To do
Are these changes tested?
Are there any user-facing changes?