Move `create_physical_expr` to `physical-expr-common` #10074
Comments
`create_physical_sort_exprs` when creating accumulator for first/last
`AggregateUDFImpl` to `functions-aggregate-common` to enable user-defined function on physical-expr
I played around with it, and I think the idea of moving [...] Back to the original issue. To support [...]
🤔 It seems like the core challenge is that the implementation of the aggregate functions -- specifically https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/functions-aggregate/src/first_last.rs#L102-L117 -- needs to have some way to create a [...] Rather than moving [...] Something like

```rust
pub struct AccumulatorArgs<'a> {
    ...
    /// Function for converting Exprs to PhysicalExprs, if needed
    create_physical_expression: &'a dyn Fn(&Expr) -> Result<Arc<dyn PhysicalExpr>>,
}
```
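To make the callback idea above concrete, here is a minimal, std-only sketch. It is not DataFusion code: `Expr`, `PhysicalExpr`, `PhysicalColumn`, `Result`, the `sort_exprs` field, `physical_ordering`, and `toy_planner` are all hypothetical stand-ins, assumed only for illustration. The point is that the aggregate only sees a function pointer, so the crate that defines it would not need to depend on the planner.

```rust
use std::fmt::Debug;
use std::sync::Arc;

// Hypothetical stand-in for the logical expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

// Hypothetical stand-in for the physical expression trait.
pub trait PhysicalExpr: Debug {
    fn name(&self) -> String;
}

#[derive(Debug)]
pub struct PhysicalColumn {
    pub name: String,
}

impl PhysicalExpr for PhysicalColumn {
    fn name(&self) -> String {
        self.name.clone()
    }
}

// Simplified error type, assumed for this sketch only.
pub type Result<T> = std::result::Result<T, String>;

/// Arguments handed to an aggregate when it builds its accumulator.
pub struct AccumulatorArgs<'a> {
    /// Logical ordering expressions attached to the aggregate call (assumed field).
    pub sort_exprs: &'a [Expr],
    /// Function for converting Exprs to PhysicalExprs, supplied by the caller.
    pub create_physical_expression: &'a dyn Fn(&Expr) -> Result<Arc<dyn PhysicalExpr>>,
}

/// What an aggregate such as first/last could do while creating its accumulator:
/// convert every logical ordering expression through the injected callback.
pub fn physical_ordering(args: &AccumulatorArgs<'_>) -> Result<Vec<Arc<dyn PhysicalExpr>>> {
    args.sort_exprs
        .iter()
        .map(|expr| (args.create_physical_expression)(expr))
        .collect()
}

/// Toy planner-side conversion: looks through aliases and maps columns one to one.
pub fn toy_planner(expr: &Expr) -> Result<Arc<dyn PhysicalExpr>> {
    match expr {
        Expr::Column(name) => Ok(Arc::new(PhysicalColumn { name: name.clone() })),
        Expr::Alias(inner, _) => toy_planner(inner),
    }
}
```

A caller would build `AccumulatorArgs { sort_exprs: &exprs, create_physical_expression: &toy_planner }` and pass it to `physical_ordering`. The trade-off is one extra indirection per conversion in exchange for not moving the conversion code itself.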
@alamb
I think this is the key problem. Can we pull [...]
Moving [...]
Hmm, this is a tricky refactor -- it is like a knot in a ball of string -- we just need to keep tugging at it and at some point it will unravel.
How about we introduce a physical aggregate function trait? We create the logical aggregate and handle optimization in the logical layer, then create the physical aggregate via [...]
Moving `AggregateUDFImpl` means that we need to move [...]
I'm thinking of moving [...] The overall idea is to enable us to import common things to [...]

```mermaid
graph TD;
    functions-aggregates-common-->expr-common;
    functions-aggregates-common-->physical-expr-common;
    functions-aggregates-->functions-aggregates-common;
    functions-aggregates-->expr;
    physical-expr-common-->expr-common;
    expr-->expr-common;
    expr-->functions-aggregates-common;
    physical-expr-->physical-expr-common;
    core-->functions-aggregates;
    core-->physical-expr;
    third-parties-aggregate --> functions-aggregates-common;
```
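One way to sanity-check a layout like the graph above is to confirm that it has no dependency cycles, since Cargo rejects cyclic crate dependencies. The sketch below is only an illustration: the edge list is copied from the graph, while `EDGES` and `build_order` are hypothetical names, and the check is a plain topological sort (Kahn's algorithm) rather than anything DataFusion ships.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Edges copied from the graph above, written as (crate, dependency).
pub const EDGES: &[(&str, &str)] = &[
    ("functions-aggregates-common", "expr-common"),
    ("functions-aggregates-common", "physical-expr-common"),
    ("functions-aggregates", "functions-aggregates-common"),
    ("functions-aggregates", "expr"),
    ("physical-expr-common", "expr-common"),
    ("expr", "expr-common"),
    ("expr", "functions-aggregates-common"),
    ("physical-expr", "physical-expr-common"),
    ("core", "functions-aggregates"),
    ("core", "physical-expr"),
    ("third-parties-aggregate", "functions-aggregates-common"),
];

/// Returns a build order (dependencies first), or `None` if the graph has a cycle.
pub fn build_order<'a>(edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    // Number of not-yet-built dependencies for each crate.
    let mut pending: BTreeMap<&'a str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> crates that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(krate, dep) in edges {
        *pending.entry(krate).or_insert(0) += 1;
        pending.entry(dep).or_insert(0);
        dependents.entry(dep).or_default().push(krate);
    }

    // Start with the crates that have no dependencies at all.
    let mut ready: VecDeque<&'a str> = VecDeque::new();
    for (krate, count) in &pending {
        if *count == 0 {
            ready.push_back(*krate);
        }
    }

    let mut order = Vec::new();
    while let Some(krate) = ready.pop_front() {
        order.push(krate);
        if let Some(users) = dependents.get(krate) {
            for &user in users {
                let count = pending
                    .get_mut(user)
                    .expect("every crate was registered above");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(user);
                }
            }
        }
    }

    // Crates stuck on a cycle never reach zero pending dependencies.
    if order.len() == pending.len() {
        Some(order)
    } else {
        None
    }
}
```

With the edges as drawn, `build_order(EDGES)` returns an order in which `expr-common` and `physical-expr-common` come before `functions-aggregates-common`, which in turn comes before `expr`. Adding a hypothetical back edge from `functions-aggregates-common` to `expr` makes it return `None`.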
@alamb I tried the idea in #10327 and it seems promising! I would like to split the large #10327 into several smaller PRs for easier review. Before that, I would like to know your thoughts about the design and the dependencies in the graph above. Does it look good? My thoughts about categorizing those crates [...] With

```rust
pub trait AggregateUDFImpl: Debug + Send + Sync {
    /// Physical expression to use when the ordering is reversed; `None` by default.
    fn reverse_expr(&self) -> Option<Arc<dyn AggregateExpr>> {
        None
    }
}
```
And if we import `functions-aggregate-common` into `expr`, we can keep our new `AggregateUDF` in the logical layer while still allowing user-defined physical-expr interaction!
```rust
pub enum AggregateFunctionDefinition {
    BuiltIn(aggregate_function::AggregateFunction),
    /// Resolved to a user defined aggregate function
    UDF(Arc<function::AggregateUDF>),
    /// An aggregation function constructed with name. This variant cannot be executed directly
    /// and instead must be resolved to one of the other variants prior to physical planning.
    Name(Arc<str>),
}
```
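As a rough illustration of the `reverse_expr` hook discussed above, the following std-only sketch shows a default implementation that returns `None` and two aggregates that override it. Everything here is a simplified stand-in rather than DataFusion's real API: `AggregateExpr` is reduced to a `name` method, `PhysicalAgg` is a hypothetical placeholder for a physical aggregate, and pairing first with last as each other's reverse is an assumption made for the example.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical, heavily reduced stand-in for the physical `AggregateExpr` trait.
pub trait AggregateExpr: Debug {
    fn name(&self) -> &str;
}

/// Placeholder physical aggregate that only carries a name.
#[derive(Debug)]
pub struct PhysicalAgg(pub &'static str);

impl AggregateExpr for PhysicalAgg {
    fn name(&self) -> &str {
        self.0
    }
}

/// Reduced version of the trait from the comment above; only the hook is kept.
pub trait AggregateUDFImpl: Debug + Send + Sync {
    /// Physical expression to use when the input ordering is reversed.
    /// Aggregates that have no reversed form keep the default.
    fn reverse_expr(&self) -> Option<Arc<dyn AggregateExpr>> {
        None
    }
}

#[derive(Debug)]
pub struct FirstValue;

#[derive(Debug)]
pub struct LastValue;

#[derive(Debug)]
pub struct Sum;

impl AggregateUDFImpl for FirstValue {
    // Assumed for this sketch: the first value under a reversed ordering is the last value.
    fn reverse_expr(&self) -> Option<Arc<dyn AggregateExpr>> {
        Some(Arc::new(PhysicalAgg("last_value")))
    }
}

impl AggregateUDFImpl for LastValue {
    fn reverse_expr(&self) -> Option<Arc<dyn AggregateExpr>> {
        Some(Arc::new(PhysicalAgg("first_value")))
    }
}

// `Sum` does not override the hook, so it reports no reversed form.
impl AggregateUDFImpl for Sum {}
```

Because the method returns a physical type, the trait that declares it has to live in a crate that can see that type, which is the dependency question this thread is about.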
In the chart you have in #10074 (comment), it seems like it will mean that [...] I am sorry I am so slow to respond and am getting lost here. Can we take a step back and figure out what problem we are trying to solve? Can you remind me why [...] What I believe we are trying to do is to pull the aggregate functions out of `datafusion-physical-expr`. I am sorry if you have explained this to me already.
@alamb But what if we allow [...] We now differentiate common as the higher-level crate. As long as we don't introduce things that depend on [...]
This idea is what is blocking us! Now, I would allow [...]
I see -- this makes sense. What I am worried about is that [...] I get the sense that part of the problem is that [...] I wonder if we could try starting with some of the simpler and less used aggregates like [...] That might make it easier to incrementally discover how to structure the dependencies so we could extract the aggregates out of the core 🤔
Actually, that is the reason why I worked on it first: to ensure the design is solid for most of the functions.
I also thought about this, but I don't think there is more code we need to move into common, because [...] I agree that the design deserves more discussion. I will work on [...]
That makes sense to me -- thank you. I will attempt to review those PRs in a timely manner so you can iterate more quickly. I often feel I am slowing you down as I can't review the PRs fast enough.
Is your feature request related to a problem or challenge?
https://github.com/apache/arrow-datafusion/blob/671cef85c550969ab2c86d644968a048cb181c0c/datafusion/functions-aggregate/src/first_last.rs#L102-L117
I found that we need to match `Alias` cases; it seems we should try to reuse `create_physical_sort_exprs` if possible.
We need to move `create_physical_expr` to `physical-expr-common`, and this might involve moving lots of things to `physical-expr-common` too; not sure if it is reasonable yet.
Describe the solution you'd like
There are two places where we need `create_physical_expr`:
- `accumulator`, with `create_physical_sort_exprs`.
- `reverse_expr`, to convert a logical aggregate expression to a physical aggregate expression with `create_physical_expr`: given an `AggregateFunction`, we expect to get an `Arc<dyn AggregateExpr>`.
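The `Alias` matching mentioned above can be sketched as follows. This is a std-only toy, not the real `create_physical_sort_exprs`: `Expr`, `PhysicalSortExpr`, `unalias`, `toy_sort_expr`, and `toy_sort_exprs` are hypothetical names and shapes assumed for illustration. It shows why a shared helper is attractive: the alias peeling is written once instead of being repeated in every aggregate.

```rust
// Hypothetical stand-in for the logical expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
    Sort {
        expr: Box<Expr>,
        asc: bool,
        nulls_first: bool,
    },
}

// Hypothetical stand-in for a physical sort expression.
#[derive(Debug, Clone, PartialEq)]
pub struct PhysicalSortExpr {
    pub column: String,
    pub descending: bool,
    pub nulls_first: bool,
}

/// Strips any number of nested alias wrappers.
fn unalias(expr: &Expr) -> &Expr {
    match expr {
        Expr::Alias(inner, _) => unalias(inner),
        other => other,
    }
}

/// Converts one logical sort expression, looking through aliases on both
/// the sort expression itself and its sort key.
pub fn toy_sort_expr(expr: &Expr) -> Result<PhysicalSortExpr, String> {
    match unalias(expr) {
        Expr::Sort { expr, asc, nulls_first } => match unalias(expr) {
            Expr::Column(name) => Ok(PhysicalSortExpr {
                column: name.clone(),
                descending: !*asc,
                nulls_first: *nulls_first,
            }),
            other => Err(format!("unsupported sort key: {other:?}")),
        },
        other => Err(format!("expected a sort expression, got {other:?}")),
    }
}

/// Converts a list of logical sort expressions, stopping at the first error.
pub fn toy_sort_exprs(exprs: &[Expr]) -> Result<Vec<PhysicalSortExpr>, String> {
    exprs.iter().map(toy_sort_expr).collect()
}
```

An aggregate's `accumulator` would then call one helper on its ordering expressions and never match on `Alias` itself.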
Describe alternatives you've considered
Actually, we already have what we need for the physical sort expressions: `ordering_req` in `AggregateFunctionExpr`.
But since `AccumulatorArgs` and `AggregateUDFImpl` live in the `datafusion_expr` layer, we can't get the value from it.
I still think it would be nice to have a design that can deal with physical-expr for UDAF 😞🤔
Should we move `AggregateUDFImpl` to a crate that imports both `datafusion-expr` and `datafusion-physical-expr-common`?
No response
Additional context
No response