Review use of logical expressions in physical AggregateFunctionExpr #11359
@ozankabak asked on Discord whether there is any way to get rid of logical expressions entirely, so I think we can revisit the challenge I ran into before. The reason why there are logical expressions in … I think that is also the main thing blocking us. I propose redesigning the role of each crate. To be able to deal with physical concepts for … The crate graph would look like this:
graph TD;
functions-aggregates-common-->expr-common;
functions-aggregates-common-->physical-expr-common;
functions-aggregates-->functions-aggregates-common;
functions-aggregates-->expr;
physical-expr-common-->expr-common;
expr-->expr-common;
expr-->functions-aggregates-common;
physical-expr-->physical-expr-common;
core-->functions-aggregates;
core-->physical-expr;
third-parties-aggregate --> functions-aggregates-common;
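Cargo rejects cyclic dependencies between packages, so any proposed crate layout has to be a DAG. The sketch below is a minimal, std-only check of the graph above, assuming each arrow points from a crate to one of its dependencies. It uses Kahn's algorithm, which also yields one valid build order; the function name build_order is hypothetical and not part of DataFusion.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm over (crate, dependency) edges.
/// Returns a build order (dependencies first) if the graph is acyclic,
/// or `None` if some crates can never be built because of a cycle.
fn build_order<'a>(edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    // Number of not-yet-built dependencies for every crate.
    let mut remaining: BTreeMap<&'a str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> crates that depend on it.
    let mut dependents: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(krate, dep) in edges {
        *remaining.entry(krate).or_insert(0) += 1;
        remaining.entry(dep).or_insert(0);
        dependents.entry(dep).or_default().push(krate);
    }

    // Start with crates that have no dependencies at all.
    let mut ready: VecDeque<&'a str> = remaining
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(name, _)| *name)
        .collect();
    let mut order = Vec::new();
    while let Some(krate) = ready.pop_front() {
        order.push(krate);
        // Building `krate` satisfies one dependency of each of its dependents.
        for &user in dependents.get(krate).into_iter().flatten() {
            let n = remaining.get_mut(user).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(user);
            }
        }
    }
    // Any crate left unbuilt is part of, or blocked by, a cycle.
    if order.len() == remaining.len() {
        Some(order)
    } else {
        None
    }
}

fn main() {
    // Edges transcribed from the proposed graph: (crate, crate it depends on).
    let proposed = [
        ("functions-aggregates-common", "expr-common"),
        ("functions-aggregates-common", "physical-expr-common"),
        ("functions-aggregates", "functions-aggregates-common"),
        ("functions-aggregates", "expr"),
        ("physical-expr-common", "expr-common"),
        ("expr", "expr-common"),
        ("expr", "functions-aggregates-common"),
        ("physical-expr", "physical-expr-common"),
        ("core", "functions-aggregates"),
        ("core", "physical-expr"),
        ("third-parties-aggregate", "functions-aggregates-common"),
    ];
    match build_order(&proposed) {
        Some(order) => println!("acyclic; one valid build order: {order:?}"),
        None => println!("cycle detected"),
    }
}
```

Kahn's algorithm is used here instead of a recursive DFS because a failed build order directly signals a cycle, and a successful one doubles as a compile order.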
More detail from the earlier discussion is in #10074. With this approach, functions like … @alamb, I think we can revisit this idea. The previous concern was that …
I think that is no longer an issue. |
We have at least two such functions, and it would be great to arrive at a design that eliminates them. Conceptually, we should have more information at the physical level than at the logical level, so a refactor that makes … |
I think we tend to split crates aggressively so that others can import only the crates they need. |
Splitting crates when beneficial is great; maybe we haven't arrived at the best structure yet (as evidenced by this issue). I think we can use your previous work as a starting point to improve the structure and also resolve this issue 🚀 |
I agree with @jayzhan211 that the core of the problem is that the user-defined API for aggregates lives in datafusion_expr, so it can only use … However, the same basic problem could be claimed for … It does feel like the way out of this is to make the split between API and implementation more explicit. Maybe instead of … So like …
|
I agree we should document what the fields are used for now. I personally recommend we finish #8708 before we attempt another crate refactor. We are close on that one, and once all the aggregates go through the same APIs, I think we'll be in a better position to split things apart. |
Cool, maybe I could think about pulling the functions trait down from … |
I think it is possible to have …
Not sure about the reason for …
The original idea of …
The crate graph would look like this:
graph TD;
expr-api --> physical-expr-common;
physical-expr-common --> expr;
functions-aggregate --> expr-api;
third-parties-aggregate --> expr-api;
or:
graph TD;
expr-provider --> expr-functions;
expr-functions --> physical-expr-common;
physical-expr-common --> expr;
functions-aggregate --> expr-functions;
third-parties-aggregate --> expr-functions;
|
Expr::AggregateFunction(AggregateFunction)
pub struct AggregateFunction {
/// Name of the function
pub func: Arc<crate::AggregateUDF>,
..
}
pub struct AggregateUDF {
inner: Arc<dyn AggregateUDFImpl>,
}
I guess #10327 is the only possible solution. |
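One reading of the struct chain above: because Expr::AggregateFunction holds an Arc<AggregateUDF>, whichever crate defines AggregateUDFImpl must sit at or below expr. The sketch below assumes, as a hypothetical, that the trait moves into expr-functions from the second alternative graph; expr would then need an edge expr --> expr-functions, and a depth-first search over the resulting edges finds the cycle Cargo would refuse. The edge list is transcribed from the thread; the added edge and the function name find_cycle are illustrative assumptions.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy)]
enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully explored, known to be cycle-free
}

/// Depth-first search over (crate, dependency) edges.
/// Returns the first cycle found as a path that starts and ends at the same crate.
fn find_cycle<'a>(edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    let mut deps: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(krate, dep) in edges {
        deps.entry(krate).or_default().push(dep);
        deps.entry(dep).or_default();
    }

    fn visit<'a>(
        node: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        marks: &mut BTreeMap<&'a str, Mark>,
        path: &mut Vec<&'a str>,
    ) -> Option<Vec<&'a str>> {
        match marks.get(node) {
            Some(Mark::Done) => return None,
            Some(Mark::Visiting) => {
                // `node` is already on the path: slice out the cycle.
                let start = path.iter().position(|&n| n == node).unwrap();
                let mut cycle = path[start..].to_vec();
                cycle.push(node);
                return Some(cycle);
            }
            None => {}
        }
        marks.insert(node, Mark::Visiting);
        path.push(node);
        for &dep in &deps[node] {
            if let Some(cycle) = visit(dep, deps, marks, path) {
                return Some(cycle);
            }
        }
        path.pop();
        marks.insert(node, Mark::Done);
        None
    }

    let mut marks = BTreeMap::new();
    let mut path = Vec::new();
    for &node in deps.keys() {
        if let Some(cycle) = visit(node, &deps, &mut marks, &mut path) {
            return Some(cycle);
        }
    }
    None
}

fn main() {
    // Second alternative graph from the thread: (crate, crate it depends on).
    let mut edges = vec![
        ("expr-provider", "expr-functions"),
        ("expr-functions", "physical-expr-common"),
        ("physical-expr-common", "expr"),
        ("functions-aggregate", "expr-functions"),
        ("third-parties-aggregate", "expr-functions"),
    ];
    assert!(find_cycle(&edges).is_none()); // acyclic as drawn
    // Hypothetical edge: `expr` must see the crate that defines AggregateUDFImpl.
    edges.push(("expr", "expr-functions"));
    println!("{:?}", find_cycle(&edges));
    // prints Some(["expr", "expr-functions", "physical-expr-common", "expr"])
}
```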
I found … The crate graph now looks like this. Since we would like to import … Therefore, we came up with several … Physical exprs like Column, Cast, and Literal are moved back to physical-expr for now.
graph TD;
physical-expr-common --> expr-common;
functions-aggregate-common --> physical-expr-common;
expr --> functions-aggregate-common;
physical-expr-functions-aggregate --> expr;
physical-expr --> expr;
functions-aggregate --> physical-expr;
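Assuming the arrows again point from a crate to its dependency, a breadth-first search over this final graph shows what the layout buys: expr reaches physical-expr-common through functions-aggregate-common, while physical-expr-common still does not depend on expr. The helper transitive_deps is a hypothetical illustration, not part of DataFusion.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns every crate reachable from `start` by following dependency edges,
/// i.e. all crates whose public types `start` could use directly or transitively.
fn transitive_deps<'a>(edges: &[(&'a str, &'a str)], start: &str) -> BTreeSet<&'a str> {
    let mut deps: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(krate, dep) in edges {
        deps.entry(krate).or_default().push(dep);
    }
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(krate) = queue.pop_front() {
        for &dep in deps.get(krate).into_iter().flatten() {
            // `insert` returns true only the first time a crate is seen.
            if seen.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    seen
}

fn main() {
    // Final crate graph from the comment: (crate, crate it depends on).
    let edges = [
        ("physical-expr-common", "expr-common"),
        ("functions-aggregate-common", "physical-expr-common"),
        ("expr", "functions-aggregate-common"),
        ("physical-expr-functions-aggregate", "expr"),
        ("physical-expr", "expr"),
        ("functions-aggregate", "physical-expr"),
    ];
    println!("expr -> {:?}", transitive_deps(&edges, "expr"));
    // prints expr -> {"expr-common", "functions-aggregate-common", "physical-expr-common"}
    println!("physical-expr-common -> {:?}", transitive_deps(&edges, "physical-expr-common"));
    // prints physical-expr-common -> {"expr-common"}
}
```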
|
Is your feature request related to a problem or challenge?
DataFusion 40.0.0 added a new logical_args: Vec<Expr> field to AggregateFunctionExpr, which seems confusing, and there is no documentation in this struct that explains what this field is used for.
In DataFusion Comet, we do not use DataFusion's logical plan or expressions because we are translating an Apache Spark physical plan into a DataFusion physical plan, and therefore we have no logical expressions to pass into this new field.
I think at a minimum we should add some documentation around this new feature.
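A sketch of what field-level documentation could look like. Everything below is a hypothetical stand-in, not DataFusion's actual AggregateFunctionExpr: PhysicalExprLike, LogicalExprLike, ColumnRef and AggregateExprSketch are made-up names, and the doc text only records what this issue states. Wrapping logical_args in an Option is this sketch's own assumption, chosen so that "no logical expressions available" is explicit rather than an empty list.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical expression (not DataFusion's trait).
pub trait PhysicalExprLike: std::fmt::Debug {}

/// Hypothetical stand-in for a logical expression (not DataFusion's `Expr`).
#[derive(Debug, Clone)]
pub struct LogicalExprLike(pub String);

/// Hypothetical physical column reference by index.
#[derive(Debug)]
pub struct ColumnRef(pub usize);
impl PhysicalExprLike for ColumnRef {}

/// Simplified, hypothetical shape of a physical aggregate expression.
#[derive(Debug)]
pub struct AggregateExprSketch {
    /// Physical input expressions, evaluated against each input batch.
    pub args: Vec<Arc<dyn PhysicalExprLike>>,
    /// Logical counterparts of `args`, when the plan came from a logical plan.
    ///
    /// Physical-only planners (for example DataFusion Comet, which translates
    /// Apache Spark physical plans) have no logical expressions to supply,
    /// so in this sketch they pass `None`.
    pub logical_args: Option<Vec<LogicalExprLike>>,
}

impl AggregateExprSketch {
    /// Constructor for callers that only have physical expressions.
    pub fn from_physical(args: Vec<Arc<dyn PhysicalExprLike>>) -> Self {
        Self { args, logical_args: None }
    }
}

fn main() {
    let col: Arc<dyn PhysicalExprLike> = Arc::new(ColumnRef(0));
    let agg = AggregateExprSketch::from_physical(vec![col]);
    println!(
        "{} physical arg(s), logical args present: {}",
        agg.args.len(),
        agg.logical_args.is_some()
    );
}
```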