-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC / Prototype user defined sql planner might look like #11168
Conversation
@@ -236,11 +237,28 @@ impl PlannerContext { | |||
} | |||
} | |||
|
|||
/// This trait allows users to customize the behavior of the SQL planner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to pull the custom logic for planning binary operators into a trait -- then we can extend the sql planner by providing trait instances
I think this is beneficial because:
- It avoids having a DataFusion
Operator
enum that only exists to be transformed later - It avoids the planner having hard coded assumptions about function names (like "array_append")
I think this kind of approach is what might close #10534
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this seems like a fair approach.
It's worth noting that we could do some of this with #11137 - e.g. we could move the array stuff into a ParseCustomOperator
, I don't really mind which approach we go with provided the flexibility is there.
@@ -1017,6 +979,71 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
} | |||
} | |||
|
|||
pub struct ArrayFunctionPlanner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that this ArrayFunctionPlanner
would be in the array_functions crate and thus avoid the implicit dependency on names of functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise LGTM
SqlToRel { | ||
context_provider, | ||
options, | ||
normalizer: IdentNormalizer::new(normalize), | ||
planners: vec![], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just put the array_planner
in the vec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that I eventually wanted to move the definition into a different crate datafusion-functions-array
@@ -236,11 +237,28 @@ impl PlannerContext { | |||
} | |||
} | |||
|
|||
/// This trait allows users to customize the behavior of the SQL planner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this seems like a fair approach.
It's worth noting that we could do some of this with #11137 - e.g. we could move the array stuff into a ParseCustomOperator
, I don't really mind which approach we go with provided the flexibility is there.
Indeed -- thanks. Let's see if @jayzhan211 has any thoughts (I think it may be their night) |
It is a great idea overall! There are only two things that I think we could improve on
Given the current design, if we want to support for planner in self.planners.iter() {
if let Some(expr) =
planner.plan_binary_op(op.clone(), left.clone(), right.clone(), schema)?
{
return Ok(expr);
}
if let Some(expr) =
planner.plan_indices(expr....)?
{
return Ok(expr);
}
...
} We could define a more general function and enum that includes all the possible ast node. I'm not sure if this is general enough since these are the only two things that I know are rewritten to function in datafusion right now. // Node from sqlparser that are possible for rewrite
pub enum AstNode (or Operators) {
BinaryOperator(sqlparser::ast::BinaryOperator),
Subscript(sqlparser::ast::Subscript), // For FieldAccess rewrite
OR GetFieldAccess(GetFieldAccess) // We can use GetFieldAccess directly, if there is little possible that they need to customize Subscript to GetFieldAccess logic.
}
/// This trait allows users to customize the behavior of the SQL planner
pub trait UserDefinedPlanner {
fn plan(&self, ast_node: AstNode, exprs: Vec<Expr>, _schema: &DFSchema) -> Result<Option<Expr>> {
Ok(None)
}
} |
I find a way to avoid clone, returns tuple pub struct PlannedNode {
pub op: sqlparser::ast::BinaryOperator,
pub left: Expr,
pub right: Expr,
}
/// This trait allows users to customize the behavior of the SQL planner
pub trait UserDefinedPlanner {
/// Plan the binary operation between two expressions, return None if not possible
fn plan_binary_op(
&self,
planned_node: PlannedNode,
_schema: &DFSchema,
) -> Result<(Option<PlannedNode>, Option<Expr>)> {
Ok((Some(planned_node), None))
}
} fn build_logical_expr(
&self,
op: sqlparser::ast::BinaryOperator,
left: Expr,
right: Expr,
schema: &DFSchema,
) -> Result<Expr> {
let num_planners = self.planners.len();
// try extension planers
let mut planned_node = PlannedNode{op, left, right};
for (i, planner) in self.planners.iter().enumerate() {
match planner.plan_binary_op(planned_node, schema)? {
(None, Some(expr)) => return Ok(expr),
(Some(n), None) if i + 1 < num_planners => planned_node = n,
(Some(n), None) if i + 1 == num_planners => {
return Ok(Expr::BinaryExpr(BinaryExpr::new(
Box::new(n.left),
self.parse_sql_binary_op(n.op)?,
Box::new(n.right),
)))
}
_ => unreachable!(""),
}
}
unreachable!("")
} |
That is a good idea. I think we might be able to name the tuple with an explicitly datafusion/datafusion/expr/src/simplify.rs Lines 113 to 119 in c80da91
|
Yes, good idea
I agree there is a tradeoff between the number of functions in the trait vs how general this custom planning could be
Yeah, I am not sure how much more custom planning would be needed. I think part of most special syntaxes have embedded sub expressions, so Thus, while less flexible, I think adding a specific function for each type of SQL ast node that we permit customization of would be the easiest to understand and use |
Here is a proposal for how to proceed:
What do you think @jayzhan211 and @samuelcolvin ? If you think this is a reasonable idea, do either if you have the time to do 2? If not, I will find the time to bash it out |
I agree, we can start from this, it is easy to extend it in the future if there are more kinds of plan
I can work on this |
This sounds like a good plan. We (I) should also adapt/restart datafusion-contrib/datafusion-functions-json#22 to work with this API to prove it does what we need before merging. |
I agree -- sounds like a plan. Thank you both! I will make sure to review the code from @jayzhan211 as soon as possible so we can get this done quickly This is going to be great! |
I don't intend to merge this PR.
Which issue does this PR close?
Prototype of what the API described in #10534 might look like
Inspired by #11155 from @jayzhan211 and @samuelcolvin on #11137 there
Rationale for this change
I feel like our sql planner needs some way for it to be extended so that we can plan sql queries with operators that are not hard coded into the core engine. There are currently two different approaches
array @> array
andarray <@ array
in sql_expr_to_logical_expr #11155 from @jayzhan211 for operators --> functionsWhat changes are included in this PR?
Prototype out what a "user defined planner" might look like
Are these changes tested?
Are there any user-facing changes?