-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move Array specific rewrites to datafusion_functions_array
AnalyzerRule
#9557
Conversation
datafusion_functions_array
AnalyzerRule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @jayzhan211 -- this is looking really nice.
My biggest suggestion after seeing the code in this PR is about the API design (rather than exposing the entire AnalyzerRule maybe we can simply expose a RewriteRule
)
Let me know what you think. I can also find time to help with this project too if it doesn't make sense to you
@@ -2144,6 +2128,25 @@ impl FunctionRegistry for SessionState { | |||
}) | |||
} | |||
|
|||
/// Add `analyzer_rule` to the `index` of the list of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a user know what index
to add I wonder 🤔
Also, if the built in analyzer rules were changed, then any hard coded index might become incorrect
Instead of this API, what if we followed the model of the physical_optimizers()
APIL #9575
Specifically, we added a SesssionContext::analyzer_rules()
API that returned all the existing rules? Using that and with_analyzer_rules
the user could put the rules wherever they wanted
For example
let ctx = SessionContext::new();
let mut analyzer_rules = ctx.analyzer_rules();
// insert new rule at index 3
analyzer_rules.insert(3, new_rule);
// update rules in session context
let ctx = ctx.with_analyzer_rules(analyzer_rules);
(I think adding access to the analyzer rules makes sense even if we go with a different approach for rewrite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the example above is easier for the user to overwrite the rules. Let me add analyzer_rules
to return the existing rules.
btw, is #9575 a correct link?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i need to move analyzer_rules
and with_analyzer_rules
under FunctionRegistry, so it can be called in register_all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, is #9575 a correct link?
No sorry -- I meant to link to the SessionCOntext::optimizers()
API. sorry about that
@@ -2179,6 +2182,21 @@ impl FunctionRegistry for SessionState { | |||
fn deregister_udwf(&mut self, name: &str) -> Result<Option<Arc<WindowUDF>>> { | |||
Ok(self.window_functions.remove(name)) | |||
} | |||
|
|||
fn deregister_analyzer_rule(&mut self, index: usize) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we added SessionContext::analyzer_rules
I don't think this API is necessary -- instead the user could just override the existing list.
@@ -90,5 +92,19 @@ pub fn register_all(registry: &mut dyn FunctionRegistry) -> Result<()> { | |||
} | |||
Ok(()) as Result<()> | |||
})?; | |||
|
|||
let rules: Vec<(AnalyzerRuleRef, usize)> = vec![ | |||
// The order here should be adjusted based on the rules defined in the optimizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite brittle as it would potentially break if either the users introduced their own analyzer rules or we introduced new ones in DataFusion. I think the RewriteRule
API might avoid this potential problem.
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
let mut rules = registry.analyzer_rules(); | ||
// OperatorToFunction should be run before TypeCoercion, since it rewrite based on the argument types (List or Scalar), | ||
// and TypeCoercion may cast the argument types from Scalar to List. | ||
rules.insert(1, Arc::new(OperatorToFunction::new())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't notice that vec has insert
...
I change the design, feel free to improve or change the code if you have time to help, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 - I will try and make a proposal shortly
@@ -2156,6 +2138,16 @@ impl FunctionRegistry for SessionState { | |||
}) | |||
} | |||
|
|||
/// Override the `AnalyzerRule`s optimizer plan rules. | |||
fn with_analyzer_rules(&mut self, rules: Vec<AnalyzerRuleRef>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@jayzhan211 here is the alternate design I was hinting at -- #9583 Please let me know what you think if you have time |
Which issue does this PR close?
Closes #9519
Rationale for this change
What changes are included in this PR?
Use
LinkedList
for analyzer rules, unlike udfs, we need to preserve order and duplication is acceptable.Move AnalyzerRule into
datafusion_expr
, and register the rule into optimizer throughSessionState
Are these changes tested?
Are there any user-facing changes?