-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Examples: boundary analysis example for AND/OR
conjunctions
#14735
Examples: boundary analysis example for AND/OR
conjunctions
#14735
Conversation
AND/OR
conjunctions
This change adds a short section in the Query Optimizer page of the library guide that gives a brief overview of boundary analysis and cardinality estimation and their role during query optimization.
@alamb @berkaysynnada this is the follow up for #14688 |
Thanks @clflushopt -- I will review this one over the next few days hopefully |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like your explanations and example scenarios @clflushopt, thank you!
When we support OR operation, let's not forget to update the example :)
BTW, did you notice #14699 ? I believe we can switch to that API soon, and updating query-optimizer.md would be super-cool with this new framework (actually we need to, as there will not be Precision enum etc. anymore)
@@ -84,6 +84,9 @@ async fn main() -> Result<()> { | |||
// See how to analyze boundaries in different kinds of expressions. | |||
boundary_analysis_and_selectivity_demo()?; | |||
|
|||
// See how boundary analysis works for `AND` & `OR` conjunctions. | |||
bounday_analysis_in_conjuctions_demo()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bounday_analysis_in_conjuctions_demo()?; | |
boundary_analysis_in_conjuctions_demo()?; |
} | ||
/// This function shows how to think about and leverage the analysis API | ||
/// to infer boundaries in `AND` & `OR` conjunctions. | ||
fn bounday_analysis_in_conjuctions_demo() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn bounday_analysis_in_conjuctions_demo() -> Result<()> { | |
fn boundary_analysis_in_conjuctions_demo() -> Result<()> { |
@alamb @ozankabak do we plan to have open issues for the follow up changes described in #14699 ? I am specifically trying to figure out whether we should address those items as part of #3929 ? |
Indeed we will open an epic to plan the remaining statistics infrastructure work. I plan to put something together today/tomorrow. |
I am osrry -- I simply haven't had a chance to reivew this PR yet. I hope to get to there today or tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @clflushopt and @berkaysynnada -- this is a great addition. I learned quite a bit 👍
|
||
#### `AnalysisContext` API | ||
|
||
The `AnalysisContext` serves as a shared knowledge base during expression evaluation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool to link to the rust docs for these structures too
For example
https://docs.rs/datafusion/latest/datafusion/physical_expr/struct.AnalysisContext.html
Maybe as a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 will do !
Which issue does this PR close?
Rationale for this change
The goal of this change is to supplement the example in #14688 with a more useful example that demonstrates the current state of boundary analysis for
AND
conjunctions; we also explain and show thatOR
conjunctions are not currently supported.What changes are included in this PR?
In
datafusion-examples/examples/expr_api.rs
there is now a new function that explains data flow during boundary analysis and selectivity calculation for the expressionage > 18 and <= 25
with some comments to explain whatthe logic under the hood does.
In a separate commit, we are also adding a brief explainer of the boundary analysis framework to the Query Optimizer section in the library guide. The goal is to have some supporting documentation that collects, summaries and distills
the information currently documented in the different pull requests and the original design doc (see #3929).
Are these changes tested?
The change re-uses parts of the API that are already covered by existing tests.
Are there any user-facing changes?
Yes, this introduces a new example.