Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples: boundary analysis example for AND/OR conjunctions #14735

Conversation

clflushopt
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

The goal of this change is to supplement the example in #14688 with a more useful example that demonstrates the current state of boundary analysis for AND conjunctions; we also explain and show that OR conjunctions are not currently supported.

What changes are included in this PR?

In datafusion-examples/examples/expr_api.rs there is now a new function that explains data flow during boundary analysis and selectivity calculation for the expression age > 18 and <= 25 with some comments to explain what
the logic under the hood does.

In a separate commit, we are also adding a brief explainer of the boundary analysis framework to the Query Optimizer section in the library guide. The goal is to have some supporting documentation that collects, summaries and distills
the information currently documented in the different pull requests and the original design doc (see #3929).

Are these changes tested?

The change re-uses parts of the API that are already covered by existing tests.

Are there any user-facing changes?

Yes, this introduces a new example.

@clflushopt clflushopt changed the title feat(examples): boundary analysis example for the case of conjunctions Examples: boundary analysis example for AND/OR conjunctions Feb 18, 2025
This change adds a short section in the Query Optimizer page of the
library guide that gives a brief overview of boundary analysis and
cardinality estimation and their role during query optimization.
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 19, 2025
@clflushopt
Copy link
Contributor Author

@alamb @berkaysynnada this is the follow up for #14688

@alamb
Copy link
Contributor

alamb commented Feb 20, 2025

Thanks @clflushopt -- I will review this one over the next few days hopefully

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like your explanations and example scenarios @clflushopt, thank you!

When we support OR operation, let's not forget to update the example :)

BTW, did you notice #14699 ? I believe we can switch to that API soon, and updating query-optimizer.md would be super-cool with this new framework (actually we need to, as there will not be Precision enum etc. anymore)

@@ -84,6 +84,9 @@ async fn main() -> Result<()> {
// See how to analyze boundaries in different kinds of expressions.
boundary_analysis_and_selectivity_demo()?;

// See how boundary analysis works for `AND` & `OR` conjunctions.
bounday_analysis_in_conjuctions_demo()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bounday_analysis_in_conjuctions_demo()?;
boundary_analysis_in_conjuctions_demo()?;

}
/// This function shows how to think about and leverage the analysis API
/// to infer boundaries in `AND` & `OR` conjunctions.
fn bounday_analysis_in_conjuctions_demo() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn bounday_analysis_in_conjuctions_demo() -> Result<()> {
fn boundary_analysis_in_conjuctions_demo() -> Result<()> {

@clflushopt
Copy link
Contributor Author

@alamb any suggestions on what to improve here vis-a-vis documentation or the example since #14699 has been merged ?

@clflushopt
Copy link
Contributor Author

@alamb @ozankabak do we plan to have open issues for the follow up changes described in #14699 ? I am specifically trying to figure out whether we should address those items as part of #3929 ?

@ozankabak
Copy link
Contributor

Indeed we will open an epic to plan the remaining statistics infrastructure work. I plan to put something together today/tomorrow.

@alamb
Copy link
Contributor

alamb commented Feb 25, 2025

I am osrry -- I simply haven't had a chance to reivew this PR yet. I hope to get to there today or tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @clflushopt and @berkaysynnada -- this is a great addition. I learned quite a bit 👍


#### `AnalysisContext` API

The `AnalysisContext` serves as a shared knowledge base during expression evaluation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to link to the rust docs for these structures too

For example

https://docs.rs/datafusion/latest/datafusion/physical_expr/struct.AnalysisContext.html

Maybe as a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will do !

@alamb alamb merged commit 58330b6 into apache:main Feb 28, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants