Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc #3810

Merged
merged 3 commits into from
Oct 15, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Oct 12, 2022

DRAFT as it builds on #3809

Which issue does this PR close?

Closes #3808

Rationale for this change

The APIs for manipulating expressions were all over the map (sometimes return Vec, sometimes taking them as mut, owned, etc) as well as being inconsistently named and inconsistently tested.

In fact I couldn't find uncombine_filter (which is similar, but not the same as split_conjunction)

What changes are included in this PR?

Change the APIs to be consistently named and take reasonable arguments

  1. Change split_conjunction to return a Vec thus simplifying the API
  2. Rename split_filter to split_conjunction_owned to make it clear how it is related to split_conjunction
  3. Expand test coverage
  4. Rename combine_filter to conjunction
  5. Rename combine_filter_disjunction to disjunction
  6. Change APIs for conjunction and disjunction to avoid clones

I will highlight the changes inline.

Are there any user-facing changes?

If anyone uses these APIs they will need to change them, but hopfully

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules labels Oct 12, 2022
@@ -766,7 +766,8 @@ mod tests {
if !actual.is_empty() {
actual += "\n";
}
actual += &format!("{}", plan.display_indent());
use std::fmt::Write as _;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy told me to do this

@@ -341,8 +341,7 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
}
LogicalPlan::Analyze { .. } => push_down(&state, plan),
LogicalPlan::Filter(Filter { input, predicate }) => {
let mut predicates = vec![];
utils::split_conjunction(predicate, &mut predicates);
let predicates = utils::split_conjunction(predicate);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is an example where the split_conjuntion API is easier to use now

@@ -175,7 +173,7 @@ fn optimize_where_in(
// build subquery side of join - the thing the subquery was querying
let subqry_alias = format!("__sq_{}", optimizer_config.next_id());
let mut subqry_plan = LogicalPlanBuilder::from((*subqry_input).clone());
if let Some(expr) = combine_filters(&other_subqry_exprs) {
if let Some(expr) = conjunction(other_subqry_exprs) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is an example of less copying (can use other_subqry_exprsdirectly)

@alamb alamb force-pushed the alamb/better_expr_api branch 2 times, most recently from ae47468 to 6e45c74 Compare October 12, 2022 18:59
/// Splits an owned conjunctive [`Expr`] such as `A AND B AND C` => `[A, B, C]`
///
/// See [`split_conjunction`] for more details.
pub fn split_conjunction_owned(expr: Expr) -> Vec<Expr> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function used to be called uncombine_filters which was not at all consistent with the split_conjunction that took a reference 🤯

@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Oct 12, 2022
@alamb alamb force-pushed the alamb/better_expr_api branch from 6e45c74 to 0dd3db4 Compare October 12, 2022 19:34
@alamb alamb marked this pull request as ready for review October 12, 2022 19:34
@alamb alamb changed the title Improve the ergonomics of expression manipulation: combine/split filter conjunction, etc Make expression manipulation consistent and easier to use: combine/split filter conjunction, etc Oct 12, 2022
Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job👍

}
Expr::Alias(expr, _) => {
split_conjunction(expr, predicates);
Expr::Alias(expr, _) => split_conjunction_impl(expr, exprs),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

fn combine_zero_filters() {
let result = combine_filters(&[]);
assert_eq!(result, None);
fn test_split_conjunction() {
Copy link
Member

@jackwener jackwener Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I think we can add test for conjunction(). At the same time, we can check the tree structure of this expression by using match.

expr is (A B) C;

/// using `match` to check.
match expr {
   And(
      And(
         B,
         C
      ),
      C
   )
}
[A, B, C ,D , E] -> (((A B) C) (D E)) is different from ((((A B) C) D) E).

we can see the result of conjunction() in the UT, ensure the result of tree structure of conjunction()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a great idea @jackwener -- thank you. I added this test 09f3cf4

As part of writing tests, I also found that the API for disjunction was slightly different than conjunction so I made them the same as well.

@alamb alamb merged commit fc5081d into apache:master Oct 15, 2022
@alamb alamb deleted the alamb/better_expr_api branch October 15, 2022 11:57
@ursabot
Copy link

ursabot commented Oct 15, 2022

Benchmark runs are scheduled for baseline = 0b90a8a and contender = fc5081d. fc5081d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate Expr manipulation code so it is more discoverable and make it easier to use
3 participants