feat: add pattern for simplifying exprs like str ~ '^foo$' #6369

Merged 2 commits on May 17, 2023

Conversation

wolffcm (Contributor) commented May 17, 2023

Which issue does this PR close?

Closes #6367.

Rationale for this change

This should be a performance improvement for queries that contain this pattern, and it may also be easier to push down predicates that use = or != rather than regexes.

It will also make it easier to find predicates that are looking for the empty string (an important case for IOx's InfluxQL support).

What changes are included in this PR?

This adds a pattern to ExprSimplifier that finds regexes of the form ^foo$ and rewrites the match as a comparison against the string literal foo (= for a match, != for a negated match) instead of performing regex matching.
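To make the rewrite concrete, here is a minimal sketch of the idea in plain Rust. It is not the PR's code: the PR inspects the parsed regex (the `Hir` tree), whereas this sketch looks at the pattern string directly. The names `Simplified`, `anchored_literal`, and `simplify` are hypothetical, and the set of characters treated as "plain" is an assumption chosen to keep the example short. The sketch also leaves `^$` unchanged; how the PR treats that pattern is not shown in this conversation.

```rust
/// Outcome of trying to simplify `column ~ pattern` or `column !~ pattern`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Simplified {
    /// Rewritten to `column = literal`.
    Eq(String),
    /// Rewritten to `column != literal`.
    NotEq(String),
    /// Pattern not recognized; the regex match is kept as-is.
    Unchanged,
}

/// Returns the text between `^` and `$` when the pattern is exactly
/// `^<literal>$` and the literal contains no regex metacharacters.
/// Assumption: only ASCII alphanumerics, `_` and spaces count as "plain".
fn anchored_literal(pattern: &str) -> Option<&str> {
    let inner = pattern.strip_prefix('^')?.strip_suffix('$')?;
    let is_plain = !inner.is_empty()
        && inner
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ' ');
    if is_plain {
        Some(inner)
    } else {
        None
    }
}

/// Chooses `=` or `!=` depending on whether the regex match was negated.
fn simplify(pattern: &str, negated: bool) -> Simplified {
    match anchored_literal(pattern) {
        Some(lit) if negated => Simplified::NotEq(lit.to_string()),
        Some(lit) => Simplified::Eq(lit.to_string()),
        None => Simplified::Unchanged,
    }
}
```

With this sketch, `simplify("^foo$", false)` yields `Simplified::Eq("foo")`, `simplify("^foo$", true)` yields `Simplified::NotEq("foo")`, and a pattern such as `foo.*` is left unchanged.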

Are these changes tested?

I included unit tests with the other regex simplifications that are performed. I also tested manually with datafusion-cli.

Are there any user-facing changes?

No.

github-actions bot added the optimizer (Optimizer rules) label May 17, 2023
.all(|h| matches!(h.kind(), HirKind::Literal(_)))
}

/// extracts a string literal expression assuming that `is_anchored_literal()`

Suggested change:
- /// extracts a string literal expression assuming that `is_anchored_literal()`
+ /// extracts a string literal expression assuming that [`is_anchored_literal`]

assert_change(
regex_not_match(col("c1"), lit("^foo$")),
col("c1").not_eq(lit("foo")),
);

Could we add another "unsupported" test here (using assert_no_change, as in the other cases in this test) for:

  • ^foo|bar$ (OR within anchor)
  • ^(foo)(bar)$ (I think this is a concat of more than 3 elements)
  • ^ and $ (single element)
  • $^ and $foo^ (anchors flipped)
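As an illustration of why these patterns should be left alone, the sketch below models a parsed regex with a small hand-written tree and checks for the exact shape "start anchor, one literal, end anchor". The `Node` enum, `is_anchored_literal`, and `reviewer_cases` are hypothetical stand-ins written for this example; they are not the `regex_syntax` types or the function in the PR, and the trees are built by hand rather than by a real parser.

```rust
/// A tiny stand-in for a parsed regex tree (hypothetical, not `regex_syntax::hir::Hir`).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Start,                  // `^`
    End,                    // `$`
    Literal(String),        // e.g. `foo`
    Group(Box<Node>),       // `( ... )`
    Concat(Vec<Node>),      // a sequence of nodes
    Alternation(Vec<Node>), // `a|b`
}

/// True only for a concatenation of exactly `^`, one literal, `$`, in that order.
fn is_anchored_literal(node: &Node) -> bool {
    match node {
        Node::Concat(parts) => matches!(
            parts.as_slice(),
            [Node::Start, Node::Literal(_), Node::End]
        ),
        _ => false,
    }
}

/// Shorthand for building a literal node.
fn lit(s: &str) -> Node {
    Node::Literal(s.to_string())
}

/// Hand-built trees for the supported pattern and the cases listed above.
fn reviewer_cases() -> Vec<(&'static str, Node)> {
    vec![
        ("^foo$", Node::Concat(vec![Node::Start, lit("foo"), Node::End])),
        // `|` binds loosest, so each anchor belongs to only one branch.
        ("^foo|bar$", Node::Alternation(vec![
            Node::Concat(vec![Node::Start, lit("foo")]),
            Node::Concat(vec![lit("bar"), Node::End]),
        ])),
        // Four elements in the concatenation, and none of the middle ones is a bare literal.
        ("^(foo)(bar)$", Node::Concat(vec![
            Node::Start,
            Node::Group(Box::new(lit("foo"))),
            Node::Group(Box::new(lit("bar"))),
            Node::End,
        ])),
        ("^", Node::Start),
        ("$", Node::End),
        ("$^", Node::Concat(vec![Node::End, Node::Start])),
        ("$foo^", Node::Concat(vec![Node::End, lit("foo"), Node::Start])),
    ]
}
```

In this model only the first entry, `^foo$`, satisfies `is_anchored_literal`; every case from the list is rejected, which is the behaviour an assert_no_change test would pin down.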

Contributor Author

Done. I also added a test that combines anchored regexes with OR-chains, since lower_simple is called recursively for that case.

crepererum (Contributor) commented May 17, 2023

To be merged once CI is green, thank you.

@@ -2453,6 +2474,19 @@ mod tests {
.and(not_like(col("c1"), "%bar%"))
.and(not_like(col("c1"), "%baz%")),
);
// both anchored expressions (translated to equality) and unanchored
assert_change(
regex_match(col("c1"), lit("foo|^x$|baz")),

this is a pretty neat rewrite
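For readers who want to see what such a mixed rewrite might look like, here is a rough sketch that lowers each branch of an alternation separately and joins the results with OR. It works on the pattern string and emits SQL-like text purely for illustration; the function names are hypothetical, the naive split on `|` is an assumption that only holds for patterns without groups or escapes, and the `LIKE '%...%'` form for unanchored literals is inferred from the `not_like` lines in the hunk above rather than taken from the new test's expected output, which is not shown here.

```rust
/// Assumption: only ASCII alphanumerics and `_` are treated as literal text.
fn is_plain(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Lowers one branch of the alternation:
/// `^x$` becomes an equality, a bare literal becomes a substring LIKE.
fn lower_branch(column: &str, branch: &str) -> Option<String> {
    if let Some(inner) = branch.strip_prefix('^').and_then(|s| s.strip_suffix('$')) {
        return is_plain(inner).then(|| format!("{column} = '{inner}'"));
    }
    is_plain(branch).then(|| format!("{column} LIKE '%{branch}%'"))
}

/// Lowers `column ~ pattern`; returns None if any branch is not recognized,
/// in which case the regex match would be kept unchanged.
fn lower_regex_match(column: &str, pattern: &str) -> Option<String> {
    let parts: Option<Vec<String>> = pattern
        .split('|')
        .map(|branch| lower_branch(column, branch))
        .collect();
    parts.map(|p| p.join(" OR "))
}
```

Under these assumptions, `lower_regex_match("c1", "foo|^x$|baz")` produces `c1 LIKE '%foo%' OR c1 = 'x' OR c1 LIKE '%baz%'`, while a pattern with any unrecognized branch, such as `foo|b.r`, yields `None`.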

alamb (Contributor) commented May 17, 2023

Thanks @wolffcm and @crepererum!

alamb merged commit 3e3e9b5 into apache:main on May 17, 2023
_ => None,
}
}

fn lower_simple(mode: &OperatorMode, left: &Expr, hir: &Hir) -> Option<Expr> {
println!("Considering hir kind: mode {mode:?} hir: {hir:?}");

I noticed this stray println while reviewing this code so I propose removing it: #6375

Development

Successfully merging this pull request may close these issues.

Support simplifying expressions like str ~ '^foo$'
3 participants