feat: add pattern for simplifying exprs like str ~ '^foo$'
#6369
        .all(|h| matches!(h.kind(), HirKind::Literal(_)))
}

/// extracts a string literal expression assuming that `is_anchored_literal()`
Suggested change:
- /// extracts a string literal expression assuming that `is_anchored_literal()`
+ /// extracts a string literal expression assuming that [`is_anchored_literal`]
assert_change(
    regex_not_match(col("c1"), lit("^foo$")),
    col("c1").not_eq(lit("foo")),
);
Could we have another "unsupported" test here (use `assert_no_change`, see other cases in this test) for:
- `^foo|bar$` (OR within anchor)
- `^(foo)(bar)$` (I think this is a concat of more than 3 elements)
- `^` and `$` (single element)
- `$^` and `$foo^` (anchors flipped)
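The unsupported cases listed above can be illustrated with a minimal, string-based sketch. It is not the PR's implementation, which inspects the parsed `Hir`; the function `anchored_literal` and the `META` list are hypothetical names introduced here. The sketch also assumes that `^$` should yield the empty string, which the PR rationale suggests but this conversation does not confirm.

```rust
/// Characters with special meaning in regex syntax. A body containing any of
/// them is not treated as a plain literal by this sketch.
const META: &[char] = &[
    '\\', '.', '+', '*', '?', '(', ')', '|', '[', ']', '{', '}', '^', '$',
];

/// Hypothetical string-based stand-in for the PR's `Hir`-based check.
/// Returns the text between a leading `^` and a trailing `$`, or `None`
/// when the pattern is anything other than `^literal$`.
fn anchored_literal(pattern: &str) -> Option<&str> {
    // Both anchors must be present, in this order.
    let body = pattern.strip_prefix('^')?.strip_suffix('$')?;
    // Assumption: an empty body (`^$`) is accepted and yields "".
    if body.contains(META) {
        None
    } else {
        Some(body)
    }
}

fn main() {
    // The supported shape.
    assert_eq!(anchored_literal("^foo$"), Some("foo"));

    // The shapes the review asks to cover with `assert_no_change`.
    for pattern in ["^foo|bar$", "^(foo)(bar)$", "^", "$", "$^", "$foo^"] {
        assert_eq!(anchored_literal(pattern), None);
        println!("{pattern:?} is left unchanged");
    }
}
```

Each rejected pattern fails for a different reason: a missing anchor, anchors in the wrong order, or a body that is not a plain literal. That is why the review asks for one test per shape.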
Done, also added a test that combines anchored regexes with OR-chains, since `lower_simple` is called recursively for that case.
To be merged once CI is green, thank you.
@@ -2453,6 +2474,19 @@ mod tests {
                .and(not_like(col("c1"), "%bar%"))
                .and(not_like(col("c1"), "%baz%")),
        );
        // both anchored expressions (translated to equality) and unanchored
        assert_change(
            regex_match(col("c1"), lit("foo|^x$|baz")),
this is a pretty neat rewrite
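As a rough illustration of the rewrite being discussed, the sketch below lowers each branch of an OR-chain separately and joins the results. It works on pattern strings and emits SQL-like text rather than DataFusion `Expr` values. `Mode`, `lower_branch`, and `lower_pattern` are hypothetical names, and the mapping of unanchored literals to `LIKE '%…%'` is an assumption inferred from the `not_like` lines in the hunk above, not a copy of the PR's code.

```rust
/// Hypothetical stand-in for the PR's `OperatorMode`: `~` versus `!~`.
#[derive(Clone, Copy)]
enum Mode {
    Match,
    NotMatch,
}

/// True when `s` contains only ASCII letters and digits (also true for "").
fn is_plain(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Lower one alternation branch to SQL-like text, or `None` if unsupported.
fn lower_branch(mode: Mode, col: &str, branch: &str) -> Option<String> {
    // `^lit$` becomes an equality (or inequality) comparison.
    if let Some(lit) = branch.strip_prefix('^').and_then(|b| b.strip_suffix('$')) {
        if !is_plain(lit) {
            return None;
        }
        let op = match mode {
            Mode::Match => "=",
            Mode::NotMatch => "!=",
        };
        return Some(format!("{col} {op} '{lit}'"));
    }
    // An unanchored plain literal becomes a substring LIKE (assumption).
    if branch.is_empty() || !is_plain(branch) {
        return None;
    }
    let op = match mode {
        Mode::Match => "LIKE",
        Mode::NotMatch => "NOT LIKE",
    };
    Some(format!("{col} {op} '%{branch}%'"))
}

/// Lower a whole pattern: every branch must be supported, otherwise the
/// regex is left alone (`None`).
fn lower_pattern(mode: Mode, col: &str, pattern: &str) -> Option<String> {
    // A negated OR-chain becomes an AND of negated branches (De Morgan).
    let joiner = match mode {
        Mode::Match => " OR ",
        Mode::NotMatch => " AND ",
    };
    let parts: Option<Vec<String>> = pattern
        .split('|')
        .map(|branch| lower_branch(mode, col, branch))
        .collect();
    Some(parts?.join(joiner))
}

fn main() {
    let lowered = lower_pattern(Mode::Match, "c1", "foo|^x$|baz");
    assert_eq!(
        lowered.as_deref(),
        Some("c1 LIKE '%foo%' OR c1 = 'x' OR c1 LIKE '%baz%'")
    );
    let negated = lower_pattern(Mode::NotMatch, "c1", "^foo$");
    assert_eq!(negated.as_deref(), Some("c1 != 'foo'"));
    // `^foo|bar$` splits into `^foo` and `bar$`, neither of which is supported.
    assert_eq!(lower_pattern(Mode::Match, "c1", "^foo|bar$"), None);
}
```

Splitting on `|` is safe in this sketch only because any branch containing grouping or escape characters is rejected; a real implementation would walk the parsed regex instead.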
Thanks @wolffcm and @crepererum!
        _ => None,
    }
}

fn lower_simple(mode: &OperatorMode, left: &Expr, hir: &Hir) -> Option<Expr> {
    println!("Considering hir kind: mode {mode:?} hir: {hir:?}");
I noticed this stray `println` while reviewing this code, so I propose removing it: #6375
Which issue does this PR close?
Closes #6367.
Rationale for this change
This should be a performance improvement for queries that contain this pattern, and it may also be easier to push down predicates that use `=` or `!=` rather than regexes.
It will also make it easier to find predicates that are looking for the empty string (an important case for IOx's InfluxQL support).
What changes are included in this PR?
This adds a pattern to `ExprSimplifier` that finds regexes of the form `^foo$` and rewrites them as a comparison with a string literal instead of doing regex matching.
Are these changes tested?
I included unit tests with the other regex simplifications that are performed. I also tested manually with `datafusion-cli`.
Are there any user-facing changes?
No.