-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Skip CASE function from InferIsNotNull rule checks #113123
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @astefan, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
@@ -106,6 +107,6 @@ private boolean doResolve(Expression exp, AttributeMap<Expression> aliases, Set< | |||
} | |||
|
|||
private static boolean skipExpression(Expression e) { | |||
return e instanceof Coalesce; | |||
return e instanceof Coalesce || e instanceof Case; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: in case any other change is required to the PR, maybe move the comment from the call site as function javadoc/comment.
public void testIsNotNullOnCoalesce() { | ||
var plan = localPlan(""" | ||
from test | ||
| where coalesce(emp_no, salary) is not null | ||
"""); | ||
|
||
var limit = as(plan, Limit.class); | ||
var filter = as(limit.child(), Filter.class); | ||
var inn = as(filter.condition(), IsNotNull.class); | ||
var coalesce = as(inn.children().get(0), Coalesce.class); | ||
assertThat(Expressions.names(coalesce.children()), contains("emp_no", "salary")); | ||
var source = as(filter.child(), EsRelation.class); | ||
} | ||
|
||
public void testIsNotNullOnExpression() { | ||
var plan = localPlan(""" | ||
from test | ||
| eval x = emp_no + 1 | ||
| where x is not null | ||
"""); | ||
|
||
var limit = as(plan, Limit.class); | ||
var filter = as(limit.child(), Filter.class); | ||
var inn = as(filter.condition(), IsNotNull.class); | ||
assertThat(Expressions.names(inn.children()), contains("x")); | ||
var eval = as(filter.child(), Eval.class); | ||
filter = as(eval.child(), Filter.class); | ||
inn = as(filter.condition(), IsNotNull.class); | ||
assertThat(Expressions.names(inn.children()), contains("emp_no")); | ||
var source = as(filter.child(), EsRelation.class); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just checking: these are moved only, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry. I forgot to add a comment after I created the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @astefan, it works and it fixes a problem, so it's a 👍 for me, but I'd like to have a bit more context (at least a comment) on the logic behind this functions blacklist
@@ -106,6 +107,6 @@ private boolean doResolve(Expression exp, AttributeMap<Expression> aliases, Set< | |||
} | |||
|
|||
private static boolean skipExpression(Expression e) { | |||
return e instanceof Coalesce; | |||
return e instanceof Coalesce || e instanceof Case; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to understand why these two functions are special, at least with a comment, and eventually extract a general rule (a flag interface or a method in the Expression class to identify the functions that fall into this category).
The only patterns I can see are:
- both have
Nullability.UNKNOWN
(but also BinaryLogic does, yet it's not listed here). The javadoc mentions unknown nullability as well:Simplify IsNotNull targets by resolving the underlying expression to its root fields with unknown nullability
... - both return the value of one of the expressions passed as params (but other functions do as well, eg.
GREATEST()
)
Can you please elaborate a bit on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After spending some time on this issue when it was initially reported, my thinking is the following:
- Nullability has nothing to do with this optimization. Nullability is most useful in the LogicalPlanOptimizer to shortcircuit folding to
null
expressions. At this point, in the local logical optimizer, the Nullability did its job. - the fact that Nullability is the same (and unknown) for these two functions (for
coalesce
it can also befalse
) is a fact indeed, but conceptually speaking I think it's not relevant - these two functions have special behavior in the sense that they can mess up the
is not null
predicate when it's used on them. They can accept complex expressions where the fields used can also be checked fornull
. For example,case (x is null,.....) is not null
cannot be transformed incase(....) is not null AND x is not null
because there is already an expression that contradictsx is null
.- I have tried to come up with a smarter way of evaluating the expressions inside a
case
and determine if the fields there end up asx is null
but it proved to be more complicated and I found this not worth the effort now. Maybe we can come back to this and be smarter aboutcase
andcoalesce
but we need a stronger reason to do it. Also, this bug is serious enough that it warrants this simple exclusion and fix.
- I have tried to come up with a smarter way of evaluating the expressions inside a
- the flag interface or a method in the Expression class at this point in the life of ES|QL (where we welcome any contribution from outside) is, imo, not warranted: we will make the behavior of a function more complex than it needs and it will confuse contributors even more. For now, I prefer to have this logic centralized in an optimization rule. Functions land should be a more approachable one, optimization rules are left for advanced logic and knowledge.
I will add a comment in the PR for this method, @bpintea also mentioned this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two functions have special behavior in the sense that they can mess up the is not null
I don't completely agree with this statement. These two functions mess up with this rule, not with IsNotNull
.
They can accept complex expressions where the fields used can also be checked for null
This is closer to the point IMHO: the real peculiarity is that these functions have a custom behavior with null inputs, ie. even if one input is null, the function can still be not null, so func(input1, input2, ...) IS NOT NULL
is not equivalent to input1 IS NOT NULL and func(input1, input2, ...) IS NOT NULL
.
I think we can easily express this as a method.
The danger comes from the fact that this is an implementation detail, peculiar to each function.
Or better, it's how our code generator creates evaluators (Coalesce
and Case
have a custom evaluator, not generated).
This makes the risk pretty low for now, that's why IMHO it's still good as a fix, but IMHO we should make this rule more robust as a follow-up.
the flag interface or a method in the Expression class at this point in the life of ES|QL (where we welcome any contribution from outside) is, imo, not warranted: we will make the behavior of a function more complex than it needs and it will confuse contributors even more
My concern is that a contributor could introduce a new function with a similar behavior, and will unintentionally break this rule.
Also, this bug is serious enough that it warrants this simple exclusion and fix.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let's move it forward.
See also my comment below, IMHO there is margin for improvements
@@ -106,6 +107,6 @@ private boolean doResolve(Expression exp, AttributeMap<Expression> aliases, Set< | |||
} | |||
|
|||
private static boolean skipExpression(Expression e) { | |||
return e instanceof Coalesce; | |||
return e instanceof Coalesce || e instanceof Case; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two functions have special behavior in the sense that they can mess up the is not null
I don't completely agree with this statement. These two functions mess up with this rule, not with IsNotNull
.
They can accept complex expressions where the fields used can also be checked for null
This is closer to the point IMHO: the real peculiarity is that these functions have a custom behavior with null inputs, ie. even if one input is null, the function can still be not null, so func(input1, input2, ...) IS NOT NULL
is not equivalent to input1 IS NOT NULL and func(input1, input2, ...) IS NOT NULL
.
I think we can easily express this as a method.
The danger comes from the fact that this is an implementation detail, peculiar to each function.
Or better, it's how our code generator creates evaluators (Coalesce
and Case
have a custom evaluator, not generated).
This makes the risk pretty low for now, that's why IMHO it's still good as a fix, but IMHO we should make this rule more robust as a follow-up.
the flag interface or a method in the Expression class at this point in the life of ES|QL (where we welcome any contribution from outside) is, imo, not warranted: we will make the behavior of a function more complex than it needs and it will confuse contributors even more
My concern is that a contributor could introduce a new function with a similar behavior, and will unintentionally break this rule.
Also, this bug is serious enough that it warrants this simple exclusion and fix.
💯
@elasticmachine run elasticsearch-ci/part-4 |
@elasticmachine run elasticsearch-ci/bwc-snapshots |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Fixes #112704 by excluding
case
function from theInferIsNotNull
local optimizer rule, because of the potential complexity of thecase
logic conditions.