-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Enhance checks for inexact fields #39427
Conversation
For functions: move checks for `text` fields without underlying `keyword` fields or with many of them (ambiguity) to the type resolution stage. For Order By/Group By: move checks to the `Verifier` to catch early before `QueryTranslator` or execution. Fixes: elastic#38501
Pinging @elastic/es-search |
Also fixes: #35203 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. Left a few comments, mainly around naming.
import static org.elasticsearch.xpack.sql.expression.Expressions.name; | ||
import static org.elasticsearch.xpack.sql.type.DataType.BOOLEAN; | ||
|
||
public final class TypeResolutionUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer using the plural instead of Utils - TypeResolutions
. Utils has been used where the plural already existed in the rest of the codebase to avoid name clashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plural just sounded to me a bit weird, but I'll change.
|
||
private TypeResolutionUtils() {} | ||
|
||
public static TypeResolution typeMustBeBoolean(Expression e, String operationName, ParamOrdinal paramOrd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, how about simplifying the method names from "typeMustBeNumeric" to "isNumeric" similar to the assertion in JUnit? It also add consistency ("expressionMustBe", "typeMustBe...").
return TypeResolution.TYPE_RESOLVED; | ||
} | ||
|
||
public static TypeResolution expressionMustBeTableColumn(Expression e, String operationName, ParamOrdinal paramOrd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constant/table column is a bit misleading - I would just use isFoldable
/notFoldable
since it's a terminology already used through-out the code.
|
||
@Override | ||
public Tuple<Boolean, String> hasExact() { | ||
return PROCESS_EXACT_FIELD.apply(findExact()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason behind using the closure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The findExact() returns also the exact field (used by getExact()) so I wanted to convert the field == null ? false : true
and chose a Function
to be somehow more elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got that but why not use the code directly? The closure is not passed anywhere...
Tuple<EsField, String> tuple = findExact();
return tuple.v1() == null ? new Exact(false, tuple.v2()) : new Exact(true, null)
@@ -109,16 +108,6 @@ public void testTermEqualityAnalyzer() { | |||
assertEquals("value", tq.value()); | |||
} | |||
|
|||
public void testTermEqualityAnalyzerAmbiguous() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tested somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, here: https://github.com/elastic/elasticsearch/pull/39427/files#diff-334bd2f8c33b60789817181fcef4ca52R331 with 2 methods now.
* {False, String} if this field doesn't have an underlying exact field, the v2 of the tuple is a string containing | ||
* an error message explaining why that is. | ||
*/ | ||
public Tuple<Boolean, String> hasExact() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having both isExact
and hasExact
is confusing - is the former still used anywhere? I opt for refactoring isExact to return the tuple instead (which can be promoted to a dedicated inner class such (Exact
) to make its use easier and its intent clearer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the confusing isExact
but it's used externally here: https://github.com/elastic/elasticsearch/pull/39427/files#diff-1059b5208ef7d10a0fdc89b5dad914edR90 and has slightly different semantics than hasExact
.
hasExact
returns <TRUE, null> if it's exact, or has an exact underlying field, so we don't know which is the case.
But if we introduce a class Exact
to replace the the Tuple
we can include this info there and remove the confusing isExact()
method. Thx!
Whatever it's easier - I think the PR itself is not complicated but is made big by the amount of files that it touches. I'm fine with backporting the whole PR. |
@costin thanks for the comments! Pushed a commit to address them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice cleanup. Left few minor comments.
@@ -292,6 +292,12 @@ public void testStarOnNested() { | |||
assertNotNull(accept("SELECT dep.* FROM test")); | |||
} | |||
|
|||
public void testGroupByOnInexact() { | |||
assertEquals("1:36: Grouping field of data type [text] for grouping; " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the first sentence in this error message, it has no predicate. Can you try a more descriptive message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, sorry I mixed something up there, will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for catching!
@@ -45,6 +46,11 @@ public Nullability nullable() { | |||
return Nullability.FALSE; | |||
} | |||
|
|||
@Override | |||
protected TypeResolution resolveType() { | |||
return typeMustBeExact(child, "ORDER BY cannot be applied to field of data type[{}]: {}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
White space between type
and [{}]
: ...to field of data type [{}]: {}
.
error("SELECT LENGTH(1)")); | ||
} | ||
|
||
public void testInvalidTypeForStringFunction_WithOneArgNumeric() { | ||
assertEquals("1:8: argument of [CHAR('foo')] must be [integer], found value ['foo'] type [keyword]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the error message look like for something like SELECT ASCII(CHAR('foo'))
? (nested functions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test: cda66ca#diff-334bd2f8c33b60789817181fcef4ca52R457
Good thinking! But seems fine since we use sourceText()
on the offending function and it's not showing the whole expression that can be complex and result in confusion for the user.
Also, shouldn't this one go to 7.0.x as well? |
@astefan yes, it will actually be backported all the way to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: elastic#39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
MIN/MAX on strings are supported and are implemented with TopAggs FIRST/LAST respectively, but they cannot operate on `text` fields without underlying `keyword` fields => inexact. Follows: #39427
For functions: move checks for
text
fields without underlyingkeyword
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the
Verifier
to catch earlybefore
QueryTranslator
or execution.Fixes: #38501