Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Enhance checks for inexact fields #39427

Merged
merged 4 commits into from
Feb 28, 2019
Merged

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Feb 27, 2019

For functions: move checks for text fields without underlying keyword
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the Verifier to catch early
before QueryTranslator or execution.

Fixes: #38501

For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.

Fixes: elastic#38501
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@matriv matriv requested review from costin and astefan February 27, 2019 00:04
@matriv
Copy link
Contributor Author

matriv commented Feb 27, 2019

Also fixes: #35203

@matriv matriv added the >bug label Feb 27, 2019
@matriv
Copy link
Contributor Author

matriv commented Feb 27, 2019

@costin @astefan As this PR fixes also an issue for group by and order by, do you think I should extract this fix separately as it's done here: #39443, to backport only the fix, or should I backport this whole PR to earlier versions?

Asking because this extraction requires some more work.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Left a few comments, mainly around naming.

import static org.elasticsearch.xpack.sql.expression.Expressions.name;
import static org.elasticsearch.xpack.sql.type.DataType.BOOLEAN;

public final class TypeResolutionUtils {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer using the plural instead of Utils - TypeResolutions. Utils has been used where the plural already existed in the rest of the codebase to avoid name clashes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plural just sounded to me a bit weird, but I'll change.


private TypeResolutionUtils() {}

public static TypeResolution typeMustBeBoolean(Expression e, String operationName, ParamOrdinal paramOrd) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, how about simplifying the method names from "typeMustBeNumeric" to "isNumeric" similar to the assertion in JUnit? It also add consistency ("expressionMustBe", "typeMustBe...").

return TypeResolution.TYPE_RESOLVED;
}

public static TypeResolution expressionMustBeTableColumn(Expression e, String operationName, ParamOrdinal paramOrd) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant/table column is a bit misleading - I would just use isFoldable/notFoldable since it's a terminology already used through-out the code.


@Override
public Tuple<Boolean, String> hasExact() {
return PROCESS_EXACT_FIELD.apply(findExact());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason behind using the closure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The findExact() returns also the exact field (used by getExact()) so I wanted to convert the field == null ? false : true and chose a Function to be somehow more elegant.

Copy link
Member

@costin costin Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got that but why not use the code directly? The closure is not passed anywhere...

Tuple<EsField, String> tuple = findExact();
return tuple.v1() == null ? new Exact(false, tuple.v2()) : new Exact(true, null)

@@ -109,16 +108,6 @@ public void testTermEqualityAnalyzer() {
assertEquals("value", tq.value());
}

public void testTermEqualityAnalyzerAmbiguous() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this tested somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* {False, String} if this field doesn't have an underlying exact field, the v2 of the tuple is a string containing
* an error message explaining why that is.
*/
public Tuple<Boolean, String> hasExact() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having both isExact and hasExact is confusing - is the former still used anywhere? I opt for refactoring isExact to return the tuple instead (which can be promoted to a dedicated inner class such (Exact) to make its use easier and its intent clearer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the confusing isExact but it's used externally here: https://github.com/elastic/elasticsearch/pull/39427/files#diff-1059b5208ef7d10a0fdc89b5dad914edR90 and has slightly different semantics than hasExact.

hasExact returns <TRUE, null> if it's exact, or has an exact underlying field, so we don't know which is the case.

But if we introduce a class Exact to replace the the Tuple we can include this info there and remove the confusing isExact() method. Thx!

@costin
Copy link
Member

costin commented Feb 27, 2019

@costin @astefan As this PR fixes also an issue for group by and order by, do you think I should extract this fix separately as it's done here: #39443, to backport only the fix, or should I backport this whole PR to earlier versions?

Asking because this extraction requires some more work.

Whatever it's easier - I think the PR itself is not complicated but is made big by the amount of files that it touches. I'm fine with backporting the whole PR.

@matriv
Copy link
Contributor Author

matriv commented Feb 27, 2019

@costin thanks for the comments! Pushed a commit to address them.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice cleanup. Left few minor comments.

@@ -292,6 +292,12 @@ public void testStarOnNested() {
assertNotNull(accept("SELECT dep.* FROM test"));
}

public void testGroupByOnInexact() {
assertEquals("1:36: Grouping field of data type [text] for grouping; " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the first sentence in this error message, it has no predicate. Can you try a more descriptive message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry I mixed something up there, will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx for catching!

@@ -45,6 +46,11 @@ public Nullability nullable() {
return Nullability.FALSE;
}

@Override
protected TypeResolution resolveType() {
return typeMustBeExact(child, "ORDER BY cannot be applied to field of data type[{}]: {}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

White space between type and [{}]: ...to field of data type [{}]: {}.

error("SELECT LENGTH(1)"));
}

public void testInvalidTypeForStringFunction_WithOneArgNumeric() {
assertEquals("1:8: argument of [CHAR('foo')] must be [integer], found value ['foo'] type [keyword]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the error message look like for something like SELECT ASCII(CHAR('foo'))? (nested functions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test: cda66ca#diff-334bd2f8c33b60789817181fcef4ca52R457

Good thinking! But seems fine since we use sourceText() on the offending function and it's not showing the whole expression that can be complex and result in confusion for the user.

@astefan
Copy link
Contributor

astefan commented Feb 27, 2019

Also, shouldn't this one go to 7.0.x as well?

@matriv
Copy link
Contributor Author

matriv commented Feb 27, 2019

@astefan yes, it will actually be backported all the way to 6.6.2 as it also fixes the bug mentioned for Order By and Group By.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@matriv matriv merged commit 10ab391 into elastic:master Feb 28, 2019
@matriv matriv deleted the mt/fix-38501 branch February 28, 2019 23:10
matriv added a commit that referenced this pull request Mar 1, 2019
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.

Closes: #38501
Fixes: #35203
matriv added a commit that referenced this pull request Mar 1, 2019
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.

Closes: #38501
Fixes: #35203
matriv added a commit that referenced this pull request Mar 1, 2019
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.

Closes: #38501
Fixes: #35203
matriv added a commit that referenced this pull request Mar 1, 2019
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.

For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.

Closes: #38501
Fixes: #35203
@matriv
Copy link
Contributor Author

matriv commented Mar 1, 2019

Backported to 7.x with 9fb2f67
to 7.0 with 2c39bb3
to 6.7 with bfc6e22
tp 6.6 with 9c60367

matriv added a commit to matriv/elasticsearch that referenced this pull request Mar 1, 2019
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.

Follows: elastic#39427
matriv added a commit that referenced this pull request Mar 4, 2019
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.

Follows: #39427
matriv added a commit that referenced this pull request Mar 4, 2019
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.

Follows: #39427
matriv added a commit that referenced this pull request Mar 4, 2019
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.

Follows: #39427
matriv added a commit that referenced this pull request Mar 4, 2019
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.

Follows: #39427
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: Enhance check for text fields without underlying keyword
6 participants