-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Failing Group By queries due to different ExpressionIds #43072
SQL: Failing Group By queries due to different ExpressionIds #43072
Conversation
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @emasab!
First of all apologies for the delay on picking up this PR and many thanks for your effort here.
The issue with the ExpressionIds is quite complicated and there are plans to solve it overall through a substantial refactoring. Your changes though, solve a substantial subset of the issues
so we decided to go ahead and proceed with this fix. I've changed the title and description of the PR and link to the issues that it fixes.
For some cases (without aliases or numeric refs) queries still fail, e.g.:
SELECT PI() * emp_no FROM test_emp GROUP BY PI() * emp_no ORDER BY PI() * emp_no LIMIT 10
Moreover I'd like to ask from you to add some integration tests for the issues that this fix addresses:
In agg.sql-spec
file: https://github.com/emasab/elasticsearch/blob/bugfix/42041-failing-group-by-queries/x-pack/plugin/sql/qa/src/main/resources/agg.sql-spec#L52 the following:
groupByCastScalar
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) FROM test_emp GROUP BY CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) ORDER BY CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) NULLS FIRST;
groupByCastScalarWithAlias
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) as "cast" FROM test_emp GROUP BY "cast" ORDER BY "cast" NULLS FIRST;
and in agg.csv-spec
file: https://github.com/emasab/elasticsearch/blob/bugfix/42041-failing-group-by-queries/x-pack/plugin/sql/qa/src/main/resources/agg.csv-spec#L220
the following:
// Fails for H2
groupByCastScalarWithNumericRef
SELECT CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT) FROM test_emp GROUP BY 1 ORDER BY 1 NULLS FIRST;
CAST(ABS(EXTRACT(YEAR FROM "birth_date")) AS BIGINT):l
------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;
groupByConvertScalar
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) FROM test_emp GROUP BY CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) ORDER BY CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) NULLS FIRST;
CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT):l
-----------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;
groupByConvertScalarWithAlias
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) as "convert" FROM test_emp GROUP BY "convert" ORDER BY "convert" NULLS FIRST;
convert:l
---------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;
groupByConvertScalarWithNumericRef
SELECT CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT) FROM test_emp GROUP BY 1 ORDER BY 1 NULLS FIRST;
CONVERT(ABS(EXTRACT(YEAR FROM "birth_date")), SQL_BIGINT):l
-----------------------------------------------------------
null
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
;
// AwaitsFix https://github.com/elastic/elasticsearch/issues/36074
//groupByConstantScalar
//SELECT PI() * emp_no FROM test_emp GROUP BY PI() * emp_no ORDER BY PI() * emp_no LIMIT 10;
//PI() * emp_no:d
//---------------
//31419.0681285515
//31422.2097212051
//31425.3513138587
//31428.4929065123
//31431.6344991659
//31434.7760918195
//31437.9176844731
//31441.0592771266
//31444.2008697802
//31447.3424624338
//;
groupByConstantScalarWithAlias
SELECT PI() * emp_no AS "value" FROM test_emp GROUP BY value ORDER BY value LIMIT 10;
value:d
-------
31419.0681285515
31422.2097212051
31425.3513138587
31428.4929065123
31431.6344991659
31434.7760918195
31437.9176844731
31441.0592771266
31444.2008697802
31447.3424624338
;
groupByConstantScalarWithNumericRef
SELECT PI() * emp_no FROM test_emp GROUP BY 1 ORDER BY 1 LIMIT 10;
PI() * emp_no:d
---------------
31419.0681285515
31422.2097212051
31425.3513138587
31428.4929065123
31431.6344991659
31434.7760918195
31437.9176844731
31441.0592771266
31444.2008697802
31447.3424624338
;
@@ -779,6 +779,85 @@ public void testAllCountVariantsWithHavingGenerateCorrectAggregations() { | |||
+ "\"gap_policy\":\"skip\"}}}}}")); | |||
} | |||
|
|||
public void testGroupSelection() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to split this up to more methods with more descriptive names and add more tests so that all 3 cases (no alias, alias, numeric reference) are covered, similarly to the integration tests in my comment: #43072 (review)
Of course some of the cases, as mentioned, should be skipped as they won't work even with this fix.
Hi @matriv , happy to hear from you. I understand, the checks to do are a lot before giving an ok. I'm going to include these integration tests, change the unit tests and see if there is a simple fix for the failing query that you've highlighted. |
Fixed the groupByConstantScalar case too, mainly unifying the semantic of NamedExpression.equals to that of NamedExpression.hashCode . The ExpressionId is still there but is used only as a unique identifier for the object instance. Changed the semantic of Failure.equals because now two NamedExpression are equal if they have the same tree even if the position in the source is different so Failure takes into account the source position not the parsed node. |
Sorry, the latest commit even if more correct theorically because of the difference between hashCode and equals, solves the specified case and passes the unit tests, causes a lot of failures with the integration test org.elasticsearch.xpack.sql.qa.single_node.JdbcSqlSpecIT. The previous one is not affected |
With the 9b7878b I've solved the integration tests failures while keeping the equals and hashCode methods consistent. The order by constant scalar is still excluded. With my previous commit there was an incomplete change to ResolveMissingRefs that made it work but I've reverted it because this other query failed:
|
In the latest commit these is a fix for groupByConstantScalar case with ORDER BY, while keeping the equals consistent with the hashCode, and the integration tests successful The result of the ordering is still not correct in the DESC case or when multiplied by -1 but the query seems to be translated correctly before it's processed by the Querier |
e904df4
to
4111876
Compare
rebased on current master |
4111876
to
4a57627
Compare
4a57627
to
2ccb219
Compare
In the latest commit there is a fix for the results ordering when there is a grouped ScalarFunctionAttribute in the ORDER BY clause, like: That should be the last change for this PR, unless other issues come out, you request some changes or rebases to master |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @emasab,
Once again thanks a lot for your effort here.
There has been delay in getting back to you, as there were other features and bugs in our plate, and as stated before there is an ongoing effort of complete refactoring in the area of NamedExpression and ExpressionIds.
The state of the PR though looks good overall to me and would be nice to fix those issues independently of the refactoring that might take more time to be ready.
I left another round of comments and I'd like to do some more testing myself using your branch.
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/Expressions.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/gen/script/Param.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/gen/script/Param.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/gen/script/Param.java
Outdated
Show resolved
Hide resolved
.../plugin/sql/src/main/java/org/elasticsearch/xpack/sql/querydsl/container/QueryContainer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/querydsl/agg/Aggs.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/gen/script/Params.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plan/logical/Pivot.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plan/logical/Pivot.java
Outdated
Show resolved
Hide resolved
public static AttributeSet referencesExcept(Set<? extends Expression> exps, | ||
AttributeSet except) { | ||
AttributeSet ret = new AttributeSet(); | ||
while(exps.size()>0){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get the logic here, why do you need the while loop and revisit the exps
which have become filteredExps
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a breadth-first search where some nodes are removed during the traversal. At each loop the exps represents the previous list of tree nodes and filteredExps the expressions that aren't in the except set. At the end of the loop exps is replaced with the children of each node. When a leaf is found, its references are added to ret.
code style Co-Authored-By: Marios Trivyzas <matriv@gmail.com>
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plan/logical/Pivot.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plan/logical/Pivot.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/plan/logical/Pivot.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for addressing the comments @emasab
I left a few more, minor, regarding code formatting.
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/Expressions.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/Expressions.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/Expressions.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/expression/Expressions.java
Outdated
Show resolved
Hide resolved
code style Co-Authored-By: Marios Trivyzas <matriv@gmail.com>
Now I've set up the save action and the eclipseformat :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @emasab for your work, your patience and the effort and time spent on addressing the comments.
@elasticmachine test this please |
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map that helps the QueryTranslator to identify the grouping columns. The issue is that the same expression in different parts of the query (SELECT clause and GROUP BY clause) ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds use the hashCode() of NamedExpression. Fixes: #41159 Fixes: #40001 Fixes: #40240 Fixes: #33361 Fixes: #46316 Fixes: #36074 Fixes: #34543 Fixes: #37044 Fixes: #42041 (cherry picked from commit 3c38ea5)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map that helps the QueryTranslator to identify the grouping columns. The issue is that the same expression in different parts of the query (SELECT clause and GROUP BY clause) ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds use the hashCode() of NamedExpression. Fixes: #41159 Fixes: #40001 Fixes: #40240 Fixes: #33361 Fixes: #46316 Fixes: #36074 Fixes: #34543 Fixes: #37044 Fixes: #42041 (cherry picked from commit 3c38ea5)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map that helps the QueryTranslator to identify the grouping columns. The issue is that the same expression in different parts of the query (SELECT clause and GROUP BY clause) ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds use the hashCode() of NamedExpression. Fixes: #41159 Fixes: #40001 Fixes: #40240 Fixes: #33361 Fixes: #46316 Fixes: #36074 Fixes: #34543 Fixes: #37044 Fixes: #42041 (cherry picked from commit 3c38ea5)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map that helps the QueryTranslator to identify the grouping columns. The issue is that the same expression in different parts of the query (SELECT clause and GROUP BY clause) ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds use the hashCode() of NamedExpression. Fixes: #41159 Fixes: #40001 Fixes: #40240 Fixes: #33361 Fixes: #46316 Fixes: #36074 Fixes: #34543 Fixes: #37044 Fixes: #42041 (cherry picked from commit 3c38ea5)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map
that helps the
QueryTranslator
to identify the grouping columns. The issue isthat the same expression in different parts of the query (SELECT clause and GROUP BY clause)
ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds
use the hashCode() of NamedExpression.
Fixes: #41159
Fixes: #40001
Fixes: #40240
Fixes: #33361
Fixes: #46316
Fixes: #36074
Fixes: #34543
Fixes: #37044
Fixes: #42041