-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-5510] Support sort by ordinal when sort by column is an expression in RelToSqlConverter #3057
Conversation
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
LGTM |
@@ -759,17 +760,17 @@ private static String toSql(RelNode root, SqlDialect dialect, | |||
final String expected = "SELECT \"product_class_id\", COUNT(*) AS \"C\"\n" | |||
+ "FROM \"foodmart\".\"product\"\n" | |||
+ "GROUP BY ROLLUP(\"product_class_id\")\n" | |||
+ "ORDER BY \"product_class_id\", COUNT(*)"; | |||
+ "ORDER BY \"product_class_id\", 2"; | |||
final String expectedMysql = "SELECT `product_class_id`, COUNT(*) AS `C`\n" | |||
+ "FROM `foodmart`.`product`\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COUNT(*) is replaced by 2 rather than 1. So the ordinal starts from 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think the ordinal is the position of column in select list, it starts from 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peng128 thanks for your PR, I've left some comments.
final String expectedMysql = "SELECT `product_class_id`, `brand_name`," | ||
+ " COUNT(*) AS `C`\n" | ||
+ "FROM `foodmart`.`product`\n" | ||
+ "GROUP BY `product_class_id`, `brand_name` WITH ROLLUP\n" | ||
+ "ORDER BY `product_class_id` IS NULL, `product_class_id`," | ||
+ " `brand_name` IS NULL, `brand_name`," | ||
+ " COUNT(*) IS NULL, COUNT(*)"; | ||
+ " 3 IS NULL, 3"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is order by 3 is null
valid here? In Calcite, we'll take this not a 'field reference'. I'm not sure how other databases behaves in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it valid here.
I try to make a test to compare the relNode compare '3 is null' sql with the original SQL and the 'count(*) is null' sql.
Using default SqlParser config.
- For original sql:
select "product_class_id", "brand_name",count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, 3
relNode is
LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC])
LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
LogicalProject(product_class_id=[$0], brand_name=[$2])
JdbcTableScan(table=[[foodmart, product]])
- For 'count(*) is null' sql:
select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by "product_class_id", "brand_name", count(*) is null, count(*)
relNode is
LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2])
LogicalSort(sort0=[$0], sort1=[$1], sort2=[$3], sort3=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC])
LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2], EXPR$3=[false])
LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
LogicalProject(product_class_id=[$0], brand_name=[$2])
JdbcTableScan(table=[[foodmart, product]])
- For '3 is null' sql:
select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, 3 is null, 3
relNode is
LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2])
LogicalSort(sort0=[$0], sort1=[$1], sort2=[$3], sort3=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC])
LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2], EXPR$3=[false])
LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
LogicalProject(product_class_id=[$0], brand_name=[$2])
JdbcTableScan(table=[[foodmart, product]])
The relNode of '3 is null' sql and 'count(*) is null' sql is almost some. Is my understanding correct?
So I think it's fine here. And I test '3 is null' sql in MySQL, it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what I mean is that we only takes pure literal as a field reference in Calcite, if it's an expression, it won't be taken as a field reference. Your second case shows what I described above.
To be more precise, 3 is null
equals false
, not count(*) is null
in this case.
I'm not saying Calcite is definitely correct for current behavior. We need to investigate what other databases behave when the literal is in an expression, and let Calcite be consistent with the majority of them.
If we prefer to not take literal in expressions as a field reference, then your change now would make it incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it. We shouldn't take literal in experssion as field reference.
How about keep original expression when we need to sort null values in order by?
For example, the output sql will be
select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, count(*) is null, 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it. Please review again, thanks.
@@ -615,6 +615,13 @@ public SqlNode orderField(int ordinal) { | |||
final String strValue = ((SqlNumericLiteral) node).toValue(); | |||
return SqlLiteral.createCharString(strValue, node.getParserPosition()); | |||
} | |||
if (node instanceof SqlCall | |||
&& dialect.getConformance().isSortByOrdinal()) { | |||
// If the filed is expression and sort by ordinal is set in dialect, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filed
-> field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Thanks~
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, with one minor comment. After fixing that, I'll merge this PR.
* RelToSqlConverter don't support sort by ordinal when sort by column is an expression</a>. | ||
*/ | ||
@Test void testOrderByOrdinalWithExpression() { | ||
String query = "select \"product_id\", count(*) as \"c\"\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
reopened, because slow tests fail due to this change: https://github.com/apache/calcite/actions/runs/4223301074/jobs/7332876394 |
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
…ession in RelToSqlConverter Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
Kudos, SonarCloud Quality Gate passed! |
&& dialect.getConformance().isSortByOrdinal()) { | ||
// Cannot merge a Project that contains sort by ordinal under it. | ||
return hasSortByOrdinal(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last version we merged does not have this logic, could you help me to understand why do you need to add this logic now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We found test case LatticeTest#testSum is failed in slow test.
These code is to fix it.
The reason of test case failure is that the project will be merged when the sub-relnode is sort(change by CALCITE-4901).
For example,
If the relNode is
JdbcProject(C=[$1])
JdbcSort(sort0=[$1], dir0=[DESC], fetch=[1])
JdbcAggregate(group=[{0}], C=[$SUM0($7)])
JdbcTableScan(table=[[foodmart, sales_fact_1997]])
The result after merge project is
SELECT SUM(`sales_fact_1997`.`unit_sales`) AS `C`
FROM `foodmart`.`sales_fact_1997` AS `sales_fact_1997`
GROUP BY `sales_fact_1997`.`product_id`
ORDER BY 2 DESC
it's a incorrect sql, we want
SELECT `C` FROM (
SELECT `sales_fact_1997`.`product_id`, SUM(`sales_fact_1997`.`unit_sales`) AS `C`
FROM `foodmart`.`sales_fact_1997` AS `sales_fact_1997`
GROUP BY `sales_fact_1997`.`product_id`
ORDER BY 2 DESC) t
So, these code to find out whether there is sort by original under project.
And don't merge project when there is sort by original under project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we agreed that the slow tests failure is not caused by this PR, we'd better handle this in CALCITE-5537.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three test cases failure.
- LatticeTest#testSum
- LatticeSuggesterTest#testFoodMartAll
- LatticeSuggesterTest#testFoodMartAllEvolve
Case 1 is caused by this PR. Maybe we could fix in this PR?
Case 2 & 3 is not caused by this PR. I think we could fix case 2 & 3 in CALCITE-5537.
BTW, these code just fix case 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification, I agree with you. I'll review the PR in the coming days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, merging. Thanks @peng128 for your contribution!
… the dialect allows Close apache#3057 (cherry picked from commit a990ecc)
Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.