Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-5510] Support sort by ordinal when sort by column is an expression in RelToSqlConverter #3057

Closed
wants to merge 6 commits into from

Conversation

peng128
Copy link

@peng128 peng128 commented Feb 2, 2023

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.

@peng128 peng128 marked this pull request as draft February 2, 2023 10:57
…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
@peng128 peng128 marked this pull request as ready for review February 2, 2023 13:18
@zoudan
Copy link
Contributor

zoudan commented Feb 6, 2023

LGTM

@@ -759,17 +760,17 @@ private static String toSql(RelNode root, SqlDialect dialect,
final String expected = "SELECT \"product_class_id\", COUNT(*) AS \"C\"\n"
+ "FROM \"foodmart\".\"product\"\n"
+ "GROUP BY ROLLUP(\"product_class_id\")\n"
+ "ORDER BY \"product_class_id\", COUNT(*)";
+ "ORDER BY \"product_class_id\", 2";
final String expectedMysql = "SELECT `product_class_id`, COUNT(*) AS `C`\n"
+ "FROM `foodmart`.`product`\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COUNT(*) is replaced by 2 rather than 1. So the ordinal starts from 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the ordinal is the position of column in select list, it starts from 1.

Copy link
Member

@libenchao libenchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peng128 thanks for your PR, I've left some comments.

final String expectedMysql = "SELECT `product_class_id`, `brand_name`,"
+ " COUNT(*) AS `C`\n"
+ "FROM `foodmart`.`product`\n"
+ "GROUP BY `product_class_id`, `brand_name` WITH ROLLUP\n"
+ "ORDER BY `product_class_id` IS NULL, `product_class_id`,"
+ " `brand_name` IS NULL, `brand_name`,"
+ " COUNT(*) IS NULL, COUNT(*)";
+ " 3 IS NULL, 3";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is order by 3 is null valid here? In Calcite, we'll take this not a 'field reference'. I'm not sure how other databases behaves in this case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it valid here.
I try to make a test to compare the relNode compare '3 is null' sql with the original SQL and the 'count(*) is null' sql.
Using default SqlParser config.

  1. For original sql:
select "product_class_id", "brand_name",count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, 3

relNode is

LogicalSort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC])
  LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
    LogicalProject(product_class_id=[$0], brand_name=[$2])
      JdbcTableScan(table=[[foodmart, product]])
  1. For 'count(*) is null' sql:
select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by "product_class_id", "brand_name", count(*) is null, count(*)

relNode is

LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2])
  LogicalSort(sort0=[$0], sort1=[$1], sort2=[$3], sort3=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC])
    LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2], EXPR$3=[false])
      LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
        LogicalProject(product_class_id=[$0], brand_name=[$2])
          JdbcTableScan(table=[[foodmart, product]])
  1. For '3 is null' sql:
select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, 3 is null, 3

relNode is

LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2])
  LogicalSort(sort0=[$0], sort1=[$1], sort2=[$3], sort3=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC])
    LogicalProject(product_class_id=[$0], brand_name=[$1], C=[$2], EXPR$3=[false])
      LogicalAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], C=[COUNT()])
        LogicalProject(product_class_id=[$0], brand_name=[$2])
          JdbcTableScan(table=[[foodmart, product]])

The relNode of '3 is null' sql and 'count(*) is null' sql is almost some. Is my understanding correct?
So I think it's fine here. And I test '3 is null' sql in MySQL, it works.

Copy link
Member

@libenchao libenchao Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what I mean is that we only takes pure literal as a field reference in Calcite, if it's an expression, it won't be taken as a field reference. Your second case shows what I described above.

To be more precise, 3 is null equals false, not count(*) is null in this case.

I'm not saying Calcite is definitely correct for current behavior. We need to investigate what other databases behave when the literal is in an expression, and let Calcite be consistent with the majority of them.

If we prefer to not take literal in expressions as a field reference, then your change now would make it incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it. We shouldn't take literal in experssion as field reference.
How about keep original expression when we need to sort null values in order by?
For example, the output sql will be

select "product_class_id", "brand_name", count(*) as c
from "product"
group by rollup("product_class_id", "brand_name")
order by 1, 2, count(*) is null, 3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds good to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it. Please review again, thanks.

@@ -615,6 +615,13 @@ public SqlNode orderField(int ordinal) {
final String strValue = ((SqlNumericLiteral) node).toValue();
return SqlLiteral.createCharString(strValue, node.getParserPosition());
}
if (node instanceof SqlCall
&& dialect.getConformance().isSortByOrdinal()) {
// If the filed is expression and sort by ordinal is set in dialect,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed -> field

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Thanks~

wangpeng added 2 commits February 19, 2023 22:07
…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
Copy link
Member

@libenchao libenchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, with one minor comment. After fixing that, I'll merge this PR.

* RelToSqlConverter don't support sort by ordinal when sort by column is an expression</a>.
*/
@Test void testOrderByOrdinalWithExpression() {
String query = "select \"product_id\", count(*) as \"c\"\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
@libenchao libenchao closed this in 0e8d6b0 Feb 20, 2023
@libenchao libenchao reopened this Feb 20, 2023
@libenchao
Copy link
Member

reopened, because slow tests fail due to this change: https://github.com/apache/calcite/actions/runs/4223301074/jobs/7332876394

wangpeng added 2 commits February 21, 2023 16:38
…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
…ession in RelToSqlConverter

Keep the output sort by column to ordinary when sort by ordinal in dialect is true and the column is an expression.
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

93.0% 93.0% Coverage
0.0% 0.0% Duplication

&& dialect.getConformance().isSortByOrdinal()) {
// Cannot merge a Project that contains sort by ordinal under it.
return hasSortByOrdinal();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last version we merged does not have this logic, could you help me to understand why do you need to add this logic now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found test case LatticeTest#testSum is failed in slow test.
These code is to fix it.

The reason of test case failure is that the project will be merged when the sub-relnode is sort(change by CALCITE-4901).

For example,
If the relNode is

JdbcProject(C=[$1])
  JdbcSort(sort0=[$1], dir0=[DESC], fetch=[1])
    JdbcAggregate(group=[{0}], C=[$SUM0($7)])
      JdbcTableScan(table=[[foodmart, sales_fact_1997]])

The result after merge project is

SELECT SUM(`sales_fact_1997`.`unit_sales`) AS `C`
FROM `foodmart`.`sales_fact_1997` AS `sales_fact_1997`
GROUP BY `sales_fact_1997`.`product_id`
ORDER BY 2 DESC

it's a incorrect sql, we want

SELECT `C` FROM (
SELECT `sales_fact_1997`.`product_id`, SUM(`sales_fact_1997`.`unit_sales`) AS `C`
FROM `foodmart`.`sales_fact_1997` AS `sales_fact_1997`
GROUP BY `sales_fact_1997`.`product_id`
ORDER BY 2 DESC) t

So, these code to find out whether there is sort by original under project.
And don't merge project when there is sort by original under project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we agreed that the slow tests failure is not caused by this PR, we'd better handle this in CALCITE-5537.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three test cases failure.

  1. LatticeTest#testSum
  2. LatticeSuggesterTest#testFoodMartAll
  3. LatticeSuggesterTest#testFoodMartAllEvolve

Case 1 is caused by this PR. Maybe we could fix in this PR?
Case 2 & 3 is not caused by this PR. I think we could fix case 2 & 3 in CALCITE-5537.

BTW, these code just fix case 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, I agree with you. I'll review the PR in the coming days.

Copy link
Member

@libenchao libenchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, merging. Thanks @peng128 for your contribution!

@libenchao libenchao closed this in a990ecc Feb 25, 2023
eventd pushed a commit to Kyligence/calcite that referenced this pull request Mar 3, 2023
… the dialect allows

Close apache#3057

(cherry picked from commit a990ecc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants