Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CALCITE-6804 Anti-join with WHERE NOT EXISTS syntax has corrupted condition #4177

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

antonkw
Copy link

@antonkw antonkw commented Feb 3, 2025

I don't expect that it can be merged as is.
In the first order, I want to demonstrate the issue with anti joins: https://issues.apache.org/jira/browse/CALCITE-6804

For any anti-join with input (left-side) more complex than single table scan, rex sub query can't be converted to SQL.
Reason: Filter's node input is not considered as something that can be referenced (via alias) in condition.
But anti-join is that exact case.

Example of logical plan:

LogicalProject(product_id=[$0])
  LogicalFilter(condition=[NOT(EXISTS({
LogicalFilter(condition=[=($cor0.product_id, $1)])
  JdbcTableScan(table=[[foodmart, product]])
}))], variablesSet=[[$cor0]])
    LogicalProject(product_id=[$1])
      LogicalJoin(condition=[=($1, $16)], joinType=[left])
        JdbcTableScan(table=[[foodmart, product]])
        JdbcTableScan(table=[[foodmart, product]])

$cor0 should eventually refer to whole input aliased as t.
Without proposed change $cor0 is being converted into JdbcTableScan(table=[[foodmart, product]])

@@ -456,7 +458,10 @@ public Result visit(Filter e) {
builder.context.toSql(null, e.getCondition())));
return builder.result();
} else {
final Result x = visitInput(e, 0, Clause.WHERE);
Result x = visitInput(e, 0, Clause.WHERE);
if (e.getCondition().getKind() == NOT || e.getCondition().getKind() == EXISTS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these the only conditions that require a reset alias?

Copy link
Contributor

@suibianwanwank suibianwanwank Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this doesn't cover all cases. e.g.
e2."product_id" = selected."product_id" and e2."product_id" > 10
How about resetting on parseCorrelTable?

+++ b/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java
@@ -1406,7 +1406,7 @@ public List<SqlNode> createAsFullOperands(RelDataType rowType, SqlNode leftOpera
 
   private void parseCorrelTable(RelNode relNode, Result x) {
     for (CorrelationId id : relNode.getVariablesSet()) {
-      correlTableMap.put(id, x.qualifiedContext());
+      correlTableMap.put(id, x.resetAlias().qualifiedContext());
     }
   }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mihaibudiu

why are these the only conditions that require a reset alias?

My take here is that typically predicates can't access the input via aliases. At least, for many SQL engines user ought to move the input into CTE to refer to it in filtering conditions. And JOINs with EXISTS/NOT EXISTS syntax are exceptions in that regard.

I thought about explicit function with name like canAccessInputViaAlias that processes condition. But it seems too nit-pick'y given the fact that I am sure only about two greenlighting keywords

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suibianwanwank
Can you please elaborate on what you suspected?

I added the predicate you mentioned:

  @Test void testAntiJoinWithComplexInput() {
    final String sql = "SELECT * FROM "
        + "(select * from ("
        + "select e1.\"product_id\" FROM \"foodmart\".\"product\" e1 "
        + "LEFT JOIN \"foodmart\".\"product\" e3 "
        + "on e1.\"product_id\" = e3.\"product_id\""
        + ")"
        + ") selected where not exists\n"
        + "(select 1 from \"foodmart\".\"product\" e2 "
        + "where e2.\"product_id\" = selected.\"product_id\" and e2.\"product_id\" > 10)";
    final String expected =
        "SELECT *\nFROM (SELECT \"product\".\"product_id\"\nFROM \"foodmart\".\"product\"\n"
            + "LEFT JOIN \"foodmart\".\"product\" AS \"product0\" "
            + "ON \"product\".\"product_id\" = \"product0\".\"product_id\") AS \"t\"\n"
            + "WHERE NOT EXISTS ("
            + "SELECT *\nFROM \"foodmart\".\"product\"\n"
            + "WHERE \"product_id\" = \"t\".\"product_id\" AND \"product_id\" > 10"
            + ")";
    sql(sql).ok(expected);
  }

Without escaping mess:

SELECT * FROM 
  foodmart.product 
WHERE 
  product_id = t.product_id AND product_id > 10

Seems good to me.

How about resetting on parseCorrelTable?

Feels unsafe to me. Particularly due to fact that such aliases are not legitimate in all cases(why we should represent and maintain state that isn't reflecting possible query).
I can try your suggestion to see what tests think about it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, you are right. I misunderstood the meaning here.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Feb 4, 2025
@mihaibudiu
Copy link
Contributor

Please use a commit message and a PR title that match exactly the issue in JIRA.
They are used to keep track of issues.

@antonkw antonkw changed the title CALCITE-6804 reset alias in anti-join Filter CALCITE-6804 Anti-join with WHERE NOT EXISTS syntax has corrupted condition Feb 10, 2025
@suibianwanwank
Copy link
Contributor

@antonkw Maybe I’m overthinking it, but I still believe that only considering NOT and EXISTS doesn't cover all possible cases for subqueries. Could you try the example with

    final String sql = "SELECT * FROM "
        + "(select * from ("
        + "select e1.\"product_id\" FROM \"foodmart\".\"product\" e1 "
        + "LEFT JOIN \"foodmart\".\"product\" e3 "
        + "on e1.\"product_id\" = e3.\"product_id\""
        + ")"
        + ") selected where 1 in\n"
        + "(select \"gross_weight\" from \"foodmart\".\"product\" e2 "
        + "where e2.\"product_id\" = selected.\"product_id\" and e2.\"product_id\" > 10)";

and see if it works?

@antonkw
Copy link
Author

antonkw commented Feb 10, 2025

@suibianwanwank

\"product_id\" = \"product\".\"product_class_id\"

here we go again :)

Added IN match and test has repaired.

  @Test void testFilterWithSubQuery() {
    final String sql = "SELECT * FROM "
        + "(select * from ("
        + "select e1.\"product_id\" FROM \"foodmart\".\"product\" e1 "
        + "LEFT JOIN \"foodmart\".\"product\" e3 "
        + "on e1.\"product_id\" = e3.\"product_id\""
        + ")"
        + ") selected where 1 in\n"
        + "(select \"gross_weight\" from \"foodmart\".\"product\" e2 "
        + "where e2.\"product_id\" = selected.\"product_id\" and e2.\"product_id\" > 10)";

    final String expected =
        "SELECT *\nFROM (SELECT \"product\".\"product_id\"\nFROM \"foodmart\".\"product\"\n"
            + "LEFT JOIN \"foodmart\".\"product\" AS \"product0\" "
            + "ON \"product\".\"product_id\" = \"product0\".\"product_id\") AS \"t\"\n"
            + "WHERE CAST(1 AS DOUBLE) IN ("
            + "SELECT \"gross_weight\"\nFROM \"foodmart\".\"product\"\n"
            + "WHERE \"product_id\" = \"t\".\"product_id\" AND \"product_id\" > 10)";

    sql(sql).ok(expected);
  }

Waiting for CI to see if everything IN-related is good

@antonkw
Copy link
Author

antonkw commented Feb 10, 2025

@suibianwanwank
I added the test with IN clause.
CI is green after catching IN sql-kind of conditions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LGTM-will-merge-soon Overall PR looks OK. Only minor things left.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants