Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Plan non-equijoin conditions as cross join followed by filter #15302

Merged
merged 18 commits into from
Nov 29, 2023

Conversation

abhishekagarwal87
Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 commented Nov 1, 2023

This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.

Release notes
Druid can support arbitrary join conditions for INNER join. For INNER joins, druid will look at the join condition, and any sub-conditions that cannot be evaluated efficiently as part of the join, will be converted to a post-join filter. With this feature, you can do in-equality joins that were not possible before.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Documentation Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Nov 1, 2023
.build()
),
"j0.",
equalsCondition(makeColumnExpression("m1"), makeColumnExpression("j0.m1")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
.build()
),
"j0.",
equalsCondition(makeColumnExpression("m1"), makeColumnExpression("j0.m1")),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
@abhishekagarwal87 abhishekagarwal87 marked this pull request as ready for review November 8, 2023 09:40
@@ -772,7 +772,7 @@ public void testJoinWithLookup()
DruidExpression.ofColumn(ColumnType.STRING, "dim2"),
DruidExpression.ofColumn(ColumnType.STRING, "j0.k")
),
JoinType.LEFT
JoinType.INNER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial query has a left join here and we end up planning it as in inner. Can there be a case where the user does not expect the results which have a pair on both left and right ? Also that seems to be case where the result changed for the not sql compatible mode

@@ -723,7 +723,7 @@ public void testFilterAndGroupByLookupUsingJoinOperatorBackwards(Map<String, Obj
),
"j0.",
equalsCondition(makeColumnExpression("k"), makeColumnExpression("j0.dim2")),
JoinType.RIGHT
JoinType.INNER
Copy link
Contributor

@soumyava soumyava Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before, we are converting a right join to an inner join here and that is changing the expected results

@@ -768,7 +762,7 @@ public void testFilterAndGroupByLookupUsingJoinOperatorWithNotFilter(Map<String,
new LookupDataSource("lookyloo"),
"j0.",
equalsCondition(makeColumnExpression("dim2"), makeColumnExpression("j0.k")),
JoinType.LEFT
JoinType.INNER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a post-join filter that will reject nulls on the right side. Hence the SQL query is equivalent to INNER join with post-join filter moved into the join condition itself. That's also the same reason why some other join queries have become INNER join queries. I have also now tweaked the rule a bit so that we still reject non-equi conditions for non-sql compatible mode. With that, results in non-sql compatible mode won't change.

@@ -821,7 +809,7 @@ public void testJoinUnionTablesOnLookup(Map<String, Object> queryContext)
new LookupDataSource("lookyloo"),
"j0.",
equalsCondition(makeColumnExpression("dim2"), makeColumnExpression("j0.k")),
JoinType.LEFT
JoinType.INNER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as the previous one

"1",
NullHandling.sqlCompatible() ?
equalsCondition(
DruidExpression.fromExpression("CAST(\"j0.k\", 'LONG')"),

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
DruidExpression.fromExpression
should be avoided because it has been deprecated.
@@ -740,7 +740,6 @@ public void testFilterAndGroupByLookupUsingJoinOperatorBackwards(Map<String, Obj
new Object[]{"xabc", 1L}
)
: ImmutableList.of(
new Object[]{null, 5L},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This result changed because there is a bug in splitting the join filter on leaf nodes. A filter that can reject null cannot be pushed into the left table if the join is right. But that's what https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java#L198 will do right now. I will fix this in another PR.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

docs/querying/datasource.md Outdated Show resolved Hide resolved
Comment on lines +269 to +270
// I am not sure in what case, the list of system fields will be not empty. I have just picked up this logic
// directly from https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃

   * @param systemFieldList  List of system fields that will be prefixed to
   *                         output row type; typically empty but must not be
   *                         null

or Returns a list of system fields that will be prefixed to output row type. are all that i can find but i don't quite fully understand what it means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure either. I had asked @kgyrtkirk, and he didn't have any answer either. Maybe @gianm knows since he was exchanging emails with calcite community recently over other kind of system fields (like file names in input format).
In any case, I don't think this list will ever be non-empty based on its usages I found in calcite.

@abhishekagarwal87 abhishekagarwal87 merged commit 0a56c87 into apache:master Nov 29, 2023
83 checks passed
@abhishekagarwal87
Copy link
Contributor Author

Thank you @clintropolis and @soumyava for review.

yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
…pache#15302)

This PR revives apache#14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
…pache#15302)

This PR revives apache#14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
abhishekagarwal87 pushed a commit that referenced this pull request Jan 4, 2024
…lter rule (#15511)

FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs. 

PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in #15302
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
@maytasm
Copy link
Contributor

maytasm commented May 21, 2024

@abhishekagarwal87 This doesn't work if we are running in useDefaultValueForNull (Legacy) mode right?

@abhishekagarwal87
Copy link
Contributor Author

@maytasm Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Release Notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants