-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Plan non-equijoin conditions as cross join followed by filter #15302
SQL: Plan non-equijoin conditions as cross join followed by filter #15302
Conversation
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java
Fixed
Show fixed
Hide fixed
.build() | ||
), | ||
"j0.", | ||
equalsCondition(makeColumnExpression("m1"), makeColumnExpression("j0.m1")), |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
.build() | ||
), | ||
"j0.", | ||
equalsCondition(makeColumnExpression("m1"), makeColumnExpression("j0.m1")), |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
sql/src/main/java/org/apache/druid/sql/calcite/rule/JoinRules.java
Outdated
Show resolved
Hide resolved
@@ -772,7 +772,7 @@ public void testJoinWithLookup() | |||
DruidExpression.ofColumn(ColumnType.STRING, "dim2"), | |||
DruidExpression.ofColumn(ColumnType.STRING, "j0.k") | |||
), | |||
JoinType.LEFT | |||
JoinType.INNER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial query has a left join here and we end up planning it as in inner. Can there be a case where the user does not expect the results which have a pair on both left and right ? Also that seems to be case where the result changed for the not sql compatible mode
@@ -723,7 +723,7 @@ public void testFilterAndGroupByLookupUsingJoinOperatorBackwards(Map<String, Obj | |||
), | |||
"j0.", | |||
equalsCondition(makeColumnExpression("k"), makeColumnExpression("j0.dim2")), | |||
JoinType.RIGHT | |||
JoinType.INNER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before, we are converting a right join to an inner join here and that is changing the expected results
@@ -768,7 +762,7 @@ public void testFilterAndGroupByLookupUsingJoinOperatorWithNotFilter(Map<String, | |||
new LookupDataSource("lookyloo"), | |||
"j0.", | |||
equalsCondition(makeColumnExpression("dim2"), makeColumnExpression("j0.k")), | |||
JoinType.LEFT | |||
JoinType.INNER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a post-join filter that will reject nulls on the right side. Hence the SQL query is equivalent to INNER join with post-join filter moved into the join condition itself. That's also the same reason why some other join queries have become INNER join queries. I have also now tweaked the rule a bit so that we still reject non-equi conditions for non-sql compatible mode. With that, results in non-sql compatible mode won't change.
@@ -821,7 +809,7 @@ public void testJoinUnionTablesOnLookup(Map<String, Object> queryContext) | |||
new LookupDataSource("lookyloo"), | |||
"j0.", | |||
equalsCondition(makeColumnExpression("dim2"), makeColumnExpression("j0.k")), | |||
JoinType.LEFT | |||
JoinType.INNER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as the previous one
sql/src/test/java/org/apache/druid/sql/calcite/DecoupledPlanningCalciteQueryTest.java
Outdated
Show resolved
Hide resolved
"1", | ||
NullHandling.sqlCompatible() ? | ||
equalsCondition( | ||
DruidExpression.fromExpression("CAST(\"j0.k\", 'LONG')"), |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
DruidExpression.fromExpression
This reverts commit 2b5026e.
@@ -740,7 +740,6 @@ public void testFilterAndGroupByLookupUsingJoinOperatorBackwards(Map<String, Obj | |||
new Object[]{"xabc", 1L} | |||
) | |||
: ImmutableList.of( | |||
new Object[]{null, 5L}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This result changed because there is a bug in splitting the join filter on leaf nodes. A filter that can reject null cannot be pushed into the left table if the join is right. But that's what https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/join/filter/JoinFilterAnalyzer.java#L198 will do right now. I will fix this in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
sql/src/test/java/org/apache/druid/sql/calcite/rule/DruidJoinRuleTest.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java
Outdated
Show resolved
Hide resolved
// I am not sure in what case, the list of system fields will be not empty. I have just picked up this logic | ||
// directly from https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃
* @param systemFieldList List of system fields that will be prefixed to
* output row type; typically empty but must not be
* null
or Returns a list of system fields that will be prefixed to output row type.
are all that i can find but i don't quite fully understand what it means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure either. I had asked @kgyrtkirk, and he didn't have any answer either. Maybe @gianm knows since he was exchanging emails with calcite community recently over other kind of system fields (like file names in input format).
In any case, I don't think this list will ever be non-empty based on its usages I found in calcite.
Thank you @clintropolis and @soumyava for review. |
…pache#15302) This PR revives apache#14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
…pache#15302) This PR revives apache#14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
…lter rule (#15511) FILTER_INTO_JOIN is mainly run along with the other rules with the Volcano planner; however if the query starts highly underdefined (join conditions in the where clauses) that generic query could give a lot of room for the other rules to play around with only enabled it for when the join uses subqueries for its inputs. PROJECT_FILTER rule is not that useful. and could increase planning times by providing new plans. This problem worsened after we started supporting inner joins with arbitrary join conditions in #15302
@abhishekagarwal87 This doesn't work if we are running in useDefaultValueForNull (Legacy) mode right? |
@maytasm Yes |
This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored
DruidJoinRule
class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.Release notes
Druid can support arbitrary join conditions for INNER join. For INNER joins, druid will look at the join condition, and any sub-conditions that cannot be evaluated efficiently as part of the join, will be converted to a post-join filter. With this feature, you can do in-equality joins that were not possible before.
This PR has: