-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix colocated join when no mapping project is used #13666
Fix colocated join when no mapping project is used #13666
Conversation
7e54702
to
316ae09
Compare
316ae09
to
dae4f3d
Compare
" │ ├── [2]@localhost:2|[2] MAIL_SEND(HASH_DISTRIBUTED)[PARTITIONED]->{[1]@localhost:2|[2]} (Subtree Omitted)\n", | ||
" │ ├── [2]@localhost:2|[3] MAIL_SEND(HASH_DISTRIBUTED)[PARTITIONED]->{[1]@localhost:2|[3]} (Subtree Omitted)\n", | ||
" │ ├── [2]@localhost:1|[0] MAIL_SEND(HASH_DISTRIBUTED)[PARTITIONED]->{[1]@localhost:1|[0]} (Subtree Omitted)\n", | ||
" │ └── [2]@localhost:1|[1] MAIL_SEND(HASH_DISTRIBUTED)[PARTITIONED]->{[1]@localhost:1|[1]}\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In master this is:
│ ├── [2]@localhost:2|[2] MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:1|[1],[1]@localhost:2|[0]} (Subtree Omitted)
│ ├── [2]@localhost:2|[3] MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:1|[1],[1]@localhost:2|[0]} (Subtree Omitted)
│ ├── [2]@localhost:1|[0] MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:1|[1],[1]@localhost:2|[0]} (Subtree Omitted)
│ └── [2]@localhost:1|[1] MAIL_SEND(HASH_DISTRIBUTED)->{[1]@localhost:1|[1],[1]@localhost:2|[0]}
"description": "explain plan with colocated join and a projection that doesn't keep the key column", | ||
"sql": "EXPLAIN IMPLEMENTATION PLAN FOR SELECT a.mySum, b.col3 FROM (select col3 as col2, col3 + col2 as mySum from a /*+ tableOptions(partition_function='hashcode', partition_key='col2', partition_size='4') */) as a JOIN b /*+ tableOptions(partition_function='hashcode', partition_key='col1', partition_size='4') */ ON a.col2 = b.col1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an extra test to verify that when projection is removing the partition key, colocated is not being used.
...-planner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotRelDistributionTraitRule.java
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13666 +/- ##
============================================
+ Coverage 61.75% 62.00% +0.25%
+ Complexity 207 198 -9
============================================
Files 2436 2554 +118
Lines 133233 140620 +7387
Branches 20636 21875 +1239
============================================
+ Hits 82274 87195 +4921
- Misses 44911 46795 +1884
- Partials 6048 6630 +582
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
(In a separate PR) In order to have a way to work around similar bugs, can we enforce using/not using colocated join if query hint is explicitly provided?
This PR fixes an issue related to colocated join.
For example, in ColocatedJoinEngineQuickStart, the following query should be executed in colocated fashion given:
But the actual plan is:
Which is not colocated (see the first join input).
The reason is that
PinotRelDistributionTraitRule.deriveDistribution
is doinginputRelDistribution.apply(project.getMapping())
for projects.project.getMapping()
returns null if the project is not a mapping, which is defined as _all inputs are references. In our case the project includes the projection key, but it also includes a call to+
so it is not mapping.What
PinotRelDistributionTraitRule.deriveDistribution
should be doing is similar to Calcites RelMdDistribution#L165 (in fact we should be using that class, but that would require some refactors) which is ask for a partial mapping instead of complete mapping.I've tried that solution, but found what it looks to be a bug in Calcite. I've reported the issue in Calcite's dev mailist (see https://lists.apache.org/thread/qz18qxrfp5bqldnoln2tg4582g402zyv). But there is a simple solution that consist on creating a copy of the mapping itself and that is what I'm adding here (including a note to try to remove my hack once the issue or my lack of knowledge is solved).