Clean up hash_join's ExecutionPlan::execute#15418
Conversation
|
I think similar changes can be made to cross_join and nested_loop_join if desired. |
This backs out commit 4346cb8.
berkaysynnada
left a comment
There was a problem hiding this comment.
Thank you @ctsk. I believe we can merge this as is. However I'd like to raise one thing that comes to mind whenever I look this code. I'm not very comfortable with adding a CoalescePartitions after calling execute(). Modifying the plan post-execute() feels a bit off to me. Does it seem like a smell to you as well?
🤔 I think coalesce added before However just noticed the execute currently happens in main thread instead of |
This should not trigger for physical plans generated by datafusion, since the EnforceDistribution pass already adds that CoalescePartitionsExec. |
I mean HashJoin::execute() |
You mean |
I wanted to keep the PR simple and just refactor. Here is a follow-up that removes it: #15476. Let's see if the tests pass :) EnforceDistribution adds it here: datafusion/datafusion/physical-optimizer/src/enforce_distribution.rs Lines 940 to 968 in fa452e6
datafusion/datafusion/physical-optimizer/src/enforce_distribution.rs Lines 1256 to 1259 in fa452e6 This apples to CollectLeft HJs left child: datafusion/datafusion/physical-plan/src/joins/hash_join.rs Lines 705 to 710 in fa452e6 |
|
Closed in favor of #15476 |
Rationale for this change + What changes are included in this PR?
The logic of HashJoin's
executeimplements differs from the other operators - The recursive call to theexecutestep of the build side are delayed untilcollect_left_joinis poll'ed. This PR changes that to make it more alike to the standard implementations ofexecute.Are these changes tested?
No changes in behaviour are expected.
Are there any user-facing changes?
No