-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retire GUC parallel_hash_enable_motion_broadcast & fix potential Assertion failure #103
Retire GUC parallel_hash_enable_motion_broadcast & fix potential Assertion failure #103
Conversation
1eb3088
to
8ba6bfd
Compare
By fixing regression diffs, found some behavior changes. We need to discuss about this pr. |
Assert failure found: #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139765160147328) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139765160147328) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139765160147328, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007f1d9f766476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007f1d9f74c7f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005646ea5c4b0e in ExceptionalCondition (
conditionName=0x5646ead83280 "!CdbPathLocus_IsHashedWorkers(a) && !CdbPathLocus_IsSegmentGeneralWorkers(a) && !CdbP
athLocus_IsReplicatedWorkers(a)", errorType=0x5646ead830c5 "FailedAssertion",
fileName=0x5646ead83144 "cdbpathlocus.c", lineNumber=758) at assert.c:48
#6 0x00005646ea6c1797 in cdbpathlocus_join (jointype=JOIN_INNER, a=..., b=...) at cdbpathlocus.c:758
#7 0x00005646ea6bae8c in cdbpath_motion_for_join (root=0x5646f30bbfa8, jointype=JOIN_INNER,
p_outer_path=0x7ffcf02bbb70, p_inner_path=0x7ffcf02bbb68, p_rowidexpr_id=0x7ffcf02bbbac,
redistribution_clauses=0x5646f301e998, restrict_clauses=0x5646f301e7e0, outer_pathkeys=0x0, inner_pathkeys=0x0,
outer_require_existing_order=false, inner_require_existing_order=false) at cdbpath.c:2196
#8 0x00005646ea288af0 in create_hashjoin_path (root=0x5646f30bbfa8, joinrel=0x5646f301c498, jointype=JOIN_INNER,
orig_jointype=JOIN_INNER, workspace=0x7ffcf02bbca0, extra=0x7ffcf02bbe70, outer_path=0x5646f3015460,
inner_path=0x5646f301f2f0, parallel_hash=false, restrict_clauses=0x5646f301e7e0, required_outer=0x0,
redistribution_clauses=0x5646f301e998, hashclauses=0x5646f301f298) at pathnode.c:4352
#9 0x00005646ea219a7d in try_partial_hashjoin_path (root=0x5646f30bbfa8, joinrel=0x5646f301c498,
outer_path=0x5646f3015460, inner_path=0x5646f30c28e0, hashclauses=0x5646f301f298, jointype=JOIN_INNER,
orig_jointype=JOIN_INNER, extra=0x7ffcf02bbe70, parallel_hash=false) at joinpath.c:1149
#10 0x00005646ea21b9a5 in hash_inner_and_outer (root=0x5646f30bbfa8, joinrel=0x5646f301c498, outerrel=0x5646f3012ef8,
innerrel=0x5646f30bde98, jointype=JOIN_INNER, extra=0x7ffcf02bbe70) at joinpath.c:2315 |
[outerjoinpath]
-> CdbMotionPath [pathtype=T_Motion parallel_aware=false parallel_safe=true parallel_workers=0 rows=10 startup_cost=0 total_cost=1.1333333333333333 memory=0 locus={locustype = CdbLocusType_ReplicatedWorkers, distkey = 0x0, numsegments = 3, parallel_workers = 0}
motionHazard=true barrierHazard=false rescannable=false sameslice_relids=0x0]
[is_explicit_motion=false]
[parent] (RelOptInfo *)0x5602ef40a838
--Type <RET> for more, q to quit, c to continue without paging--
[pathtarget]
PathTarget [sortgrouprefs=0x0 cost={startup = 0, per_tuple = 0} width=4 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
Var [varno=4 varattno=5 vartype=23 varnosyn=4 varattnosyn=5]
[subpath]
-> Path [pathtype=T_SeqScan parallel_aware=true parallel_safe=true parallel_workers=2 rows=1.6666666666666667 startup_cost=0 total_cost=1.0166666666666666
memory=0 locus={locustype = CdbLocusType_HashedWorkers, distkey = 0x5602ef40f698, numsegments = 3, parallel_workers = 2} motionHazard=false barrierHazard=false
rescannable=true sameslice_relids=0x00000010]
[parent] (RelOptInfo *)0x5602ef40a838
[pathtarget]
PathTarget [sortgrouprefs=0x0 cost={startup = 0, per_tuple = 0} width=4 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
Var [varno=4 varattno=5 vartype=23 varnosyn=4 varattnosyn=5]
[innerjoinpath]
-> HashPath [pathtype=T_HashJoin parallel_aware=false parallel_safe=true parallel_workers=0 rows=7.4166667759418488 startup_cost=1.0750000000000002 total_cost=2.3283333344260853
memory=0 locus={locustype = CdbLocusType_Hashed, distkey = 0x5602ef40e788, numsegments = 3, parallel_workers = 0} motionHazard=true barrierHazard=false
rescannable=false sameslice_relids=0x00000002]
[jointype=JOIN_INNER inner_unique=false]
[num_batches=1 inner_rows_total=3.3333333333333335 batch0_barrier=false]
[parent] (RelOptInfo *)0x5602ef5d1380
[pathtarget]
PathTarget [sortgrouprefs=0x0 cost={startup = 0, per_tuple = 0} width=20 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
Var [varno=1 varattno=2 vartype=23 varnosyn=1 varattnosyn=2]
Var [varno=1 varattno=14 vartype=23 varnosyn=1 varattnosyn=14]
Var [varno=1 varattno=15 vartype=23 varnosyn=1 varattnosyn=15]
Var [varno=5 varattno=1 vartype=23 varnosyn=5 varattnosyn=1]
Var [varno=5 varattno=13 vartype=23 varnosyn=5 varattnosyn=13]
[outerjoinpath]
-> CdbMotionPath [pathtype=T_Motion parallel_aware=false parallel_safe=true parallel_workers=0 rows=10 startup_cost=0 total_cost=1.1333333333333333 memory=0 locus={locustype = CdbLocusType_Replicated, distkey = 0x0, numsegments = 3, parallel_workers = 0}
motionHazard=true barrierHazard=false rescannable=false sameslice_relids=0x0]
[is_explicit_motion=false]
[parent] (RelOptInfo *)0x5602ef40ac68
[pathtarget]
PathTarget [sortgrouprefs=0x0 cost={startup = 0, per_tuple = 0} width=12 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
--Type <RET> for more, q to quit, c to continue without paging--
Var [varno=5 varattno=1 vartype=23 varnosyn=5 varattnosyn=1]
Var [varno=5 varattno=13 vartype=23 varnosyn=5 varattnosyn=13]
Var [varno=5 varattno=3 vartype=23 varnosyn=5 varattnosyn=3]
[subpath]
-> Path [pathtype=T_SeqScan parallel_aware=true parallel_safe=true parallel_workers=2 rows=1.6666666666666667 startup_cost=0 total_cost=1.0166666666666666
memory=0 locus={locustype = CdbLocusType_HashedWorkers, distkey = 0x5602ef40fcf0, numsegments = 3, parallel_workers = 2} motionHazard=false barrierHazard=false
rescannable=true sameslice_relids=0x00000020]
[parent] (RelOptInfo *)0x5602ef40ac68
[pathtarget]
PathTarget [sortgrouprefs=0x0 cost={startup = 0, per_tuple = 0} width=12 has_volatile_expr=VOLATILITY_UNKNOWN]
[exprs]
Var [varno=5 varattno=1 vartype=23 varnosyn=5 varattnosyn=1]
Var [varno=5 varattno=13 vartype=23 varnosyn=5 varattnosyn=13]
Var [varno=5 varattno=3 vartype=23 varnosyn=5 varattnosyn=3]
[innerjoinpath]
--Type <RET> for more, q to quit, c to continue without paging--
-> Path [pathtype=T_SeqScan parallel_aware=false parallel_safe=true parallel_workers=0 rows=3.3333333333333335 startup_cost=0 total_cost=1.0333333333333334
memory=0 locus={locustype = CdbLocusType_Hashed, distkey = 0x5602ef40e788, numsegments = 3, parallel_workers = 0} motionHazard=false barrierHazard=false
rescannable=true sameslice_relids=0x00000002] The RC is |
40712ed
to
cf97ac3
Compare
4b826e3
to
1011ad4
Compare
It's added by history reasons as a workaround. We don't have exactly the right parallel feature at the time. It's neither reasonable to keep it, nor we have test cases for this wired GUC. Retire it and keep codes same with UPSTREAM. Authored-by: Zhang Mingli avamingli@gmail.com
1011ad4
to
12dc126
Compare
When there is a parallel ware join, we always make ParallelBroadcastMotion before to guarantee all data are replicated on segments across parallel processes, even if target side's parallel workers is 0, For example, inner(Hashed, workers=0) Join outer(HashedWorkers, workers=2) when parallel aware. If we ParallelBroadcastMotion the outer side to join with inner, it will get outer(ReplicatedWorkers) Join inner(Hashed) = Join Locus(HashedWorkers, workers=0). That will cause an Assert Failure in cdbpathlocus_parallel_join(). So, we should use BroadcastMotion instead of ParallelBroadcastMotion if target side's parallel workers is 0 when parallel aware. There are no differences between BroadcastMotion and ParallelBroadcastMotion when execution if target slice's parallel workers is 0. Another benefit is we could get a better Hashed locus instead of HashedWorkers for upper join with others directly without Motion. Authored-by: Zhang Mingli avamingli@gmail.com
12dc126
to
3c3c227
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There are two problems here, one is the GUC retire, another one is found after retiring GUC that a potential Assertion failure we didn't reproduce it well before.
Please review this pr by commits.
Retire GUC parallel_hash_enable_motion_broadcast
It's added by history reasons as a workaround.
We don't have exactly the right parallel feature at the time.
It's neither reasonable to keep it, nor we have test cases for this wired GUC.
Retire it and keep codes same with UPSTREAM.
Make BroadcastMotion if target plan's parallel < 1 when parallel aware.
When there is a parallel ware join, we always make ParallelBroadcastMotion
before to guarantee all data are replicated on segments across parallel
processes, even if target side's parallel workers is 0,
For example, inner(Hashed, workers=0) Join outer(HashedWorkers, workers=2)
when parallel aware.
If we ParallelBroadcastMotion the outer side to join with inner, it will
get outer(ReplicatedWorkers) Join inner(Hashed) = Join Locus(HashedWorkers,
workers=0).
That will cause an Assert Failure in cdbpathlocus_parallel_join().
So, we should use BroadcastMotion instead of ParallelBroadcastMotion if
target side's parallel workers is 0 when parallel aware.
There are no differences between BroadcastMotion and ParallelBroadcastMotion
when execution if target slice's parallel workers is 0.
Another benefit is we could get a better Hashed locus instead of HashedWorkers
for upper join with others directly without Motion
Authored-by: Zhang Mingli avamingli@gmail.com
closes: #96
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it has some breaking backward or forward compatibility, please clary.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel