Add motion hazard to the outer side of parallel aware join (fix flaky incorrect results of agg) #284
Conversation
Force-pushed from c7865ca to e4b313c
Results check of the case in #259, rewritten to SQL:

```sql
select count(i3::numeric), sum(i3::numeric), avg(i3::numeric) from
(SELECT t1.* FROM test_hj_spill AS t1 right JOIN test_hj_spill AS t2 ON
t1.i1=t2.i2) foo;
```

Correct result:

```
 count |   sum    |         avg
-------+----------+----------------------
 45000 | 22477500 | 499.5000000000000000
(1 row)
```

Without this patch: 500 loops, 470 succeeded to run, 30 wrong results.

```
gpadmin@i-v3roa8c2:/var/tmp$ cat res.log | grep -i count | wc -l
470
gpadmin@i-v3roa8c2:/var/tmp$ cat res.log | grep -i 45000 | wc -l
440
```

It's possible that some loops failed to run the SQL file at all, because I limited the resources and CBDB may report errors like:

```
psql:p.sql:3: ERROR:  failed to acquire resources on one or more segments
DETAIL:  connection to server at "127.0.1.1", port 9004 failed: FATAL:  reader could not find writer proc entry
DETAIL:  lock [0,1260] AccessShareLock 0. Probably because writer gang is gone somehow. Maybe try rerunning.
(seg2 127.0.1.1:9004)
```

That's expected; those loops are simply skipped.

With this patch: 500 loops, 474 succeeded to run, 0 wrong results.

```
gpadmin@i-v3roa8c2:/var/tmp$ cat res.log | grep -i count | wc -l
474
gpadmin@i-v3roa8c2:/var/tmp$ cat res.log | grep -i 45000 | wc -l
474
```

Comparison over the 500-loop test: 470 runs with 440 correct results without the patch, versus 474 runs with 474 correct results with it.
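The counting above can be reproduced with a small loop; a sketch is below. The file names `p.sql` and `res.log` are taken from the log output; the `psql` loop itself is shown only as a comment since it needs a running cluster, and the `printf` lines substitute a synthetic two-run log so the grep counting is demonstrable on its own.

```shell
# Actual loop (needs a running cluster):
#   for i in $(seq 1 500); do psql -f p.sql >> res.log 2>&1; done
# Synthetic two-run log standing in for real output:
printf ' count |   sum    | avg\n 45000 | 22477500 | 499.5\n'  > res.log
printf ' count |   sum    | avg\n 44000 | 22000000 | 500.0\n' >> res.log
grep -ci count res.log   # runs that produced a result -> 2
grep -ci 45000 res.log   # runs with the correct count  -> 1
```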
Force-pushed from e4b313c to 2a994b5
For a parallel aware join, we may partition outer data into batches if there is not enough memory. If a worker has consumed all the data from its subnode, it arrives at the barrier and waits for the others before beginning the next phase: PHJ_BUILD_DONE. PHJ_BUILD_DONE means that everything from the PHJ_BUILD_HASHING_OUTER phase has been partitioned and we are ready to probe. This holds for Postgres, because if a worker pulls no data from its subnode and leaves the PHJ_BUILD_HASHING_OUTER phase, all of the subnode's data must already have been processed (e.g. a Parallel Seq Scan). Once all participants leave PHJ_BUILD_HASHING_OUTER, the whole data set has been pulled completely by these processes; otherwise there would be sibling workers that never participated in the PHJ_BUILD_HASHING_OUTER phase at all. But this does not hold for CBDB: a plan node may have Motions beneath it. That some parallel workers pull no data from their subnode does not mean the other workers have no data left to process. Before moving to the next phase, we need to wait for all parallel workers, not just for the build barrier's participants that have arrived at the current phase. Add an outer_motion_barrier to ensure all parallel workers finish their work of partitioning outer batches. Authored-by: Zhang Mingli avamingli@gmail.com
Force-pushed from 2a994b5 to 12bb939
LGTM
closes: #ISSUE_Number
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it breaks backward or forward compatibility in any way, please clarify.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel