[SPARK-18846][Scheduler] Fix flakiness in SchedulerIntegrationSuite #16270
There is a small race in SchedulerIntegrationSuite. The test assumes that the task scheduler thread processing the last task will finish before the DAGScheduler processes the task event and notifies the job waiter, but that is not 100% guaranteed.
import org.scalactic.TripleEquals
import org.scalatest.concurrent.Eventually._
import org.scalatest.concurrent.PatienceConfiguration
do you need this? (I compiled successfully w/o it)
// and notifies the job waiter before our original thread in the task scheduler finishes
// handling the event and marks the taskset as complete. So its ok if we need to wait a
// *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race.
eventually(timeout(1 second), interval(100 millis)) {
I might do 10 millis here since it seems like there's not much cost here to checking more frequently, and 100 millis is somewhat long to wait
import org.scalactic.TripleEquals
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._
import org.scalatest.Assertions.AssertionsHelper
derp sorry just noticed this one is out of order (should be above org.scalatest.concurrent)
no problem, sorry I never get these right -- I actually thought package imports were supposed to be first and checked some other places to realize you are right.
LGTM
Test build #70098 has finished for PR 16270 at commit
Test build #70099 has finished for PR 16270 at commit
Test build #70104 has finished for PR 16270 at commit
Test build #70108 has finished for PR 16270 at commit
merged to master
There is a small race in SchedulerIntegrationSuite. The test assumes that the taskscheduler thread processing that last task will finish before the DAGScheduler processes the task event and notifies the job waiter, but that is not 100% guaranteed.
ran the test locally a bunch of times, never failed, though admittedly it never failed locally for me before either. However I am nearly 100% certain this is what caused the failure of one jenkins build https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull (which is long gone now, sorry -- I fixed it as part of apache#14079 initially)
Author: Imran Rashid <irashid@cloudera.com>
Closes apache#16270 from squito/sched_integ_flakiness.
What changes were proposed in this pull request?
There is a small race in SchedulerIntegrationSuite.
The test assumes that the task scheduler thread
processing the last task will finish before the DAGScheduler processes
the task event and notifies the job waiter, but that is not 100%
guaranteed.
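To make the ordering problem concrete, here is a minimal, standard-library-only sketch of the same shape of race. It is not the Spark test code: `RaceSketch`, `pollUntil`, `jobDone`, and `taskSetFinished` are hypothetical names, and a single worker thread is forced to notify the waiter before it marks the task set finished so the problematic ordering shows up every time. The `pollUntil` helper only approximates what ScalaTest's `eventually(timeout(...), interval(...))` does in the patch.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object RaceSketch {
  // Hypothetical helper (not Spark or ScalaTest API): re-check `condition`
  // every `intervalMs` until it holds or `timeoutMs` has elapsed.
  def pollUntil(timeoutMs: Long, intervalMs: Long)(condition: => Boolean): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    var ok = condition
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = condition
    }
    ok
  }

  // Returns (finished as seen right after the waiter woke, finished after polling).
  def demo(): (Boolean, Boolean) = {
    val jobDone = new CountDownLatch(1)             // stands in for the job waiter
    val taskSetFinished = new AtomicBoolean(false)  // stands in for "taskset is complete"
    val pool = Executors.newSingleThreadExecutor()
    try {
      pool.execute(new Runnable {
        override def run(): Unit = {
          jobDone.countDown()        // the waiter is notified first...
          Thread.sleep(50)           // ...while the scheduler thread is still finishing up
          taskSetFinished.set(true)  // only now is the task set marked complete
        }
      })
      jobDone.await()                               // the test thread wakes up here
      val immediately = taskSetFinished.get()       // racy check: may well be false
      val afterPolling = pollUntil(1000, 10)(taskSetFinished.get())
      (immediately, afterPolling)
    } finally {
      pool.shutdown()
    }
  }
}
```

Calling `RaceSketch.demo()` returns a pair: the first element is the racy immediate check, the second is the check after polling for up to one second. Asserting on the first is the kind of check that can flake; asserting on the second tolerates the short window in which the other thread is still finishing.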
How was this patch tested?
Ran the test locally a bunch of times; it never failed, though admittedly it never failed locally for me before either. However, I am nearly 100% certain this is what caused the failure of one Jenkins build https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull (which is long gone now, sorry -- I fixed it as part of #14079 initially)