squito (Contributor) commented Dec 13, 2016

What changes were proposed in this pull request?

There is a small race in SchedulerIntegrationSuite.
The test assumes that the task scheduler thread processing the last task
will finish before the DAGScheduler processes the task event and notifies
the job waiter, but that ordering is not guaranteed. If the DAGScheduler
wins, the job waiter can return before the task scheduler has marked the
taskset as complete.
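A minimal, JDK-only sketch of this ordering problem follows; it is a model, not Spark code. The names `RaceSketch`, `jobWaiter` (a `CountDownLatch` standing in for the DAGScheduler's job waiter) and `tasksetComplete` (standing in for the task scheduler's bookkeeping) are all hypothetical. The worker thread releases the waiter before it marks the taskset complete, so a check made as soon as the waiter returns may see an incomplete taskset, while a check made after `join()` always sees it complete.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

object RaceSketch {
  // Hypothetical model of the race; none of these names come from Spark.
  // Returns (state seen right after the waiter is released, state seen after join).
  def runOnce(): (Boolean, Boolean) = {
    val jobWaiter = new CountDownLatch(1)           // stands in for the job waiter
    val tasksetComplete = new AtomicBoolean(false)  // stands in for taskset bookkeeping

    val schedulerThread = new Thread(new Runnable {
      def run(): Unit = {
        jobWaiter.countDown()      // 1. the waiter is notified first ...
        Thread.sleep(5)            //    (artificially widens the race window)
        tasksetComplete.set(true)  // 2. ... and only then is the taskset marked complete
      }
    })
    schedulerThread.start()

    jobWaiter.await()
    // An immediate check races with step 2 and may observe `false`.
    val seenImmediately = tasksetComplete.get()
    schedulerThread.join()
    // join() makes step 2 happen-before this read, so it always observes `true`.
    (seenImmediately, tasksetComplete.get())
  }

  def main(args: Array[String]): Unit = {
    val results = (1 to 20).map(_ => runOnce())
    val racy = results.count { case (immediate, _) => !immediate }
    println(s"immediate check saw an incomplete taskset in $racy of 20 runs")
  }
}
```

Without the artificial `sleep`, the window between the two steps is tiny, which is consistent with a race that rarely or never reproduces locally.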

How was this patch tested?

Ran the test locally many times and it never failed, though admittedly it never failed locally for me before this change either. However, I am nearly 100% certain this race caused the failure of one Jenkins build, https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull (which is long gone now, sorry; I initially fixed it as part of #14079).


import org.scalactic.TripleEquals
import org.scalatest.concurrent.Eventually._
import org.scalatest.concurrent.PatienceConfiguration
Contributor

do you need this? (I compiled successfully w/o it)

// and notifies the job waiter before our original thread in the task scheduler finishes
// handling the event and marks the taskset as complete. So it's OK if we need to wait a
// *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race.
eventually(timeout(1 second), interval(100 millis)) {
Contributor

I might use 10 millis here: there's little cost to checking more frequently, and 100 millis is a somewhat long wait.
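In essence, `eventually(timeout(1 second), interval(100 millis))` re-runs its block until the block stops throwing or the timeout expires, pausing between attempts. The sketch below is a simplified, stdlib-only stand-in for those semantics; `PollingSketch` and `eventuallyWithin` are hypothetical names, not a ScalaTest or Spark API, and ScalaTest's real retry schedule may differ in detail.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object PollingSketch {
  // Hypothetical helper, NOT ScalaTest's API: a simplified model of
  // eventually(timeout(...), interval(...)) { check }.
  // Re-runs `check` until it returns without throwing, sleeping `intervalMs`
  // between failed attempts. Once `timeoutMs` has elapsed, the last failure is rethrown.
  def eventuallyWithin[T](timeoutMs: Long, intervalMs: Long)(check: => T): T = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @tailrec
    def attempt(): T = Try(check) match {
      case Success(value) => value
      case Failure(_) if System.nanoTime() < deadline =>
        Thread.sleep(intervalMs)
        attempt()
      case Failure(error) => throw error
    }

    attempt()
  }

  def main(args: Array[String]): Unit = {
    val done = new java.util.concurrent.atomic.AtomicBoolean(false)
    val worker = new Thread(new Runnable {
      def run(): Unit = { Thread.sleep(50); done.set(true) }
    })
    worker.start()
    // With a 10 ms interval, the check passes within about 10 ms of `done` becoming true.
    eventuallyWithin(timeoutMs = 1000, intervalMs = 10) { assert(done.get()) }
    worker.join()
    println("condition observed before the timeout")
  }
}
```

In this model the timeout only bounds a failing run, while the interval adds up to one extra interval of latency after the condition becomes true, so dropping it from 100 ms to 10 ms mostly trades a few more cheap checks for a faster exit.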

import org.scalactic.TripleEquals
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._
import org.scalatest.Assertions.AssertionsHelper
Contributor

derp, sorry, just noticed this one is out of order (it should be above org.scalatest.concurrent)

Contributor Author

No problem, sorry, I never get these right. I actually thought package imports were supposed to come first, but I checked some other places and realized you are right.

@kayousterhout
Contributor

LGTM

SparkQA commented Dec 14, 2016

Test build #70098 has finished for PR 16270 at commit 11b33c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 14, 2016

Test build #70099 has finished for PR 16270 at commit 28a8bf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 14, 2016

Test build #70104 has finished for PR 16270 at commit 9cc5d58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Dec 14, 2016

Test build #70108 has finished for PR 16270 at commit af6ea55.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

squito commented Dec 14, 2016

merged to master

asfgit closed this in ac013ea Dec 14, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
There is a small race in SchedulerIntegrationSuite.
The test assumes that the taskscheduler thread
processing that last task will finish before the DAGScheduler processes
the task event and notifies the job waiter, but that is not 100%
guaranteed.

ran the test locally a bunch of times, never failed, though admittedly
it never failed locally for me before either.  However I am nearly 100%
certain this is what caused the failure of one jenkins build
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull
(which is long gone now, sorry -- I fixed it as part of
apache#14079 initially)

Author: Imran Rashid <irashid@cloudera.com>

Closes apache#16270 from squito/sched_integ_flakiness.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017