
Unit test failure: take10 works correctly on example SAM *** FAILED *** #537

Closed
heuermh opened this issue Dec 26, 2014 · 5 comments

@heuermh
Member

heuermh commented Dec 26, 2014

After merging upstream into my fork this morning, I'm seeing one unit test failure and many warning stack traces (java.net.BindException: Address already in use).

$ mvn clean install
...
- can convert a simple wigfix file
2014-12-26 10:30:24 WARN  AbstractLifeCycle:204 - FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
...
PluginExecutorSuite:
2014-12-26 10:30:24 WARN  QueuedThreadPool:145 - 1 threads could not be stopped
2014-12-26 10:30:24 WARN  Utils:71 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2014-12-26 10:30:25 WARN  SparkContext:92 - Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:70)
org.bdgenomics.adam.cli.ADAMSparkCommand$class.run(ADAMCommand.scala:55)
org.bdgenomics.adam.cli.Features2ADAM.run(Features2ADAM.scala:47)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply$mcV$sp(Features2ADAMSuite.scala:91)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply(Features2ADAMSuite.scala:68)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply(Features2ADAMSuite.scala:68)
...
- take10 works correctly on example SAM *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1003)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:110)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1000)
    ... 19 more

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
  ...
...
Run completed in 37 seconds, 209 milliseconds.
Total number of tests run: 14
Suites: completed 4, aborted 0
Tests: succeeded 13, failed 1, canceled 0, ignored 1, pending 0
*** 1 TEST FAILED ***
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] ADAM ............................................... SUCCESS [ 17.525 s]
[INFO] ADAM: Core ......................................... SUCCESS [06:48 min]
[INFO] ADAM: APIs for Java ................................ SUCCESS [ 25.379 s]
[INFO] ADAM: CLI .......................................... FAILURE [01:15 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------

Can anyone else reproduce?

@mylons

mylons commented Jan 20, 2015

@heuermh: did you ever come up with a solution? I'm running into this when I run adam-shell right now.

@fnothaft
Member

This is also causing a build error on #556.

@mylons what sort of environment are you running on? I'm wondering whether this issue has to do with the use of the torrent broadcaster.

@JoshRosen

It looks like one of your test suites might be failing to clean up the SparkContext that it creates, leading to two active SparkContexts that corrupt each other's state.

The relevant error message here is:

2014-12-26 10:30:25 WARN  SparkContext:92 - Multiple running SparkContexts detected in the same JVM!
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:70)
org.bdgenomics.adam.cli.ADAMSparkCommand$class.run(ADAMCommand.scala:55)
org.bdgenomics.adam.cli.Features2ADAM.run(Features2ADAM.scala:47)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply$mcV$sp(Features2ADAMSuite.scala:91)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply(Features2ADAMSuite.scala:68)
org.bdgenomics.adam.cli.Features2ADAMSuite$$anonfun$2.apply(Features2ADAMSuite.scala:68)

Is Features2ADAMSuite creating a SparkContext and not calling stop() on it?
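For reference, the cleanup pattern being suggested can be sketched as follows. This is a minimal, self-contained sketch: a hypothetical `StubContext` stands in for `org.apache.spark.SparkContext` so it compiles and runs without Spark on the classpath; in a real suite the same loan pattern would wrap the actual SparkContext and call `sc.stop()` in a `finally` (or a ScalaTest `after`) block so the next suite never sees a second live context in the JVM:

```scala
object SparkContextCleanupSketch {

  // Hypothetical stand-in for SparkContext; it only tracks how many
  // contexts are currently live in this JVM.
  final class StubContext {
    StubContext.live += 1
    private var stopped = false
    def stop(): Unit = if (!stopped) { stopped = true; StubContext.live -= 1 }
  }
  object StubContext { var live = 0 }

  // Loan pattern: the context is always stopped, even if the test body throws.
  def withContext[A](body: StubContext => A): A = {
    val sc = new StubContext
    try body(sc) finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    withContext { sc => /* test body would use sc here */ }
    // After the block, no contexts remain live in this JVM.
    println(StubContext.live) // prints 0
  }
}
```

A test that builds its context this way cannot leak it past the suite, which is exactly the failure mode the warning above describes.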

@fnothaft
Member

fnothaft commented Feb 8, 2015

@JoshRosen I do think that was the cause. I patched Features2ADAMSuite recently in #558. Since #558, we did have one build in #560 fail (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/584/hadoop.version=1.0.4,label=centos/consoleFull) due to the issue above, but I think that was before #560 was rebased on top of the merged #558. I think this issue is probably safe to close; we can always reopen it if it reappears.

@heuermh
Member Author

heuermh commented Aug 19, 2015

Closing; I haven't seen this since.

@heuermh heuermh closed this as completed Aug 19, 2015