
Conversation

@cloud-fan commented Apr 22, 2016

What changes were proposed in this pull request?

This PR introduces a new accumulator API which is much simpler than before:

  1. The type hierarchy is simplified; now we only have an Accumulator class.
  2. The initialValue and zeroValue concepts are combined into one: zeroValue.
  3. There is only one register method; accumulator registration and cleanup registration are combined.
  4. The id, name, and countFailedValues are combined into an AccumulatorMetadata, which is provided during registration.

SQLMetric is a good example of the simplicity of this new API.
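As a rough illustration of the proposed shape, here is a minimal sketch of a custom accumulator under the single-class hierarchy. The abstract member set (isZero, reset, add, merge, localValue) is pieced together from signatures and review comments quoted in this thread, so treat it as an assumption rather than the merged API:

import java.{lang => jl}

// Hedged sketch of the simplified hierarchy; not the exact merged API.
abstract class NewAccumulator[IN, OUT] extends Serializable {
  def isZero: Boolean                              // at the zero value?
  def reset(): Unit                                // return to the zero value
  def add(v: IN): Unit                             // fold one input in
  def merge(other: NewAccumulator[IN, OUT]): Unit  // combine a partial result
  def localValue: OUT                              // current value on this node
}

// The intermediate state (sum, count) differs from the output type (Double),
// which is why a general setValue is impossible (see the discussion below).
class AverageAccumulator extends NewAccumulator[jl.Double, jl.Double] {
  private var _sum = 0.0
  private var _count = 0L
  override def isZero: Boolean = _count == 0
  override def reset(): Unit = { _sum = 0.0; _count = 0L }
  override def add(v: jl.Double): Unit = { _sum += v; _count += 1 }
  override def merge(other: NewAccumulator[jl.Double, jl.Double]): Unit = other match {
    case o: AverageAccumulator => _sum += o._sum; _count += o._count
    case _ => throw new UnsupportedOperationException("incompatible accumulator")
  }
  override def localValue: jl.Double = if (_count == 0) jl.Double.NaN else _sum / _count
}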

What we break:

  1. No setValue anymore. In the new API the intermediate type can differ from the result type, so it's very hard to implement a general setValue.
  2. An accumulator can't be serialized before it is registered.

Problems to be addressed in follow-ups:

  1. With this new API, AccumulatorInfo doesn't make a lot of sense: its partial output is no longer the partial updates, so we need to expose the intermediate value.
  2. ExceptionFailure should not carry the accumulator updates. Why would users care about accumulator updates for failed cases? It looks like we only use this feature to update the internal metrics; how about sending a heartbeat to update internal metrics after the failure event?
  3. The public event SparkListenerTaskEnd carries a TaskMetrics. Ideally this TaskMetrics shouldn't need to carry external accumulators, as the only TaskMetrics method that can access external accumulators is private[spark]. However, SQLListener uses it to retrieve SQL metrics.

How was this patch tested?

existing tests

@SparkQA commented Apr 22, 2016

Test build #56704 has finished for PR 12612 at commit 50ebb24.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class NewAccumulator[IN, OUT] extends Serializable
    • class IntAccumulator extends NewAccumulator[jl.Integer, jl.Integer]
    • class LongAccumulator extends NewAccumulator[jl.Long, jl.Long]
    • class DoubleAccumulator extends NewAccumulator[jl.Double, jl.Double]
    • class AverageAccumulator extends NewAccumulator[jl.Double, jl.Double]
    • class CollectionAccumulator[T] extends NewAccumulator[T, java.util.List[T]]
    • class LegacyAccumulatorWrapper[R, T](

}


class AverageAccumulator extends NewAccumulator[jl.Double, jl.Double] {
@cloud-fan (Author) commented:

One difference from the previous API: we can't have a general setValue method, as it needs the intermediate type, which is not exposed by the new API. For example, AverageAccumulator doesn't have setValue.

A contributor replied:

I think getting rid of setValue is great; in the consistent accumulators I built under the old API, I had to just throw an exception if people used setValue.

@cloud-fan (Author) commented:
SQLMetrics is a good example of how simple the new API is.

@SparkQA commented Apr 24, 2016

Test build #56848 has finished for PR 12612 at commit c659199.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 24, 2016

Test build #56850 has finished for PR 12612 at commit 7cc93d1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan commented Apr 24, 2016

cc @rxin, several questions need to be discussed:

  1. Should we provide the metadata when creating accumulators instead of at registration? e.g., id can be fixed like private[spark] val id = AccumulatorContext.newId(), and name and countFailedValues can be constructor parameters. It's annoying that we have to register an accumulator before we can call its toInfo method.
  2. Should we still have both Accumulable and Accumulator? Accumulator is a special Accumulable whose input and output types are the same.
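A rough sketch of option 1, under stated assumptions: the stand-in AccumulatorContext object and the constructor shape below are illustrative only; AccumulatorContext.newId() is the call named in the question above.

import java.util.concurrent.atomic.AtomicLong

// Illustrative stand-in for the real AccumulatorContext.
object AccumulatorContext {
  private val nextId = new AtomicLong(0L)
  def newId(): Long = nextId.getAndIncrement()
}

// Metadata fixed at construction, so toInfo-style methods would work on an
// accumulator that has not been registered yet.
abstract class NewAccumulator[IN, OUT](
    val name: Option[String] = None,
    val countFailedValues: Boolean = false) extends Serializable {
  val id: Long = AccumulatorContext.newId() // fixed at creation, not registration
}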

@SparkQA commented Apr 24, 2016

Test build #56860 has finished for PR 12612 at commit f831678.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 24, 2016

Test build #56861 has finished for PR 12612 at commit a1c6865.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan force-pushed the acc branch 2 times, most recently from d4cc938 to 38cb9a1, on April 24, 2016 20:59
@SparkQA commented Apr 24, 2016

Test build #56862 has finished for PR 12612 at commit 38cb9a1.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 24, 2016

Test build #56863 has finished for PR 12612 at commit 22d4cc6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

metrics.internalAccums.find(_.name == accum.name).foreach(_.setValueAny(accum.update.get))
definedAccumUpdates.filter(_.internal).foreach { accInfo =>
metrics.internalAccums.find(_.name == accInfo.name).foreach { acc =>
acc.asInstanceOf[Accumulator[Any, Any]].add(accInfo.update.get)
@cloud-fan (Author) commented:

This example shows a weakness of the new API: we can't setValue. Here we have the final output and we want to set the accumulator's value so that it produces the same output. With the new API we can't guarantee that all accumulators can implement setValue, e.g. the average accumulator. I'm still thinking about how to fix or work around this. @rxin any ideas?

A contributor replied:

Just have a reset and "add"?

I'd argue it doesn't make sense to call setValue, since a "set" action is not algebraic (i.e. you cannot compose/merge set operations).
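As a usage sketch against the hedged AverageAccumulator shown earlier in this thread, reset plus add rebuilds state without any setValue:

val acc = new AverageAccumulator
acc.add(10.0)                                // some stale state
acc.reset()                                  // back to the zero value
Seq(1.0, 2.0, 3.0).foreach(v => acc.add(v))  // rebuild from inputs
assert(acc.localValue.doubleValue == 2.0)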

A contributor replied:

Actually I don't think we need this if we send accumulators back to the driver.

@SparkQA commented Apr 25, 2016

Test build #56900 has finished for PR 12612 at commit 470ec7b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class NewAccumulator[IN, OUT] extends Serializable
    • case class UpdatedLongValue(l: Long) extends UpdatedValue
    • class LongAccumulator extends NewAccumulator[jl.Long, jl.Long]
    • case class UpdatedDoubleValue(d: Double) extends UpdatedValue
    • class DoubleAccumulator extends NewAccumulator[jl.Double, jl.Double]
    • case class UpdatedAverageValue(sum: Double, count: Long) extends UpdatedValue
    • class AverageAccumulator extends NewAccumulator[jl.Double, jl.Double]
    • case class GenericUpdatedValue[T](value: T) extends UpdatedValue
    • class CollectionAccumulator[T] extends NewAccumulator[T, java.util.List[T]]
    • class LegacyAccumulatorWrapper[R, T](

name: Option[String],
countFailedValues: Boolean) extends Serializable

trait UpdatedValue extends Serializable
@cloud-fan (Author) commented:

cc @rxin, I didn't send the accumulator back because of a serialization problem.

Basically, when we send an accumulator from the driver to executors, we don't want to send its current value (think about a list accumulator: we definitely don't want to send the current list to executors). But when we send the accumulator from executors back to the driver, we do need to send the current value.

One possible solution is to have two local variables for each accumulator, one for the driver and one for executors. But that's a lot of trouble when accumulators have a complex intermediate type, e.g. the average accumulator. So I ended up with this approach.
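A hedged sketch of the direction described in this comment: executors report an UpdatedValue carrying only the intermediate state, so the accumulator object itself never has its value serialized. The trait and the UpdatedAverageValue case class appear in the test build #56900 output above; the merge helper is illustrative.

trait UpdatedValue extends Serializable
case class UpdatedAverageValue(sum: Double, count: Long) extends UpdatedValue

// Driver side: fold a reported update into the accumulator's own state.
def mergeAverage(sum: Double, count: Long, u: UpdatedValue): (Double, Long) =
  u match {
    case UpdatedAverageValue(s, c) => (sum + s, count + c)
    case other => throw new IllegalArgumentException(s"unexpected update: $other")
  }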

A contributor replied:

Another potential problem with sending accumulators over the wire, with the proposed API from the JIRA, is that the accumulators register themselves inside readObject.

@rxin replied Apr 25, 2016:

@cloud-fan Why can't we send the current list? The current list, as far as I understand, will always be zero-sized? We can just create a copy of the accumulator for sending to the executors.

A contributor replied:

If the accumulator was used in two separate tasks, it could have built up some values from the first task on the driver before the second task. But always sending a zeroed copy to the executor would be an OK solution to that.

@SparkQA commented Apr 26, 2016

Test build #56981 has finished for PR 12612 at commit 73c91d2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 26, 2016

Test build #57000 has finished for PR 12612 at commit 77dc3fc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Author) commented:

This is really a big problem...

We need some serialization hooks to support sending accumulators back from executors, and I tried two approaches, but both failed:

  1. Add a write hook which resets the accumulator before sending it from the driver to executors. The problem is that we can't just reset: the accumulator state should be kept on the driver side, and the Java serialization hook isn't flexible enough to allow us to make a copy or the like. One possible workaround is to create an AccumulatorWrapper so that we have full control over accumulator serialization, but this would complicate the hierarchy.
  2. Add a read hook which resets the accumulator after deserialization. Unfortunately this doesn't work when Accumulator is a base class: by the time readObject is called, the child's fields are not initialized yet, so calling reset there is a no-op; the values of the child's fields are filled in later.

Generally speaking, writeObject and readObject are not good serialization hooks. We'd either have to figure out some trick to work around them, or find other, better serialization hooks (or not send accumulators back).

@rxin any ideas?
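A minimal, self-contained illustration of why approach 2 fails (all names here are illustrative): Java deserializes superclass state before subclass state, so when the base class's readObject runs, the child's fields still hold their defaults, and the reset is silently undone when the child's serialized fields are filled in.

import java.io.ObjectInputStream

abstract class Base extends Serializable {
  def reset(): Unit
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    reset() // too early: subclass fields are deserialized after this returns
  }
}

class Child extends Base {
  private var _sum = 0L // restored from the stream after Base.readObject runs
  override def reset(): Unit = _sum = 0L
}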

A contributor replied:

as discussed offline, writeReplace

assert(acc.value === Seq(9, 10))
}

test("value is reset on the executors") {
@cloud-fan (Author) commented:

This is covered by the new accumulator serialization test.

assert(newUpdates.size === tm.internalAccums.size + 4)
}

test("from accumulator updates") {
@cloud-fan (Author) commented:

This test is no longer valid. TaskMetrics.fromAccumulatorUpdates will return a TaskMetrics containing only internal accumulators, so there is no need to worry about unregistered external accumulators.

@cloud-fan changed the title from "[SPARK-14654][CORE][WIP] New accumulator API" to "[SPARK-14654][CORE] New accumulator API" on Apr 27, 2016
@SparkQA commented Apr 27, 2016

Test build #57130 has finished for PR 12612 at commit be8ff0e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ListAccumulator[T] extends NewAccumulator[T, java.util.List[T]]

@SparkQA commented Apr 27, 2016

Test build #57132 has finished for PR 12612 at commit 9cdddd0.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Apr 27, 2016

Test build #57135 has finished for PR 12612 at commit c74320d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* The base class for accumulators, that can accumulate inputs of type `IN`, and produce output of
* type `OUT`. Implementations must define following methods:
* - isZero: tell if this accumulator is zero value or not. e.g. for a counter accumulator,
A contributor commented:

These should be javadoc on the methods, rather than in the classdoc.
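For example, the isZero entry could move from the classdoc onto the method itself; a sketch, with wording adapted from the classdoc quoted above:

abstract class NewAccumulator[IN, OUT] extends Serializable {
  /**
   * Tells if this accumulator is at its zero value, e.g. 0 for a counter
   * accumulator, or an empty list for a list accumulator.
   */
  def isZero: Boolean
}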

@rxin commented Apr 28, 2016

This looks pretty good to me. We should get it to pass tests and then merge it asap. Some of the comments can be addressed later.

def localValue: OUT

// Called by Java when serializing an object
final protected def writeReplace(): Any = {
@cloud-fan (Author) commented:

This should be private; however, the hook won't be called if it's private (not sure why), so I used final protected to work around it.
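A hedged reconstruction of the full hook, stitched together from the fragments quoted in this thread: the guard matches the excerpt shown further down, while copyAndReset() is an assumed helper name. (Java serialization only honors writeReplace when the method is visible to the instance's concrete class, so a private declaration on an abstract base class is never found; hence the final protected workaround.)

// Reconstruction under stated assumptions, not the exact committed code.
final protected def writeReplace(): Any = {
  if (atDriverSide) {
    if (!isRegistered) {
      throw new UnsupportedOperationException(
        "Accumulator must be registered before send to executor")
    }
    val copy = copyAndReset()  // assumed helper: zero-valued copy, same metadata
    copy.atDriverSide = false  // the copy will live on an executor
    copy
  } else {
    this                       // executor -> driver: keep the accumulated value
  }
}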

@SparkQA commented Apr 28, 2016

Test build #57220 has finished for PR 12612 at commit 124568b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin commented Apr 28, 2016

Merging in master!

@asfgit closed this in bf5496d on Apr 28, 2016
if (atDriverSide) {
if (!isRegistered) {
throw new UnsupportedOperationException(
"Accumulator must be registered before send to executor")
A contributor commented:

@cloud-fan I'm getting intermittent but regular test failures in ALSSuite (not sure if there might be others; this just happens to be something I'm working on now).

e.g.

[info] - exact rank-1 matrix *** FAILED *** (4 seconds, 397 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 74, not attempting to retry it. Exception during serialization: java.lang.UnsupportedOperationException: Accumulator must be registered before send to executor
[info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1448)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1436)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1435)
[info]   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[info]   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1435)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:809)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:809)
[info]   at scala.Option.foreach(Option.scala:257)
[info]   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:809)
[info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1657)
[info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616)
[info]   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
[info]   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
[info]   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
[info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
[info]   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1936)
[info]   at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:970)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
[info]   at org.apache.spark.rdd.RDD.reduce(RDD.scala:952)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$stats$1.apply(DoubleRDDFunctions.scala:42)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$stats$1.apply(DoubleRDDFunctions.scala:42)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions.stats(DoubleRDDFunctions.scala:41)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$mean$1.apply$mcD$sp(DoubleRDDFunctions.scala:47)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$mean$1.apply(DoubleRDDFunctions.scala:47)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions$$anonfun$mean$1.apply(DoubleRDDFunctions.scala:47)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
[info]   at org.apache.spark.rdd.DoubleRDDFunctions.mean(DoubleRDDFunctions.scala:46)
[info]   at org.apache.spark.ml.recommendation.ALSSuite.testALS(ALSSuite.scala:373)
[info]   at org.apache.spark.ml.recommendation.ALSSuite$$anonfun$12.apply$mcV$sp(ALSSuite.scala:385)
[info]   at org.apache.spark.ml.recommendation.ALSSuite$$anonfun$12.apply(ALSSuite.scala:383)
[info]   at org.apache.spark.ml.recommendation.ALSSuite$$anonfun$12.apply(ALSSuite.scala:383)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:56)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:28)
[info]   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
[info]   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:28)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:357)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:502)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)

@cloud-fan (Author) replied:

I also found some tests failing intermittently because of this; looking into it.

@mengxr commented Apr 29, 2016

Shall we revert this commit? I got some similar errors:

Code:

sc.parallelize(1 until 100, 1).map { i => Array.fill(1e7.toInt)(1.0) }.count()

The job succeeded, but error messages were emitted to the Spark shell:

16/04/29 11:25:45 ERROR Utils: Uncaught exception in thread heartbeat-receiver-event-loop-thread
java.lang.UnsupportedOperationException: Can't read accumulator value in task
    at org.apache.spark.NewAccumulator.value(NewAccumulator.scala:137)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5$$anonfun$apply$9$$anonfun$apply$10.apply(TaskSchedulerImpl.scala:394)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5$$anonfun$apply$9$$anonfun$apply$10.apply(TaskSchedulerImpl.scala:394)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5$$anonfun$apply$9.apply(TaskSchedulerImpl.scala:394)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5$$anonfun$apply$9.apply(TaskSchedulerImpl.scala:392)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5.apply(TaskSchedulerImpl.scala:392)
    at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$5.apply(TaskSchedulerImpl.scala:391)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
    at org.apache.spark.scheduler.TaskSchedulerImpl.executorHeartbeatReceived(TaskSchedulerImpl.scala:391)
    at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2$$anonfun$run$2.apply$mcV$sp(HeartbeatReceiver.scala:128)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1219)
    at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1$$anon$2.run(HeartbeatReceiver.scala:127)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/04/29 11:25:55 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@1c53df1a,BlockManagerId(driver, 192.168.99.1, 59310))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:494)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:523)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:523)
    at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:523)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1793)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:523)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
    ... 14 more
[The same "Can't read accumulator value in task" and heartbeat RpcTimeoutException stack traces repeat at 11:25:58, 11:26:08, and 11:26:11.]

@mengxr commented Apr 29, 2016

Created https://issues.apache.org/jira/browse/SPARK-15010 for the reported issue.

taskContext.registerAccumulator(this)
}
} else {
atDriverSide = true
A contributor commented:

Why is this assignment needed?

@cloud-fan (Author) replied:

When the accumulator is sent back from an executor to the driver, we should set the atDriverSide flag.
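A hedged reconstruction of the readObject hook the excerpt above comes from, with the surrounding structure inferred from the quoted fragment; TaskContext.get() (org.apache.spark.TaskContext) is the real Spark API:

private def readObject(in: java.io.ObjectInputStream): Unit = {
  in.defaultReadObject()
  if (atDriverSide) {
    // Deserialized inside a task closure on an executor: register with the task.
    atDriverSide = false
    val taskContext = TaskContext.get()
    if (taskContext != null) {
      taskContext.registerAccumulator(this)
    }
  } else {
    // Deserialized from a task result on the driver: flip the flag back.
    atDriverSide = true
  }
}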
