
Could not compute split [from JIRA] #1

@tsface

Description


SPARK-12306

https://issues.apache.org/jira/browse/SPARK-12306

Currently in Spark Streaming, a Receiver stores the received data in a BlockManager, and the data is later consumed by a BlockRDD. If that BlockManager is lost because of some failure, the BlockRDD throws a SparkException saying "Could not compute split, block not found".
In most cases this is the right thing to do. However, in a streaming scenario that can tolerate small amounts of data loss, silently moving on instead of throwing an exception may be preferable.
This issue proposes adding a spark.streaming.ignoreBlockNotFound option, defaulting to false, that controls whether to throw an exception or simply move on when a block is not found.
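A minimal sketch of the lookup such a flag would change, written as a standalone helper. The conf key comes from the proposal above, but this helper, its name, and the flag parameter are hypothetical; the real BlockRDD.compute always throws:

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkEnv, SparkException}
import org.apache.spark.storage.BlockId

// Hypothetical helper mirroring what BlockRDD.compute does, extended with the
// proposed ignoreBlockNotFound behavior.
def getBlockOrSkip[T: ClassTag](blockId: BlockId, ignoreBlockNotFound: Boolean): Iterator[T] =
  SparkEnv.get.blockManager.get[T](blockId) match {
    case Some(result) =>
      result.data.asInstanceOf[Iterator[T]]
    case None if ignoreBlockNotFound =>
      // Proposed behavior: tolerate small data loss by returning an empty
      // partition instead of failing the whole job.
      Iterator.empty
    case None =>
      throw new SparkException(s"Could not compute split, block $blockId not found")
  }
```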

SPARK-5001

https://issues.apache.org/jira/browse/SPARK-5001

I counted messages using the Kafka input stream of Spark 1.1.1. The test app failed when a later batch job completed sooner than the previous one. In the source code, BlockRDDs older than (time - rememberDuration) are removed in clearMetadata after a job completes, so the still-running previous job aborts with "block not found". The relevant log is as follows:
2014-12-25 14:07:12(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487632000 ms.0 from job set of time 1419487632000 ms
2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms
2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-15] INFO :Finished job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms
2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-16] INFO :Removing blocks of RDD BlockRDD[3028] at createStream at TestKafka.java:144 of time 1419487635000 ms from DStream clearMetadata
java.lang.Exception: Could not compute split, block input-0-1419487631400 not found for 3028
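A common workaround is to keep generated RDDs and their blocks around longer than the default, so a slow batch can still read its blocks before clearMetadata drops them. A minimal sketch, assuming a typical StreamingContext setup (the app name and intervals are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("KafkaCounter")
val ssc = new StreamingContext(conf, Seconds(3))

// Retain each batch's RDDs (and thus their blocks) for 5 minutes instead of
// the default, so an out-of-order batch completion cannot clean up blocks a
// still-running job needs.
ssc.remember(Minutes(5))
```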

SPARK-10898

https://issues.apache.org/jira/browse/SPARK-10898

Setting spark.streaming.concurrentJobs causes blocks to be deleted before they are read.
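For context, spark.streaming.concurrentJobs is the (undocumented) setting that lets multiple streaming jobs run at once, which opens the cleanup race described under SPARK-5001 above; a sketch of how it is typically enabled:

```scala
import org.apache.spark.SparkConf

// With more than one concurrent job, a later batch can finish first, and its
// metadata cleanup can delete blocks that an earlier, still-running batch
// has not yet read. The app name here is a placeholder.
val conf = new SparkConf()
  .setAppName("ConcurrentStreamingApp")
  .set("spark.streaming.concurrentJobs", "2")  // default is "1"
```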

SPARK-10210

https://issues.apache.org/jira/browse/SPARK-10210

When the write ahead log is not enabled, a recovered streaming driver still tries to run jobs using pre-failure block ids, and fails because those blocks no longer exist in memory (and cannot be recovered, since the receiver WAL is not enabled).
This occurs because the driver-side WAL of ReceivedBlockTracker recovers the past block information, and ReceiverInputDStream creates BlockRDDs even for blocks that no longer exist.
The solution is to filter out block ids that do not exist before creating the BlockRDD, as sketched below.
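A sketch of that filtering step, loosely modeled on ReceiverInputDStream.createBlockRDD. It is written as it would sit inside Spark's streaming internals (BlockRDD is private[spark], so this is not callable from user code), and the helper name and the BlockManagerMaster.contains call are assumptions rather than a verbatim copy of the fix:

```scala
import scala.reflect.ClassTag
import org.apache.spark.{SparkContext, SparkEnv}
import org.apache.spark.rdd.BlockRDD
import org.apache.spark.storage.BlockId

// Illustrative: drop block ids the BlockManager master no longer knows about
// before building the BlockRDD, so recovery does not create partitions that
// can never be computed.
def createFilteredBlockRDD[T: ClassTag](sc: SparkContext, blockIds: Array[BlockId]): BlockRDD[T] = {
  val master = SparkEnv.get.blockManager.master
  val validBlockIds = blockIds.filter(id => master.contains(id))
  if (validBlockIds.length != blockIds.length) {
    // Some pre-failure blocks are gone for good; without the receiver WAL
    // (spark.streaming.receiver.writeAheadLog.enable) that data is lost.
    System.err.println(s"WARN: ${blockIds.length - validBlockIds.length} block(s) not recovered")
  }
  new BlockRDD[T](sc, validBlockIds)
}
```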
