ResultTask
is a Task that executes a function on the records in a RDD partition.
ResultTask
is created exclusively when DAGScheduler
submits missing tasks for a ResultStage
.
ResultTask
is created with a broadcast variable with the RDD and the function to execute it on and the partition.
Name | Description |
---|---|
Collection of TaskLocations. Corresponds directly to unique entries in locs with the only rule that when Initialized when Used exclusively when |
ResultTask
takes the following when created:
-
stageId
— the stage the task is executed for -
stageAttemptId
— the stage attempt id -
Broadcast variable with the serialized task (as
Array[Byte]
). The broadcast contains of a serialized pair ofRDD
and the function to execute. -
Partition to compute
-
Collection of TaskLocations, i.e. preferred locations (executors) to execute the task on
-
The stage’s serialized TaskMetrics (as
Array[Byte]
) -
(optional) Job id
ResultTask
initializes the internal registries and counters.
preferredLocations: Seq[TaskLocation]
Note
|
preferredLocations is part of Task contract.
|
preferredLocations
simply returns preferredLocs internal property.
Deserialize RDD and Function (From Broadcast) and Execute Function (on RDD Partition) — runTask
Method
runTask(context: TaskContext): U
Note
|
U is the type of a result as defined when ResultTask is created.
|
runTask
deserializes a RDD and a function from the broadcast and then executes the function (on the records from the RDD partition).
Note
|
runTask is part of Task contract to run a task.
|
Internally, runTask
starts by tracking the time required to deserialize a RDD and a function to execute.
runTask
creates a new closure Serializer
.
Note
|
runTask uses SparkEnv to access the current closure Serializer .
|
runTask
requests the closure Serializer
to deserialize an RDD
and the function to execute (from taskBinary broadcast).
Note
|
taskBinary broadcast is defined when ResultTask is created.
|
runTask
records _executorDeserializeTime and _executorDeserializeCpuTime properties.
In the end, runTask
executes the function (passing in the input context
and the records from partition
of the RDD).
Note
|
partition to use to access the records in a deserialized RDD is defined when ResultTask was created.
|