StorageStatus

StorageStatus is a developer API that Spark uses to pass "just enough" information about registered BlockManagers in a Spark application between Spark services (mostly for monitoring purposes like web UI or SparkListeners).

Note

There are two ways to access StorageStatus about all the known BlockManagers in a Spark application:

StorageStatus is created when:

Table 1. StorageStatus’s Internal Registries and Counters
Name Description

_nonRddBlocks

Lookup table of BlockStatus per BlockId (for blocks that do not belong to an RDD).

Used when…​FIXME

_rddBlocks

Lookup table of BlockIds with BlockStatus per RDD id.

Used when…​FIXME
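The shape of the two registries can be illustrated with a minimal, self-contained sketch. The types `BlockId`, `RDDBlockId`, `BroadcastBlockId` and `BlockStatus` below are simplified stand-ins rather than Spark's actual classes, and `register` is a hypothetical helper added only to show how a block would land in one registry or the other.

```scala
import scala.collection.mutable

object RegistriesSketch {
  // Simplified stand-ins for Spark's types (assumptions for illustration only)
  sealed trait BlockId
  case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
  case class BroadcastBlockId(broadcastId: Long) extends BlockId
  case class BlockStatus(memSize: Long, diskSize: Long)

  // RDD id -> (BlockId -> BlockStatus)
  val rddBlocks = mutable.HashMap.empty[Int, mutable.Map[BlockId, BlockStatus]]

  // BlockId -> BlockStatus for blocks that do not belong to any RDD
  val nonRddBlocks = mutable.HashMap.empty[BlockId, BlockStatus]

  // Hypothetical helper: files a block under the registry that matches its type
  def register(blockId: BlockId, status: BlockStatus): Unit = blockId match {
    case RDDBlockId(rddId, _) =>
      val blocks = rddBlocks.getOrElseUpdate(rddId, mutable.HashMap.empty[BlockId, BlockStatus])
      blocks(blockId) = status
    case _ =>
      nonRddBlocks(blockId) = status
  }
}
```

Grouping RDD blocks by RDD id keeps per-RDD lookups cheap, at the cost of a nested map.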

updateStorageInfo Method

Caution
FIXME

Creating StorageStatus Instance

StorageStatus takes the following when created:

StorageStatus initializes the internal registries and counters.

Getting RDD Blocks For RDD — rddBlocksById Method

rddBlocksById(rddId: Int): Map[BlockId, BlockStatus]

rddBlocksById gives the blocks (as BlockId with their status as BlockStatus) that belong to rddId RDD.
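As a rough sketch, the lookup amounts to reading the per-RDD entry of the `_rddBlocks` registry and falling back to an empty map for an unknown RDD. The fallback behaviour and the simplified `BlockId` and `BlockStatus` types below are assumptions made for illustration, not a copy of Spark's code.

```scala
import scala.collection.mutable

object RddBlocksByIdSketch {
  // Simplified stand-ins for Spark's types (assumptions for illustration only)
  sealed trait BlockId
  case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
  case class BlockStatus(memSize: Long, diskSize: Long)

  // RDD id -> (BlockId -> BlockStatus), mirroring the _rddBlocks registry
  val rddBlocks = mutable.HashMap.empty[Int, mutable.Map[BlockId, BlockStatus]]

  // Returns the blocks of the given RDD, or an empty map if the RDD is unknown
  def rddBlocksById(rddId: Int): scala.collection.Map[BlockId, BlockStatus] =
    rddBlocks.getOrElse(rddId, Map.empty[BlockId, BlockStatus])
}
```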

Note

rddBlocksById is used when:

Removing Block (From Internal Registries) — removeBlock Internal Method

removeBlock(blockId: BlockId): Option[BlockStatus]

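removeBlock removes blockId from the internal registries (_rddBlocks or _nonRddBlocks, depending on the type of the block) and returns the BlockStatus that was registered for it, if any.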

Internally, removeBlock first updates the storage info of blockId with an empty BlockStatus (i.e. marks the block as removed).

removeBlock then branches on the type of blockId, i.e. RDDBlockId or any other BlockId.

For an RDDBlockId, removeBlock finds the RDD in _rddBlocks and removes the blockId from its blocks. If the RDD has no more blocks registered afterwards, removeBlock removes the RDD itself from _rddBlocks.

For a non-RDDBlockId, removeBlock removes blockId from _nonRddBlocks registry.
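The branching described above can be sketched as follows. This is a minimal illustration that uses simplified stand-in types (not Spark's actual classes) and leaves out the storage-info update step; only the two registries and the removal logic are modelled.

```scala
import scala.collection.mutable

object RemoveBlockSketch {
  // Simplified stand-ins for Spark's types (assumptions for illustration only)
  sealed trait BlockId
  case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId
  case class BroadcastBlockId(broadcastId: Long) extends BlockId
  case class BlockStatus(memSize: Long, diskSize: Long)

  val rddBlocks = mutable.HashMap.empty[Int, mutable.Map[BlockId, BlockStatus]]
  val nonRddBlocks = mutable.HashMap.empty[BlockId, BlockStatus]

  def removeBlock(blockId: BlockId): Option[BlockStatus] = blockId match {
    case RDDBlockId(rddId, _) =>
      // Look up the RDD; an unknown RDD yields None
      rddBlocks.get(rddId).flatMap { blocks =>
        val removed = blocks.remove(blockId)
        // Drop the RDD entry entirely once its last block is gone
        if (blocks.isEmpty) rddBlocks.remove(rddId)
        removed
      }
    case _ =>
      // Any other block type lives in the flat registry
      nonRddBlocks.remove(blockId)
  }
}
```

Returning `Option[BlockStatus]` lets a caller tell a removed block apart from one that was never registered.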

Note
removeBlock is used when StorageStatusListener removes RDD blocks for an unpersisted RDD or updates storage status for an executor.