Skip to content

Latest commit

 

History

History
148 lines (111 loc) · 8.92 KB

spark-SparkListener.adoc

File metadata and controls

148 lines (111 loc) · 8.92 KB

Spark Listeners — Intercepting Events from Spark Scheduler

SparkListener is a mechanism in Spark to intercept events from the Spark scheduler that are emitted over the course of execution of a Spark application.

SparkListener extends SparkListenerInterface with all the callback methods being no-op/do-nothing.

Spark relies on SparkListeners internally to manage communication between internal components in the distributed environment for a Spark application, e.g. web UI, event persistence (for History Server), dynamic allocation of executors, keeping track of executors (using HeartbeatReceiver) and others.

You can develop your own custom SparkListener and register it using SparkContext.addSparkListener method or spark.extraListeners Spark property.

With SparkListener you can focus on Spark events of your liking and process a subset of all scheduling events.

Tip
Developing a custom SparkListener is an excellent introduction to low-level details of Spark’s Execution Model. Check out the exercise Developing Custom SparkListener to monitor DAGScheduler in Scala.
Tip

Enable INFO logging level for org.apache.spark.SparkContext logger to see when custom Spark listeners are registered.

INFO SparkContext: Registered listener org.apache.spark.scheduler.StatsReportListener

SparkListenerInterface — Internal Contract for Spark Listeners

SparkListenerInterface is an private[spark] contract for Spark listeners to intercept events from the Spark scheduler.

Note
SparkListener and SparkFirehoseListener Spark listeners are direct implementations of SparkListenerInterface contract to help developing more sophisticated Spark listeners.
Table 1. SparkListenerInterface Methods (in alphabetical order)
Method Event Reason

onApplicationEnd

SparkListenerApplicationEnd

SparkContext does postApplicationEnd

onApplicationStart

SparkListenerApplicationStart

SparkContext does postApplicationStart

onBlockManagerAdded

SparkListenerBlockManagerAdded

BlockManagerMasterEndpoint has registered a BlockManager.

onBlockManagerRemoved

SparkListenerBlockManagerRemoved

BlockManagerMasterEndpoint has removed a BlockManager (which is when…​FIXME)

onBlockUpdated

SparkListenerBlockUpdated

BlockManagerMasterEndpoint receives a UpdateBlockInfo message (which is when a BlockManager reports a block status update to driver).

onEnvironmentUpdate

SparkListenerEnvironmentUpdate

SparkContext does postEnvironmentUpdate.

onExecutorMetricsUpdate

SparkListenerExecutorMetricsUpdate

onExecutorAdded

SparkListenerExecutorAdded

DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) receives RegisterExecutor message, MesosFineGrainedSchedulerBackend does resourceOffers, and LocalSchedulerBackendEndpoint starts.

onExecutorBlacklisted

SparkListenerExecutorBlacklisted

FIXME

onExecutorRemoved

SparkListenerExecutorRemoved

DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) does removeExecutor and MesosFineGrainedSchedulerBackend does removeExecutor.

onExecutorUnblacklisted

SparkListenerExecutorUnblacklisted

FIXME

onJobEnd

SparkListenerJobEnd

DAGScheduler does cleanUpAfterSchedulerStop, handleTaskCompletion, failJobAndIndependentStages, and markMapStageJobAsFinished.

onJobStart

SparkListenerJobStart

DAGScheduler handles JobSubmitted and MapStageSubmitted messages

onNodeBlacklisted

SparkListenerNodeBlacklisted

FIXME

onNodeUnblacklisted

SparkListenerNodeUnblacklisted

FIXME

onStageCompleted

SparkListenerStageCompleted

DAGScheduler marks a stage as finished.

onStageSubmitted

SparkListenerStageSubmitted

DAGScheduler submits the missing tasks of a stage (in a Spark job).

onTaskEnd

SparkListenerTaskEnd

DAGScheduler handles a task completion

onTaskGettingResult

SparkListenerTaskGettingResult

DAGScheduler handles GettingResultEvent event

onTaskStart

SparkListenerTaskStart

DAGScheduler is informed that a task is about to start.

onUnpersistRDD

SparkListenerUnpersistRDD

SparkContext unpersists an RDD, i.e. removes RDD blocks from BlockManagerMaster (that can be triggered explicitly or implicitly).

onOtherEvent

SparkListenerEvent

Catch-all callback that is often used in Spark SQL to handle custom events.

Built-In Spark Listeners

Table 2. Built-In Spark Listeners
Spark Listener Description

EventLoggingListener

Logs JSON-encoded events to a file that can later be read by History Server

StatsReportListener

SparkFirehoseListener

Allows users to receive all SparkListenerEvent events by overriding the single onEvent method only.

ExecutorAllocationListener

HeartbeatReceiver

StreamingJobProgressListener

ExecutorsListener

Prepares information for Executors tab in web UI

StorageStatusListener, RDDOperationGraphListener, EnvironmentListener, BlockStatusListener and StorageListener

For web UI

SpillListener

ApplicationEventListener

StreamingQueryListenerBus

SQLListener / SQLHistoryListener

Support for History Server

StreamingListenerBus

JobProgressListener