JsonRelay is a SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.
It is designed to be used with slim, which consumes the events JsonRelay emits and writes useful statistics about them to Mongo, from which Spree serves live-updating web pages.
With Spark >= 1.5.0 you can simply pass the following flags to your spark-shell and spark-submit commands:
--packages org.hammerlab:spark-json-relay:2.0.1
--conf spark.extraListeners=org.apache.spark.JsonRelay
If using an earlier version of Spark, you'll need to first download the JAR:
$ wget https://repo1.maven.org/maven2/org/hammerlab/spark-json-relay/2.0.1/spark-json-relay-2.0.1.jar
Then, pass these flags to your spark-submit or spark-shell commands:
--driver-class-path spark-json-relay-2.0.1.jar
--conf spark.extraListeners=org.apache.spark.JsonRelay
That's it!
Two additional flags, --conf spark.slim.host and --conf spark.slim.port, specify the host and port that JsonRelay will attempt to connect to and send events to.
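Putting the flags together, a full invocation for Spark >= 1.5.0 might look like the sketch below. The host name and port are placeholders chosen for illustration, not values taken from this project; substitute the address where your slim server is actually listening.

```shell
# Placeholder location of the slim server (assumed values; replace with your own).
SLIM_HOST=slim.example.com
SLIM_PORT=8123

# Fetch JsonRelay from Maven, register it as a listener,
# and tell it where to send events.
spark-shell \
  --packages org.hammerlab:spark-json-relay:2.0.1 \
  --conf spark.extraListeners=org.apache.spark.JsonRelay \
  --conf spark.slim.host="$SLIM_HOST" \
  --conf spark.slim.port="$SLIM_PORT"
```

The same four flags should work with spark-submit. On earlier Spark versions, replace the --packages line with the --driver-class-path flag shown above.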
JsonRelay just piggybacks on Spark's JsonProtocol for JSON serialization, with two differences:
- It adds an appId field to all events; this allows downstream consumers to process events from multiple Spark applications simultaneously / more easily over time.
- It rolls its own serialization of SparkListenerExecutorMetricsUpdate events, which is omitted from Spark's JsonProtocol in Spark prior to 1.5.0 (cf. SPARK-9036).
Please file an issue if you have any questions about or problems using JsonRelay!