
Commit

Minor updates to README.md
LucaCanali committed Sep 1, 2023
1 parent d51ac7f commit 53ac3b5
Showing 1 changed file (README.md) with 6 additions and 8 deletions.
@@ -3,16 +3,14 @@
 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-plugins_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-plugins_2.12)

 This repository contains code and examples of how to use Apache Spark Plugins.
-Spark plugins are part of Spark core since version 3.0 and provide an interface,
+Spark plugins provide an interface,
 and related configuration, for injecting custom code on executors as they are initialized.
 Spark plugins can also be used to implement custom extensions to the Spark metrics system.

 ### Motivations
-- Instrumenting parts of the Spark workload with plugins provides additional flexibility compared
-to extending instrumentation in the Apache Spark code, as only users who want to activate
-it can do so, moreover they can play with configuration that may be customized for their environment,
-so not necessarily suitable for all possible uses of Apache Spark code.
-- One important use case is extending Spark instrumentation with custom metrics.
+- One important use case for deploying Spark Plugins is extending Spark instrumentation with custom metrics.
+- Other use cases include running custom actions when the executors start up, typically useful for integrating with
+external systems.
 - This repo provides code and examples of plugins applied to measuring Spark on K8S,
 Spark I/O from cloud Filesystems, OS metrics, and custom application metrics.
 - Note: The code in this repo is for Spark 3.x.
@@ -217,7 +215,7 @@ These plugins use instrumented experimental/custom versions of the Hadoop client
 - Instruments the Hadoop S3A client.
 - Note: this requires custom S3A client implementation, see experimental code at: [HDFS and S3A custom instrumentation](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
 - Spark config:
-  - Use this with Spark 3.1.x (which uses hadoop version 3.2.0)
+  - **Use this with Spark 3.1.x (which uses hadoop version 3.2.0)**
   - `--conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation`
   - Custom jar needed: `--jars hadoop-aws-3.2.0.jar`
     - build [from this fork](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
@@ -260,7 +258,7 @@ These plugins use instrumented experimental/custom versions of the Hadoop client
 - Instruments the Hadoop HDFS client.
 - Note: this requires custom HDFS client implementation, see experimental code at: [HDFS and S3A custom instrumentation](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
 - Spark config:
-  - Use this with Spark 3.1.x (which uses hadoop version 3.2.0)
+  - **Use this with Spark 3.1.x (which uses hadoop version 3.2.0)**
   - `--conf spark.plugins=ch.cern.experimental.HDFSTimeInstrumentation`
   - `--packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1`
 - Non-standard configuration required for using this instrumentation:
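For context on the plugin interface the README describes (injecting custom code on executors and extending the Spark metrics system), a minimal sketch of a Spark 3.x plugin is shown below. The class name `DemoUptimePlugin` and the gauge name are illustrative, not part of this repository; the `SparkPlugin`/`ExecutorPlugin`/`PluginContext` types are the standard Spark 3.x plugin API.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Illustrative plugin: registers one custom metric on each executor.
class DemoUptimePlugin extends SparkPlugin {

  // No driver-side component in this sketch.
  override def driverPlugin(): DriverPlugin = null

  // Executor-side component: init() runs once per executor as it initializes.
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val start = System.currentTimeMillis()
      // Register a gauge with the Spark metrics system (Dropwizard registry).
      ctx.metricRegistry.register("demoUptimeMillis", new Gauge[Long] {
        override def getValue: Long = System.currentTimeMillis() - start
      })
    }
  }
}
```

A plugin like this is activated with `--conf spark.plugins=DemoUptimePlugin`, with its jar made available to driver and executors (e.g. via `--jars`).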
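Putting the configuration lines from the diff together, an invocation for the S3A instrumentation plugin could look like the following. This is a sketch: the master, jar path, and working directory are placeholders, while the `spark.plugins` value and jar name come from the README text above.

```shell
# Illustrative spark-shell invocation for the S3A time-instrumentation plugin
# (Spark 3.1.x, which bundles Hadoop 3.2.0); adjust master and paths to your environment.
bin/spark-shell --master yarn \
  --conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation \
  --jars hadoop-aws-3.2.0.jar
```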
