Merge pull request #22097 from mosche/22094-DeprecateSpark2
Deprecate runner support for Spark 2.4 (closes #22094)
echauchot authored Jul 1, 2022
2 parents 32efddc + cd6bb95 commit 680ed5b
Showing 3 changed files with 16 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
@@ -73,6 +73,7 @@

## Deprecations

+ * Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 or soon after (Spark runner) ([#22094](https://github.com/apache/beam/issues/22094)).
* X behavior is deprecated and will be removed in X versions ([#X](https://github.com/apache/beam/issues/X)).

## Bugfixes
@@ -143,6 +143,12 @@ private static JavaSparkContext createSparkContext(SparkPipelineOptions options)
conf.setAppName(options.getAppName());
// register immutable collections serializers because the SDK uses them.
conf.set("spark.kryo.registrator", SparkRunnerKryoRegistrator.class.getName());
- return new JavaSparkContext(conf);
+ JavaSparkContext jsc = new JavaSparkContext(conf);
+ if (jsc.sc().version().startsWith("2")) {
+ LOG.warn(
+ "Support for Spark 2 is deprecated, this runner will be removed in a few releases.\n"
+ + "Spark 2 is reaching its EOL, consider migrating to Spark 3.");
+ }
+ return jsc;
}
}
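The hunk above logs a warning when the running Spark version string starts with "2". As a standalone illustration of that check, here is a minimal sketch that parses the major version component instead of using a prefix match. It is not the runner's code: the class `SparkVersionCheck` and its methods are hypothetical names, and `java.util.logging` stands in for the runner's own logger so the example needs no Spark or Beam dependency.

```java
import java.util.logging.Logger;

// Hypothetical helper, for illustration only; not part of Beam or Spark.
class SparkVersionCheck {
    private static final Logger LOG = Logger.getLogger(SparkVersionCheck.class.getName());

    /** Returns the major version of a string such as "2.4.8", or -1 if it cannot be parsed. */
    static int majorVersion(String version) {
        if (version == null) {
            return -1;
        }
        int dot = version.indexOf('.');
        String major = dot < 0 ? version : version.substring(0, dot);
        try {
            return Integer.parseInt(major.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** True only when the major version is exactly 2. */
    static boolean isDeprecatedSpark(String version) {
        return majorVersion(version) == 2;
    }

    /** Logs a deprecation warning for Spark 2.x version strings. */
    static void warnIfDeprecated(String version) {
        if (isDeprecatedSpark(version)) {
            LOG.warning("Support for Spark 2 is deprecated; consider migrating to Spark 3.");
        }
    }

    public static void main(String[] args) {
        warnIfDeprecated("2.4.8"); // logs a warning
        warnIfDeprecated("3.1.2"); // logs nothing
    }
}
```

In the runner itself, the version string would come from `jsc.sc().version()`, as in the hunk above. Comparing the parsed major version rather than a string prefix keeps a hypothetical version such as "20.0.0" from matching.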
15 changes: 8 additions & 7 deletions website/www/site/content/en/documentation/runners/spark.md
@@ -67,7 +67,8 @@ the portable Runner. For more information on portability, please visit the

## Spark Runner prerequisites and setup

- The Spark runner currently supports Spark's 2.x branch, and more specifically any version greater than 2.4.0.
+ The Spark runner currently supports Spark's 3.1.x branch.
+ > **Note:** Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 (or soon after).
{{< paragraph class="language-java" >}}
You can add a dependency on the latest version of the Spark runner by adding to your pom.xml the following:
@@ -76,7 +77,7 @@ You can add a dependency on the latest version of the Spark runner by adding to
{{< highlight java >}}
<dependency>
<groupId>org.apache.beam</groupId>
- <artifactId>beam-runners-spark</artifactId>
+ <artifactId>beam-runners-spark-3</artifactId>
<version>{{< param release_latest >}}</version>
</dependency>
{{< /highlight >}}
@@ -90,13 +91,13 @@ In some cases, such as running in local mode/Standalone, your (self-contained) a
{{< highlight java >}}
<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.11</artifactId>
+ <artifactId>spark-core_2.12</artifactId>
<version>${spark.version}</version>
</dependency>

<dependency>
<groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming_2.11</artifactId>
+ <artifactId>spark-streaming_2.12</artifactId>
<version>${spark.version}</version>
</dependency>
{{< /highlight >}}
@@ -193,7 +194,7 @@ download it on the [Downloads page](/get-started/downloads/).
{{< paragraph class="language-py" >}}
1. Start the JobService endpoint:
* with Docker (preferred): `docker run --net=host apache/beam_spark_job_server:latest`
- * or from Beam source code: `./gradlew :runners:spark:2:job-server:runShadow`
+ * or from Beam source code: `./gradlew :runners:spark:3:job-server:runShadow`
{{< /paragraph >}}

{{< paragraph class="language-py" >}}
@@ -228,7 +229,7 @@ For more details on the different deployment modes see: [Standalone](https://spa
{{< paragraph class="language-py" >}}
2. Start JobService that will connect with the Spark master:
* with Docker (preferred): `docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://localhost:7077`
- * or from Beam source code: `./gradlew :runners:spark:2:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077`
+ * or from Beam source code: `./gradlew :runners:spark:3:job-server:runShadow -PsparkMasterUrl=spark://localhost:7077`
{{< /paragraph >}}

{{< paragraph class="language-py" >}}3. Submit the pipeline as above.
@@ -246,7 +247,7 @@ To run Beam jobs written in Python, Go, and other supported languages, you can u

The following example runs a portable Beam job in Python from the Dataproc cluster's master node with Yarn backed.

- > Note: This example executes successfully with Dataproc 2.0, Spark 2.4.8 and 3.1.2 and Beam 2.37.0.
+ > Note: This example executes successfully with Dataproc 2.0, Spark 3.1.2 and Beam 2.37.0.
1. Create a Dataproc cluster with [Docker](https://cloud.google.com/dataproc/docs/concepts/components/docker) component enabled.
