@@ -28,32 +28,29 @@ REST endpoints, and provides implementations for Apache Spark's
2828Right now, the plugin only supports Spark 3.5 with Scala versions 2.12 and 2.13,
2929and depends on iceberg-spark-runtime 1.9.0.
3030
31- # Build Plugin Jar
32- A task createPolarisSparkJar is added to build a jar for the Polaris Spark plugin, the jar is named as:
33- ` polaris-spark-<sparkVersion>_<scalaVersion>-<polarisVersion>-bundle.jar ` . For example:
34- ` polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar ` .
35-
36- - ` ./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar ` -- build jar for Spark 3.5 with Scala version 2.12.
37- - ` ./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar ` -- build jar for Spark 3.5 with Scala version 2.13.
38-
39- The result jar is located at plugins/spark/v3.5/build/<scala_version>/libs after the build.
40-
41- # Start Spark with Local Polaris Service using built Jar
42- Once the jar is built, we can manually test it with Spark and a local Polaris service.
43-
31+ # Start Spark with local Polaris service using the Polaris Spark plugin
4432The following command starts a Polaris server for local testing; it runs on localhost:8181 with the default
45- realm ` POLARIS ` and root credentials ` root:secret ` :
33+ realm `POLARIS` and root credentials `root:s3cr3t`:
4634``` shell
4735./gradlew run
4836```
4937
50- Once the local server is running, the following command can be used to start the spark-shell with the built Spark client
51- jar, and to use the local Polaris server as a Catalog.
38+ Once the local server is running, you can start Spark with the Polaris Spark plugin using either the `--packages`
39+ option with the Polaris Spark package, or the `--jars` option with the Polaris Spark bundle JAR.
40+
41+ The following sections explain how to build and run Spark with both the Polaris package and the bundle JAR.
42+
43+ # Build and run with the Polaris Spark package locally
44+ The Polaris Spark client source code is located in plugins/spark/v3.5/spark. To use the Polaris Spark package
45+ with Spark, you first need to publish the package to your local Maven repository.
46+
47+ Run the following commands to build the Polaris Spark project and publish the package to your local Maven repository:
48+ - `./gradlew assemble` -- build the whole Polaris project without running tests
49+ - `./gradlew publishToMavenLocal` -- publish the Polaris project artifacts to your local Maven repository
5250
5351``` shell
5452bin/spark-shell \
55- --jars < path-to-spark-client-jar> \
56- --packages org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
53+ --packages org.apache.polaris:polaris-spark-<spark_version>_<scala_version>:<polaris_version>,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
5754--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
5855--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
5956--conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
@@ -66,17 +63,20 @@ bin/spark-shell \
6663--conf spark.sql.sources.useV1SourceList=''
6764```
6865
69- Assume the path to the built Spark client jar is
70- ` /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar `
71- and the name of the catalog is ` polaris ` . The cli command will look like following:
66+ The Polaris version is defined in the `versions.txt` file located in the root directory of the Polaris project.
67+ Assume the following values:
68+ - `spark_version`: 3.5
69+ - `scala_version`: 2.12
70+ - `polaris_version`: 1.1.0-incubating-SNAPSHOT
71+ - `catalog-name`: `polaris`
+
72+ The Spark command would then look like the following:
7273
7374``` shell
7475bin/spark-shell \
75- --jars /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar \
76- --packages org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
76+ --packages org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT,org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
7777--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
7878--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
79- --conf spark.sql.catalog.polaris.warehouse=< catalog-name > \
79+ --conf spark.sql.catalog.polaris.warehouse=polaris \
8080--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
8181--conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
8282--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
@@ -86,6 +86,32 @@ bin/spark-shell \
8686--conf spark.sql.sources.useV1SourceList=''
8787```
8888
89+ # Build and run with the Polaris Spark bundle JAR
90+ The polaris-spark project also provides a Spark bundle JAR for the `--jars` use case. The resulting JAR follows this naming format:
91+ `polaris-spark-<spark_version>_<scala_version>-<polaris_version>-bundle.jar`
92+ For example:
93+ `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar`
94+
95+ Run `./gradlew assemble` to build the entire Polaris project without running tests. After the build completes,
96+ the bundle JAR can be found under plugins/spark/v3.5/spark/build/<scala_version>/libs/.
97+ To start Spark using the bundle JAR, specify it with the `--jars` option as shown below:
98+
99+ ``` shell
100+ bin/spark-shell \
101+ --jars <path-to-spark-client-jar> \
102+ --packages org.apache.iceberg:iceberg-aws-bundle:1.9.0,io.delta:delta-spark_2.12:3.3.1 \
103+ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
104+ --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
105+ --conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
106+ --conf spark.sql.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
107+ --conf spark.sql.catalog.<catalog-name>=org.apache.polaris.spark.SparkCatalog \
108+ --conf spark.sql.catalog.<catalog-name>.uri=http://localhost:8181/api/catalog \
109+ --conf spark.sql.catalog.<catalog-name>.credential="root:s3cr3t" \
110+ --conf spark.sql.catalog.<catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
111+ --conf spark.sql.catalog.<catalog-name>.token-refresh-enabled=true \
112+ --conf spark.sql.sources.useV1SourceList=''
113+ ```
114+
89115# Limitations
90117The Polaris Spark client supports catalog management for both Iceberg and Delta tables; it routes all Iceberg table
91117requests to the Iceberg REST endpoints, and routes all Delta table requests to the Generic Table REST endpoints.