and depends on iceberg-spark-runtime 1.8.1.

# Build Plugin Jar
A task `createPolarisSparkJar` is added to build a jar for the Polaris Spark plugin. The jar is named
`polaris-iceberg-<iceberg_version>-spark-runtime-<spark_major_version>_<scala_version>.jar`, and the result
jar is located at `plugins/spark/v3.5/build/<scala_version>/libs` after the build.

Building the Polaris project produces client jars for both Scala 2.12 and 2.13, and CI runs the Spark
client tests for both Scala versions as well. The jar can also be built alone for a specific Scala version
using the target `:polaris-spark-3.5_<scala_version>`, as shown below.
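
For example, the following commands build the plugin jar for each supported Scala version (the 2.13 target
name is inferred from the `:polaris-spark-3.5_<scala_version>` pattern above):

```shell
# Build the Polaris Spark plugin jar for Scala 2.12.
./gradlew :polaris-spark-3.5_2.12:createPolarisSparkJar

# Build the plugin jar for Scala 2.13, following the same target naming pattern.
./gradlew :polaris-spark-3.5_2.13:createPolarisSparkJar
```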

# Start Spark with Local Polaris Service Using the Built Jar
Once the jar is built, we can manually test it with Spark and a local Polaris service.

The following command starts a Polaris server for local testing. It runs on localhost:8181 with the default
realm `POLARIS` and root credentials `root:secret`:
```shell
./gradlew run
```
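
As a quick check that the server is up, you can request an OAuth token with the root credentials. This is a
minimal sketch; the token endpoint below is the standard Iceberg REST catalog OAuth endpoint exposed under
the same `/api/catalog` prefix used in the Spark configuration later in this guide:

```shell
# Request an access token from the local Polaris server. The response should be
# a JSON document containing an "access_token" field.
curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials' \
  -d 'client_id=root' \
  -d 'client_secret=secret' \
  -d 'scope=PRINCIPAL_ROLE:ALL'
```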

Once the local server is running, the following command can be used to start `spark-shell` with the built
Spark client jar and use the local Polaris server as a catalog:

```shell
bin/spark-shell \
--jars <path-to-spark-client-jar> \
--packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \
--conf spark.sql.catalog.<catalog-name>.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.<catalog-name>=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.<catalog-name>.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.<catalog-name>.credential="root:secret" \
--conf spark.sql.catalog.<catalog-name>.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.<catalog-name>.token-refresh-enabled=true \
--conf spark.sql.catalog.<catalog-name>.type=rest \
--conf spark.sql.sources.useV1SourceList=''
```

Assume the path to the built Spark client jar is
`/polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar`
and the name of the catalog is `polaris`. The CLI command will then look like the following:

```shell
bin/spark-shell \
--jars /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar \
--packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.polaris.warehouse=polaris \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris.credential="root:secret" \
--conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.polaris.token-refresh-enabled=true \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.sources.useV1SourceList=''
```
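
The commands above assume a catalog named `polaris` already exists on the local server. If it does not, it
can be created through the Polaris management API first. The following is a rough sketch rather than an
exact recipe: the payload shape, the `FILE` storage type, and the `/tmp` locations are assumptions made for
illustration, and `$TOKEN` stands for the `access_token` value returned by the earlier token request;
consult the Polaris management API spec for the authoritative schema.

```shell
# Hypothetical sketch: create an internal catalog named "polaris" backed by
# local file storage. TOKEN must hold the access_token obtained earlier.
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "catalog": {
          "name": "polaris",
          "type": "INTERNAL",
          "properties": {"default-base-location": "file:///tmp/polaris"},
          "storageConfigInfo": {"storageType": "FILE", "allowedLocations": ["file:///tmp/polaris"]}
        }
      }'
```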

# Limitations
The Polaris Spark client supports catalog management for both Iceberg and Delta tables. It routes all
Iceberg table requests to the Iceberg REST endpoints and all Delta table requests to the Generic Table
REST endpoints.

The current limitations of the Polaris Spark client are:
1. Create table as select (CTAS) is not supported for Delta tables. As a result, the `saveAsTable` method
   of `DataFrameWriter` is also not supported, since it relies on CTAS support.
2. Creating a Delta table without an explicit location is not supported (see the sketch after this list).
3. Renaming a Delta table is not supported.
4. `ALTER TABLE ... SET LOCATION`, `SET FILEFORMAT`, and `ADD PARTITION` are not supported for Delta tables.
5. For other non-Iceberg tables, such as CSV, no specific guarantees are provided today.
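
As an end-to-end illustration of the supported paths (and of limitation 2), the hypothetical session below
pipes a few commands into `spark-shell` started with the same flags as the concrete example above; the
namespace and table names are made up for the example:

```shell
# Hypothetical smoke test: create an Iceberg table and a Delta table (with an
# explicit LOCATION, per limitation 2) through the polaris catalog.
bin/spark-shell \
--jars /polaris/plugins/spark/v3.5/spark/build/2.12/libs/polaris-iceberg-1.8.1-spark-runtime-3.5_2.12-0.10.0-beta-incubating-SNAPSHOT.jar \
--packages org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-spark_2.12:3.3.1 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--conf spark.sql.catalog.polaris.warehouse=polaris \
--conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
--conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
--conf spark.sql.catalog.polaris.credential="root:secret" \
--conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \
--conf spark.sql.catalog.polaris.token-refresh-enabled=true \
--conf spark.sql.catalog.polaris.type=rest \
--conf spark.sql.sources.useV1SourceList='' <<'EOF'
spark.sql("CREATE NAMESPACE IF NOT EXISTS polaris.quickstart_ns")
spark.sql("CREATE TABLE polaris.quickstart_ns.iceberg_tbl (id INT, name STRING) USING iceberg")
spark.sql("INSERT INTO polaris.quickstart_ns.iceberg_tbl VALUES (1, 'a')")
spark.sql("CREATE TABLE polaris.quickstart_ns.delta_tbl (id INT) USING delta LOCATION 'file:///tmp/polaris/delta_tbl'")
spark.sql("SHOW TABLES IN polaris.quickstart_ns").show()
EOF
```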