18 changes: 15 additions & 3 deletions docs/spark-configuration.md
@@ -40,6 +40,14 @@ spark.sql.catalog.hive_prod.uri = thrift://metastore-host:port
# omit uri to use the same URI as Spark: hive.metastore.uris in hive-site.xml
```

Below is an example for a REST catalog named `rest_prod` that loads tables from REST URL `http://localhost:8080`:

```plain
spark.sql.catalog.rest_prod = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest_prod.type = rest
spark.sql.catalog.rest_prod.uri = http://localhost:8080
```
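
As an illustration only, the three `rest_prod` properties above could also be assembled programmatically and passed to `spark-submit` as `--conf` arguments. The sketch below uses just the JDK; the class `RestCatalogProps` and its methods are hypothetical names introduced here and are not part of Iceberg or Spark.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Iceberg or Spark) that builds REST catalog properties. */
final class RestCatalogProps {

  /** Returns the three properties from the example, nested under the given catalog name. */
  static Map<String, String> restCatalog(String catalogName, String restUri) {
    String prefix = "spark.sql.catalog." + catalogName;
    Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
    props.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
    props.put(prefix + ".type", "rest");
    props.put(prefix + ".uri", restUri);
    return props;
  }

  /** Renders each property as a "--conf key=value" argument, separated by spaces. */
  static String toConfArgs(Map<String, String> props) {
    return props.entrySet().stream()
        .map(e -> "--conf " + e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(" "));
  }

  public static void main(String[] args) {
    System.out.println(toConfArgs(restCatalog("rest_prod", "http://localhost:8080")));
  }
}
```

Keeping the catalog name as a parameter makes it explicit that every property is nested under `spark.sql.catalog.<catalog-name>`.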

Iceberg also supports a directory-based catalog in HDFS that can be configured using `type=hadoop`:

```plain
@@ -66,12 +74,16 @@ Both catalogs are configured using properties nested under the catalog name. Com
| Property | Values | Description |
| -------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------- |
| spark.sql.catalog._catalog-name_.type | `hive`, `hadoop` or `rest` | The underlying Iceberg catalog implementation, `HiveCatalog`, `HadoopCatalog`, `RESTCatalog` or left unset if using a custom catalog |
| spark.sql.catalog._catalog-name_.catalog-impl | | The underlying Iceberg catalog implementation.|
| spark.sql.catalog._catalog-name_.catalog-impl | | The custom Iceberg catalog implementation. If `type` is null, `catalog-impl` must not be null. |
| spark.sql.catalog._catalog-name_.io-impl | | The custom FileIO implementation. |
| spark.sql.catalog._catalog-name_.metrics-reporter-impl | | The custom MetricsReporter implementation. |
| spark.sql.catalog._catalog-name_.default-namespace | default | The default current namespace for the catalog |
| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Metastore connect URI; default from `hive-site.xml` |
| spark.sql.catalog._catalog-name_.uri | thrift://host:port | Hive metastore URL for hive typed catalog, REST URL for REST typed catalog |
| spark.sql.catalog._catalog-name_.warehouse | hdfs://nn:8020/warehouse/path | Base path for the warehouse directory |
| spark.sql.catalog._catalog-name_.cache-enabled | `true` or `false` | Whether to enable catalog cache, default value is `true` |
| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) | |
| spark.sql.catalog._catalog-name_.cache.expiration-interval-ms | `30000` (30 seconds) | Duration after which cached catalog entries are expired; Only effective if `cache-enabled` is `true`. `-1` disables cache expiration and `0` disables caching entirely, irrespective of `cache-enabled`. Default is `30000` (30 seconds) |
| spark.sql.catalog._catalog-name_.table-default._propertyKey_ | | Default Iceberg table property value for property key _propertyKey_, which will be set on tables created by this catalog if not overridden |
| spark.sql.catalog._catalog-name_.table-override._propertyKey_ | | Enforced Iceberg table property value for property key _propertyKey_, which cannot be overridden by user |

Additional properties can be found in common [catalog configuration](../configuration#catalog-properties).
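
The rules stated in the table can be restated as a small sketch: `catalog-impl` is required when `type` is unset, an expiration interval of `0` disables caching regardless of `cache-enabled`, `-1` keeps cached entries indefinitely, and `table-override` values win over user-supplied table properties, which in turn win over `table-default` values. The class `CatalogOptionRules` is a hypothetical helper written for this illustration, not Iceberg code. It assumes option keys are given with the `spark.sql.catalog.<catalog-name>.` prefix already stripped, and it treats any negative interval like `-1`, which the table does not specify.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not Iceberg code) that restates the rules from the property table. */
final class CatalogOptionRules {

  /** If `type` is unset, `catalog-impl` must be set. */
  static void validate(Map<String, String> options) {
    if (options.get("type") == null && options.get("catalog-impl") == null) {
      throw new IllegalArgumentException("Set either type or catalog-impl");
    }
  }

  /** Expiration interval in milliseconds; the documented default is 30000. */
  static long expirationMs(Map<String, String> options) {
    return Long.parseLong(options.getOrDefault("cache.expiration-interval-ms", "30000"));
  }

  /** Caching is on when cache-enabled is true (the default) and the interval is not 0. */
  static boolean cachingEnabled(Map<String, String> options) {
    boolean enabled = Boolean.parseBoolean(options.getOrDefault("cache-enabled", "true"));
    return enabled && expirationMs(options) != 0;
  }

  /** Entries expire only with a positive interval; negative values are assumed to mean never. */
  static boolean entriesExpire(Map<String, String> options) {
    return cachingEnabled(options) && expirationMs(options) > 0;
  }

  /** Precedence: table-default values, then user-supplied values, then table-override values. */
  static Map<String, String> effectiveTableProps(
      Map<String, String> tableDefaults,
      Map<String, String> userProps,
      Map<String, String> tableOverrides) {
    Map<String, String> merged = new HashMap<>(tableDefaults);
    merged.putAll(userProps);
    merged.putAll(tableOverrides);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> options = Map.of("type", "rest", "cache.expiration-interval-ms", "-1");
    validate(options);
    System.out.println("caching enabled: " + cachingEnabled(options));
    System.out.println("entries expire: " + entriesExpire(options));
  }
}
```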

@@ -89,16 +89,23 @@
* <p>This supports the following catalog configuration options:
*
* <ul>
* <li><code>type</code> - catalog type, "hive" or "hadoop". To specify a non-hive or hadoop
* catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI (Hive catalog only)
* <li><code>type</code> - catalog type, "hive" or "hadoop" or "rest". To specify a non-hive or
* hadoop catalog, use the <code>catalog-impl</code> option.
* <li><code>uri</code> - the Hive Metastore URI for Hive catalog or REST URI for REST catalog
* <li><code>warehouse</code> - the warehouse path (Hadoop catalog only)
* <li><code>catalog-impl</code> - a custom {@link Catalog} implementation to use
* <li><code>io-impl</code> - a custom {@link org.apache.iceberg.io.FileIO} implementation to use
* <li><code>metrics-reporter-impl</code> - a custom {@link
* org.apache.iceberg.metrics.MetricsReporter} implementation to use
* <li><code>default-namespace</code> - a namespace to use as the default
* <li><code>cache-enabled</code> - whether to enable catalog cache
* <li><code>cache.expiration-interval-ms</code> - interval in millis before expiring tables from
* catalog cache. Refer to {@link CatalogProperties#CACHE_EXPIRATION_INTERVAL_MS} for further
* details and significant values.
* <li><code>table-default.$tablePropertyKey</code> - table property $tablePropertyKey default at
* catalog level
* <li><code>table-override.$tablePropertyKey</code> - table property $tablePropertyKey enforced
* at catalog level
* </ul>
*
* <p>