Spark: Session Configuration for custom snapshot-property #12997

@cccs-jory

Description

Feature Request / Improvement

Currently, Spark offers two ways to add custom snapshot properties:

  1. Using the Java API, wrap a callable (e.g. a SQL command) with CommitMetadata.withCommitProperties.
  2. Using the DataFrameWriterV2, pass a write option of the form snapshot-property.key=value (a sketch follows below).
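
For example, the second approach in PySpark (a minimal sketch; the table name db.events and the key/value origin=nightly-load are illustrative):

```python
# Append with a custom snapshot property via DataFrameWriterV2.
# Iceberg strips the "snapshot-property." prefix and records
# origin=nightly-load in the new snapshot's summary.
df.writeTo("db.events") \
    .option("snapshot-property.origin", "nightly-load") \
    .append()
```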

There is currently no way to add custom snapshot properties in a pure SQL context. We run PySpark, so when issuing deletes we have to call spark.sql("DELETE FROM ..."). My proposal (which AI says is "feasible" :P) is to create a new Spark session configuration of the form spark.sql.iceberg.snapshot-property.key=value, which would get mixed in with the other configuration options when a new snapshot is created. Whenever such a property is set, the specified snapshot property would be added to the snapshot summary.
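
A sketch of how this could look from PySpark, assuming the proposal were implemented (the spark.sql.iceberg.snapshot-property.* prefix does not exist today; db.events is illustrative):

```python
# Proposed (not yet implemented): set the property once on the session...
spark.conf.set("spark.sql.iceberg.snapshot-property.origin", "maintenance-job")

# ...and every snapshot-producing SQL statement would pick it up.
spark.sql("DELETE FROM db.events WHERE ts < date_sub(current_date(), 30)")
# The DELETE's snapshot summary would then include origin=maintenance-job.
```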

This would be useful for identifying snapshot commits: for example, you could add a property such as origin=maintenance-job to record where a commit came from.
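
Any such properties would then be visible through Iceberg's existing snapshots metadata table (the query below already works today; only the way the property gets set is new):

```python
# Inspect snapshot summaries, where custom properties would appear.
spark.sql("SELECT committed_at, snapshot_id, summary FROM db.events.snapshots") \
    .show(truncate=False)
```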

Query engine

Spark

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Associated PR

Metadata

Assignees: no one assigned

Labels: improvement, stale
