Closed as not planned
Feature Request / Improvement
Currently, Spark offers two ways to add custom snapshot properties:
- Using the Java API, wrap a runnable (e.g. a SQL command) with the withCommitMetadata method.
- Using the DataFrameWriterV2, add a write option of the form snapshot-property.key=value.
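To illustrate, a minimal PySpark sketch of the second approach (the table name `db.events` and property key `origin` are hypothetical; a running SparkSession with the Iceberg extensions is assumed):

```python
def snapshot_property_option(key: str) -> str:
    """Build the DataFrameWriterV2 option name for a custom snapshot property."""
    return f"snapshot-property.{key}"

# Hypothetical usage against an Iceberg table (requires a live SparkSession):
# df.writeTo("db.events") \
#   .option(snapshot_property_option("origin"), "maintenance-job") \
#   .append()

print(snapshot_property_option("origin"))  # → snapshot-property.origin
```

The Java `withCommitMetadata` path mentioned above works similarly, but is not reachable from a pure SQL or PySpark context, which is what motivates this request.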
There is currently no way to add custom snapshot properties in a pure SQL context. We run PySpark, so when issuing deletes we need to do spark.sql("DELETE FROM ..."). My proposal (which AI says is "feasible" :P) is to create a new Spark session configuration of the form spark.sql.iceberg.snapshot-property.key=value, which would get mixed in with other configuration options when creating a new snapshot. Whenever such a property is set, the specified key/value pair would be added to the snapshot summary.
This would be useful for better identifying snapshot commits: for example, you could add a property recording where the commit came from, such as maintenance-job.
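The mixing step described above could be sketched as follows. This is purely illustrative of the proposal, not an existing Iceberg API; the prefix string and the helper name are assumptions taken from this request:

```python
# Proposed (hypothetical) session-config prefix for custom snapshot properties.
PREFIX = "spark.sql.iceberg.snapshot-property."

def extract_snapshot_properties(session_conf: dict) -> dict:
    """Return {key: value} for every session config entry under PREFIX,
    ready to be merged into the snapshot summary at commit time."""
    return {
        key[len(PREFIX):]: value
        for key, value in session_conf.items()
        if key.startswith(PREFIX)
    }

conf = {
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.iceberg.snapshot-property.origin": "maintenance-job",
}
print(extract_snapshot_properties(conf))  # → {'origin': 'maintenance-job'}
```

In a PySpark session the properties would be set once with spark.conf.set(...) and then apply to every subsequent commit, including DELETE statements issued through spark.sql.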
Query engine
Spark
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time