Spark: Custom snapshot property from session configuration#12999
Spark: Custom snapshot property from session configuration#12999cccs-jory wants to merge 8 commits intoapache:mainfrom
Conversation
|
Considering that the goal is to be able to set it dynamically for each query I think it would be better if there is a way to include it in the query itself instead of having it as a session configuration. |
I don't think that's the goal necessarily. For example an org may create a spark session that executes DML on many tables, therefore creating many commits. That spark session could set this configuration and have its custom snapshot property applied on each table's commits. Setting it dynamically in Spark SQL could be a different task? |
singhpk234
left a comment
There was a problem hiding this comment.
Similar requirements did come up in the past : #4956 is this not sufficient ? or you want this for specifically with the SQLConfs ?
Can you please elaborate your use case :
is the concern this is not supported for DELETE / MERGE and only INSERT ? if yes we should fix that.
|
|
Please only target one Spark Version in this PR for easier reviewing, we can do a fast followup that changes earlier versions after we merge this. |
| @Test | ||
| public void testExtraSnapshotMetadataReflectsSessionConfig() { | ||
| withSQLConf( | ||
| ImmutableMap.of("spark.sql.iceberg.snapshot-property.test-key", "test-value"), |
There was a problem hiding this comment.
Trivial nit: "test-value" -> "session-value" so it mimics the test below. This is the tiniest of nits so feel free to ignore.
RussellSpitzer
left a comment
There was a problem hiding this comment.
I have a tiny nit, and since these changes are so small I don't think we need to split apart the PR into different versions.
|
Looks like there is a build issue @cccs-jory, please update and let me know when to run tests again :) |
|
@RussellSpitzer I removed the feature on 3.4 and addressed the nitpick, however I may need some more eyes to fix the build issue. Essentially gradlew is failing when compiling on 3.5 because it can't find Spark's |
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
Outdated
Show resolved
Hide resolved
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
Outdated
Show resolved
Hide resolved
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
I think this is still relevant and not stale. The default version of Spark changed to 4.0 so we may want to move the changes to that branch. @cccs-jory, could you address comments and update the PR? |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Any follow up on this? Looks like we are just waiting on the change from @singhpk234 ?
|
Yep my bad, time is in short supply around here. I'll get that in soon. |
|
:) We are all oversubscribed for sure |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
Any update on this PR? We've found this custom snapshot property support from session configuration helpful for our use cases, wondering if anything is blocking the merge. |
|
@owen6314 It would be great if someone could pick this up and finalize it - I have not had a chance to get back around to it |
Adds support to allow custom snapshot properties to be specified in the Spark session configuration. This allows users to add custom snapshot properties even when running spark SQL DML such as DELETE or MERGE INTO, which was not previously supported.