[SPARK-26389][SS] Add force delete temp checkpoint configuration #23732
Conversation
Test build #102036 has finished for PR 23732 at commit
retest this please
Test build #102039 has finished for PR 23732 at commit
retest this please
Test build #102042 has finished for PR 23732 at commit
Thanks for the quick improvement.
Two minor comments: the flag description is a bit confusing; I would just describe the behavior when the flag is on, rather than saying it's "normal" deletion. Also, I think the new flag should be read in DataStreamWriter, so the shouldDelete value we pass into StreamExecution will always be correct.
Clear, will do.
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
You're right, there's no good way to move the flag the way I wanted. Looks good modulo Dongjoon's comments.
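For context on where the flag gets read, resolving it from the session conf might look roughly like the fragment below; this is a sketch with illustrative names, not the PR's actual diff:

```scala
// Sketch (hypothetical names): resolve whether a temporary checkpoint location
// should also be deleted when the query terminates with an error. The resulting
// boolean is what eventually feeds StreamExecution's deleteCheckpointOnStop flag.
val forceDeleteTempCheckpoint: Boolean =
  sparkSession.sessionState.conf.getConf(SQLConf.FORCE_DELETE_TEMP_CHECKPOINT_LOCATION)
```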
Test build #102078 has finished for PR 23732 at commit
Test build #102083 has finished for PR 23732 at commit
Looks good overall. Left a minor comment on documentation.
@@ -907,6 +907,12 @@ object SQLConf {
    .stringConf
    .createOptional

  val FORCE_DELETE_TEMP_CHECKPOINT_LOCATION =
    buildConf("spark.sql.streaming.forceDeleteTempCheckpointLocation")
      .doc("When true, enable temporary checkpoint locations force delete.")
nit: I'd like this to be a valid, more descriptive sentence like the other configurations, especially since this feature is only documented here. One example from me: "When true, Spark always deletes temporary checkpoint locations for successful queries."
Ur, @HeartSaVioR, that is the behavior before this PR.
The benefit of this configuration is to delete the locations when an exception occurs.
Ah yes, my bad, I meant failed queries, or maybe better, "regardless of the query's result, success or failure". Anyway, this option is only documented here, so I just wanted to make sure there's a clear description.
+1, LGTM. This configuration is useful.
Merged to master.
Thank you, @gaborgsomogyi, @camper42, @jose-torres, @HeartSaVioR!
## What changes were proposed in this pull request?

Not all users want to keep temporary checkpoint directories, and it is hard to restore from them. In this PR I've added a force delete flag that defaults to `false`. Additionally, it is not clear to users when a temporary checkpoint directory is deleted, so log messages were added to explain this a bit more.

## How was this patch tested?

Existing + additional unit tests.

Closes apache#23732 from gaborgsomogyi/SPARK-26389.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@@ -907,6 +907,12 @@ object SQLConf {
    .stringConf
    .createOptional

  val FORCE_DELETE_TEMP_CHECKPOINT_LOCATION =
    buildConf("spark.sql.streaming.forceDeleteTempCheckpointLocation")
Let us follow the other boolean conf naming convention? `spark.sql.streaming.forceDeleteTempCheckpointLocation.enabled`?
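Purely to illustrate the naming suggestion, the entry under an `.enabled` suffix might look like the sketch below; this is hypothetical, and whether the follow-up PR adopted exactly this form is not shown in this thread:

```scala
// Hypothetical: the same config entry under the suggested `.enabled` naming convention.
val FORCE_DELETE_TEMP_CHECKPOINT_LOCATION =
  buildConf("spark.sql.streaming.forceDeleteTempCheckpointLocation.enabled")
    .doc("When true, enable temporary checkpoint locations force delete.")
    .booleanConf
    .createWithDefault(false)
```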
cc @Ngone51 could you submit a PR to make a change?
Submitted #26981
What changes were proposed in this pull request?
Not all users want to keep temporary checkpoint directories, and it is hard to restore from them. In this PR I've added a force delete flag that defaults to `false`. Additionally, it is not clear to users when a temporary checkpoint directory is deleted, so log messages were added to explain this a bit more.

How was this patch tested?
Existing + additional unit tests.
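As a minimal usage sketch (the session setup details are assumptions for illustration; only the configuration key comes from this PR), a user could opt in to force deletion like this:

```scala
import org.apache.spark.sql.SparkSession

object ForceDeleteTempCheckpointExample {
  def main(args: Array[String]): Unit = {
    // Assumed local session setup. With the flag enabled, temporary checkpoint
    // locations should be removed even when a streaming query stops with an error.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("force-delete-temp-checkpoint")
      .config("spark.sql.streaming.forceDeleteTempCheckpointLocation", "true")
      .getOrCreate()

    // Confirm the flag is visible in the session conf.
    println(spark.conf.get("spark.sql.streaming.forceDeleteTempCheckpointLocation"))
    spark.stop()
  }
}
```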