[SPARK-18024][SQL] Introduce an internal commit protocol API #15707
[SPARK-18087] [SQL] Optimize insert to not require REPAIR TABLE
This LGTM, modulo the comments in #15696.
committer,
iterator = iter)
}).flatten.distinct
})
Move the distinct to updatedPartitions?
Test build #67855 has finished for PR 15707 at commit
val STREAMING_FILE_COMMIT_PROTOCOL_CLASS =
SQLConfigBuilder("spark.sql.streaming.commitProtocolClass")
.internal()
nit: two spaces
This LGTM, just a minor comment.
Test build #67865 has finished for PR 15707 at commit
Looks like the test run failed because of a flaky test, but everything else was fine. I'm going to merge this optimistically.
Test build #3384 has finished for PR 15707 at commit
Test build #3386 has finished for PR 15707 at commit
## What changes were proposed in this pull request?

This patch introduces an internal commit protocol API that is used by the batch data source to do write commits. It currently has only one implementation that uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API can be used to unify streaming and batch commits.

## How was this patch tested?

Should be covered by existing write tests.

Author: Reynold Xin <rxin@databricks.com>
Author: Eric Liang <ekl@databricks.com>

Closes apache#15707 from rxin/SPARK-18024-2.
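
To make the idea of a commit protocol more concrete, here is a minimal sketch of a two-phase write commit: tasks write to temporary files, and the job promotes only the files of committed tasks. Everything in it is an assumption made for illustration. The names `SimpleCommitProtocol`, `LocalDirCommitProtocol`, `TaskCommitMessage`, and `CommitDemo` are hypothetical, the method set is simplified, and it uses the local filesystem via `java.nio.file` rather than Hadoop MapReduce's OutputCommitter, which is what the implementation in this patch builds on.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.collection.mutable

// Hypothetical names throughout; this is not Spark's internal API.
// A task commit message lists (temp file, final destination) pairs.
final case class TaskCommitMessage(files: Seq[(Path, Path)])

trait SimpleCommitProtocol {
  def setupJob(): Unit
  def newTaskTempFile(taskId: Int, fileName: String): Path
  def commitTask(taskId: Int): TaskCommitMessage
  def abortTask(taskId: Int): Unit
  def commitJob(messages: Seq[TaskCommitMessage]): Unit
  def abortJob(): Unit
}

final class LocalDirCommitProtocol(outputDir: Path) extends SimpleCommitProtocol {
  private val stagingDir = outputDir.resolve("_temporary")
  // Files each running task has asked for but not yet committed.
  private val pending = mutable.Map.empty[Int, mutable.Buffer[(Path, Path)]]

  override def setupJob(): Unit = { Files.createDirectories(stagingDir); () }

  override def newTaskTempFile(taskId: Int, fileName: String): Path = synchronized {
    val tmp = stagingDir.resolve(s"task-$taskId-$fileName")
    val entry = (tmp, outputDir.resolve(fileName))
    pending.getOrElseUpdate(taskId, mutable.Buffer.empty[(Path, Path)]).append(entry)
    tmp
  }

  // Committing a task only reports its files; nothing becomes visible yet.
  override def commitTask(taskId: Int): TaskCommitMessage = synchronized {
    TaskCommitMessage(pending.remove(taskId).map(_.toList).getOrElse(Nil))
  }

  // Aborting a task discards its temporary files.
  override def abortTask(taskId: Int): Unit = synchronized {
    pending.remove(taskId).foreach(_.foreach { case (tmp, _) => Files.deleteIfExists(tmp) })
  }

  // Committing the job moves committed files into place, then cleans up.
  override def commitJob(messages: Seq[TaskCommitMessage]): Unit = {
    for (msg <- messages; (tmp, dest) <- msg.files) {
      Files.move(tmp, dest, StandardCopyOption.REPLACE_EXISTING)
    }
    deleteStaging()
  }

  override def abortJob(): Unit = deleteStaging()

  private def deleteStaging(): Unit = {
    val leftovers = Option(stagingDir.toFile.listFiles()).getOrElse(Array.empty[java.io.File])
    leftovers.foreach(_.delete())
    Files.deleteIfExists(stagingDir)
    ()
  }
}

object CommitDemo {
  // Runs one committed task and one aborted task; returns the output directory.
  def run(): Path = {
    val out = Files.createTempDirectory("commit-demo")
    val protocol = new LocalDirCommitProtocol(out)
    protocol.setupJob()
    val tmp0 = protocol.newTaskTempFile(0, "part-00000.txt")
    Files.write(tmp0, "a".getBytes("UTF-8"))
    val msg0 = protocol.commitTask(0)
    val tmp1 = protocol.newTaskTempFile(1, "part-00001.txt")
    Files.write(tmp1, "b".getBytes("UTF-8"))
    protocol.abortTask(1)
    protocol.commitJob(Seq(msg0))
    out
  }
}
```

Separating task commit from job commit is what lets a driver decide atomically which task outputs become visible. The same split is plausibly what would allow a single interface to serve both batch and streaming writers, as the description suggests, though the sketch does not attempt to model streaming.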