Conversation

@rxin rxin commented Nov 1, 2016

What changes were proposed in this pull request?

This patch introduces an internal commit protocol API that is used by the batch data source to do write commits. It currently has only one implementation that uses Hadoop MapReduce's OutputCommitter API. In the future, this commit API can be used to unify streaming and batch commits.
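For context, a minimal self-contained sketch of what such a commit protocol abstraction could look like. The trait and the in-memory implementation below are illustrative only; the method names and signatures are assumptions, not the actual API introduced by this patch:

```scala
// Hypothetical sketch of a job/task commit protocol; names are illustrative.
trait CommitProtocol {
  def setupJob(): Unit                                   // driver side, before tasks run
  def setupTask(): Unit                                  // executor side, per task
  def newTaskTempFile(dir: String, ext: String): String  // staging path for a write
  def commitTask(): Seq[String]                          // files this task wants committed
  def commitJob(taskCommits: Seq[Seq[String]]): Unit     // driver-side final commit
  def abortJob(): Unit                                   // discard staged output
}

// A trivial in-memory implementation, just to demonstrate the flow.
class InMemoryCommitProtocol extends CommitProtocol {
  private val staged = scala.collection.mutable.ArrayBuffer[String]()
  private val committed = scala.collection.mutable.ArrayBuffer[String]()
  def setupJob(): Unit = ()
  def setupTask(): Unit = ()
  def newTaskTempFile(dir: String, ext: String): String = {
    val path = s"$dir/_temp/part-${staged.size}$ext"
    staged += path
    path
  }
  def commitTask(): Seq[String] = staged.toSeq
  def commitJob(taskCommits: Seq[Seq[String]]): Unit =
    taskCommits.foreach(committed ++= _)
  def abortJob(): Unit = staged.clear()
  def committedFiles: Seq[String] = committed.toSeq
}
```

The point of the abstraction is that the driver-side commit (`commitJob`) only sees the task-side commit messages, which is what lets the same API back both a Hadoop `OutputCommitter` and, eventually, a streaming commit log.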

How was this patch tested?

Should be covered by existing write tests.


rxin commented Nov 1, 2016

This is the same as #15696, but rebased with #15633.


ericl commented Nov 1, 2016

This LGTM, modulo the comments in #15696.

committer,
iterator = iter)
}).flatten.distinct
})
Move the distinct to updatedPartitions?
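If I read the suggestion right, it means deduplicating where partitions are recorded per task, rather than only after flattening all task results on the driver. A toy sketch with hypothetical names (`TaskWriteResult` is illustrative, not the patch's actual type):

```scala
// Each task reports the partitions it touched; using a Set dedupes per task.
case class TaskWriteResult(updatedPartitions: Set[String])

// Driver-side merge still needs a distinct across tasks, since two tasks
// can touch the same partition.
def collectUpdatedPartitions(results: Seq[TaskWriteResult]): Seq[String] =
  results.flatMap(_.updatedPartitions).distinct

val results = Seq(
  TaskWriteResult(Set("p=1", "p=2")),
  TaskWriteResult(Set("p=2", "p=3"))
)
val merged = collectUpdatedPartitions(results)
```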

@rxin rxin changed the title [SPARK-18024][SQL] Introduce an internal commit protocol API - rebased [SPARK-18024][SQL] Introduce an internal commit protocol API Nov 1, 2016

SparkQA commented Nov 1, 2016

Test build #67855 has finished for PR 15707 at commit 0647959.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val STREAMING_FILE_COMMIT_PROTOCOL_CLASS =
SQLConfigBuilder("spark.sql.streaming.commitProtocolClass")
.internal()
nit: two spaces
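For what it's worth, a config key like this typically names a class that is instantiated by reflection at runtime. A self-contained sketch of that pattern (demonstrated with a JDK class so it runs standalone; Spark's actual loading code may differ):

```scala
// Sketch: instantiate an implementation chosen by a config value such as
// spark.sql.streaming.commitProtocolClass, by looking up a one-String-arg
// constructor reflectively.
def instantiateByName(className: String, arg: String): AnyRef = {
  val clazz = Class.forName(className)
  clazz.getConstructor(classOf[String]).newInstance(arg).asInstanceOf[AnyRef]
}

// Using java.lang.StringBuilder here purely so the example is runnable
// without Spark on the classpath.
val instance = instantiateByName("java.lang.StringBuilder", "commit-log")
```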


ericl commented Nov 1, 2016

This LGTM, just a minor comment


SparkQA commented Nov 1, 2016

Test build #67865 has finished for PR 15707 at commit 65ba5c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented Nov 1, 2016

Looks like the failure was due to a flaky test; everything else passed. I'm going to merge this optimistically.

@asfgit asfgit closed this in d9d1465 Nov 1, 2016

SparkQA commented Nov 1, 2016

Test build #3384 has finished for PR 15707 at commit 0177ded.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HadoopCommitProtocolWrapper(path: String, isAppend: Boolean)


SparkQA commented Nov 1, 2016

Test build #3386 has finished for PR 15707 at commit 65ba5c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
Author: Reynold Xin <rxin@databricks.com>
Author: Eric Liang <ekl@databricks.com>

Closes apache#15707 from rxin/SPARK-18024-2.