@@ -462,6 +462,9 @@ private[spark] object SparkHadoopUtil {
    for ((key, value) <- conf.getAll if key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.substring("spark.hadoop.".length), value)
    }
    if (conf.getOption("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version").isEmpty) {
      hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version", "1")
    }
  }

  private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
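For context on what this hunk means for users: spark.hadoop.* properties on the SparkConf are still copied into the Hadoop Configuration with the prefix stripped, and the committer algorithm version is forced to "1" only when that key is absent. A minimal sketch of explicitly opting back into version 2 from application code (the app name and output path are illustrative, not part of this PR):

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Setting the key explicitly overrides the injected default, because the
    // new check in SparkHadoopUtil only fires when the key is unset.
    val conf = new SparkConf()
      .setAppName("committer-v2-example")  // illustrative
      .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // Writes from this session use file output committer algorithm v2, which may
    // be faster but can hit the correctness issue tracked in MAPREDUCE-7282.
    spark.range(10).write.mode("overwrite").parquet("/tmp/committer-v2-demo")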
10 changes: 2 additions & 8 deletions docs/configuration.md
@@ -1761,16 +1761,10 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version</code></td>
<td>Dependent on environment</td>
<td>1</td>
<td>
The file output committer algorithm version, valid algorithm version number: 1 or 2.
Version 2 may have better performance, but version 1 may handle failures better in certain situations,
as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
The default value depends on the Hadoop version used in an environment:
1 for Hadoop versions lower than 3.0
2 for Hadoop versions 3.0 and higher
It's important to note that this can change back to 1 again in the future once <a href="https://issues.apache.org/jira/browse/MAPREDUCE-7282">MAPREDUCE-7282</a>
is fixed and merged.
Comment on lines -1767 to -1773
Just curious, why is this deleted? It is a very comprehensive comment about the Hadoop version background. @dongjoon-hyun

Member Author

This PR aims to provide a consistent view for Apache Spark users. For example, "The default value depends on the Hadoop version used in an environment" is no longer valid. After this PR, Apache Spark users will consistently get v1 by default.

Note that version 2 may cause a correctness issue like MAPREDUCE-7282.
</td>
<td>2.2.0</td>
</tr>
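A quick way to confirm which algorithm version is in effect after this change is to read it back from the Hadoop configuration that Spark builds from the SparkConf (a sketch; the app name is illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("committer-version-check")  // illustrative
      .getOrCreate()

    // sparkContext.hadoopConfiguration already has spark.hadoop.* keys applied
    // (prefix stripped) plus the injected default, so with no explicit override
    // this should now print "1" regardless of the Hadoop version in use.
    val version = spark.sparkContext.hadoopConfiguration
      .get("mapreduce.fileoutputcommitter.algorithm.version")
    println(s"file output committer algorithm version = $version")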