
Commit 3bc6fe4

[SPARK-34809][CORE] Enable spark.hadoopRDD.ignoreEmptySplits by default
### What changes were proposed in this pull request?

This PR aims to enable `spark.hadoopRDD.ignoreEmptySplits` by default for Apache Spark 3.2.0.

### Why are the changes needed?

Although this is a safe improvement, it has not been enabled by default so far to avoid a behavior change. This PR switches the default explicitly in Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

Yes, the behavior change is documented.

### How was this patch tested?

Pass the existing CIs.

Closes #31909 from dongjoon-hyun/SPARK-34809.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
1 parent 3c32b54 commit 3bc6fe4
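
For context on what the new default means in practice, here is a minimal sketch; the input path, the presence of zero-byte part files, and the application name are hypothetical and not part of this commit. With `spark.hadoopRDD.ignoreEmptySplits=true`, HadoopRDD/NewHadoopRDD skip empty input splits, so they no longer show up as empty partitions.

```scala
// Hypothetical demo, not from the patch: count partitions of a directory that
// contains both non-empty and zero-byte part files.
import org.apache.spark.sql.SparkSession

object EmptySplitsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-splits-demo") // placeholder name
      .master("local[*]")
      .getOrCreate()

    // textFile is backed by HadoopRDD, so it honors the new default:
    // zero-byte files under /tmp/input no longer become empty partitions.
    val rdd = spark.sparkContext.textFile("/tmp/input")
    println(s"partitions = ${rdd.getNumPartitions}")

    spark.stop()
  }
}
```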

File tree

2 files changed, 3 insertions(+), 1 deletion(-)


core/src/main/scala/org/apache/spark/internal/config/package.scala

Lines changed: 1 addition & 1 deletion
@@ -1037,7 +1037,7 @@ package object config {
       .doc("When true, HadoopRDD/NewHadoopRDD will not create partitions for empty input splits.")
       .version("2.3.0")
       .booleanConf
-      .createWithDefault(false)
+      .createWithDefault(true)
 
   private[spark] val SECRET_REDACTION_PATTERN =
     ConfigBuilder("spark.redaction.regex")

docs/core-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.1 to 3.2
 
+- Since Spark 3.2, `spark.hadoopRDD.ignoreEmptySplits` is set to `true` by default which means Spark will not create empty partitions for empty input splits. To restore the behavior before Spark 3.2, you can set `spark.hadoopRDD.ignoreEmptySplits` to `false`.
+
 - Since Spark 3.2, `spark.eventLog.compression.codec` is set to `zstd` by default which means Spark will not fallback to use `spark.io.compression.codec` anymore.
 
 - Since Spark 3.2, `spark.storage.replication.proactive` is enabled by default which means Spark tries to replenish in case of the loss of cached RDD block replicas due to executor failures. To restore the behavior before Spark 3.2, you can set `spark.storage.replication.proactive` to `false`.
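
As the new migration note above says, the pre-3.2 behavior can be restored by setting the flag back to `false`. A minimal sketch, assuming the setting is applied when the SparkContext is created; the application name is a placeholder.

```scala
// Restore the pre-3.2 behavior: keep creating partitions for empty input splits.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("legacy-empty-splits") // placeholder name
  .set("spark.hadoopRDD.ignoreEmptySplits", "false")

val sc = new SparkContext(conf)
```

The same setting can also be passed on the command line, e.g. `--conf spark.hadoopRDD.ignoreEmptySplits=false` with `spark-submit`.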
