diff --git a/docs/configuration.md b/docs/configuration.md
index 981170d8b49b7..55d8ae0a867ef 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -256,6 +256,14 @@ Apart from these, the following properties are also available, and may be useful
 spark.storage.memoryFraction.
+
+ spark.shuffle.safetyFraction
+ 0.8
+
+ An additional margin of safety fraction of Java heap to use for aggregation and cogroups during
+ shuffles, in case the size estimation of maps used for shuffle is not sufficiently accurate.
+
+
 spark.shuffle.compress
 true
@@ -286,11 +294,18 @@ Apart from these, the following properties are also available, and may be useful
 HASH
 Implementation to use for shuffling data. A hash-based shuffle manager is the default, but
- starting in Spark 1.1 there is an experimental sort-based shuffle manager that is more
+ starting in Spark 1.1 there is an experimental sort-based shuffle manager that is more
 memory-efficient in environments with small executors, such as YARN. To use that, change this
 value to SORT.
+
+ spark.shuffle.spill.batchSize
+ 10000
+
+ Size of object batches when reading/writing from serializers.
+
+
 spark.shuffle.sort.bypassMergeThreshold
 200
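
Review note: the new `spark.shuffle.safetyFraction` entry describes a margin applied to the heap used for shuffle aggregation, but the diff does not show how it combines with the existing shuffle memory fraction. The sketch below is one plausible reading, not Spark's implementation: it assumes the safety fraction simply scales down the share of heap already set aside for shuffle maps. The helper name `shuffle_memory_budget` and its parameters are hypothetical; only the `0.8` default comes from the diff.

```python
def shuffle_memory_budget(max_heap_bytes, memory_fraction, safety_fraction=0.8):
    """Estimate the bytes in-memory shuffle maps may use before spilling.

    Hypothetical helper, not Spark code. Assumes the safety fraction is
    applied multiplicatively on top of the shuffle memory fraction.
    """
    if max_heap_bytes < 0:
        raise ValueError("max_heap_bytes must be non-negative")
    # Both fractions are shares of a whole, so reject values outside [0, 1].
    for name, value in (("memory_fraction", memory_fraction),
                        ("safety_fraction", safety_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Truncate to whole bytes; the margin exists to absorb estimation error.
    return int(max_heap_bytes * memory_fraction * safety_fraction)
```

Under this reading, lowering the safety fraction makes spilling start earlier, which trades some speed for protection against underestimated map sizes. For example, `shuffle_memory_budget(1000, 0.5, 0.5)` returns `250`.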