@@ -631,17 +631,19 @@ package object config {
private[spark] val MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM =
ConfigBuilder("spark.maxRemoteBlockSizeFetchToMem")
.doc("Remote block will be fetched to disk when size of the block is above this threshold " +
"in bytes. This is to avoid a giant request takes too much memory. We can enable this " +
"config by setting a specific value(e.g. 200m). Note this configuration will affect " +
"both shuffle fetch and block manager remote block fetch. For users who enabled " +
"external shuffle service, this feature can only be worked when external shuffle" +
"service is newer than Spark 2.2.")
"in bytes. This is to avoid a giant request takes too much memory. Note this " +
"configuration will affect both shuffle fetch and block manager remote block fetch. " +
"For users who enabled external shuffle service, this feature can only work when " +
"external shuffle service is at least 2.3.0.")
.bytesConf(ByteUnit.BYTE)
// fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
// as well use fetch-to-disk in that case. The message includes some metadata in addition
// to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
// extra room.
.createWithDefault(Int.MaxValue - 512)
.checkValue(
_ <= Int.MaxValue - 512,
"maxRemoteBlockSizeFetchToMem cannot be larger than (Int.MaxValue - 512) bytes.")
.createWithDefaultString("200m")

private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
24 changes: 11 additions & 13 deletions docs/configuration.md
@@ -626,19 +626,6 @@ Apart from these, the following properties are also available, and may be useful
You can mitigate this issue by setting it to a lower value.
</td>
</tr>
<tr>
<td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
<td>Int.MaxValue - 512</td>
[Review comment — Contributor]
just to clarify, you intentionally moved this from shuffle section to network section since it affects both the shuffle fetch and block manager fetches?

[Reply — Contributor Author]
yea

<td>
The remote block will be fetched to disk when size of the block is above this threshold in bytes.
This is to avoid a giant request that takes too much memory. By default, this is only enabled
for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
and block manager remote block fetch. For users who enabled external shuffle service,
this feature can only be used when external shuffle service is newer than Spark 2.2.
</td>
</tr>
<tr>
<td><code>spark.shuffle.compress</code></td>
<td>true</td>
@@ -1519,6 +1506,17 @@ Apart from these, the following properties are also available, and may be useful
you can set larger value.
</td>
</tr>
<tr>
<td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
<td>200m</td>
<td>
The remote block will be fetched to disk when the size of the block is above this threshold
in bytes. This is to avoid a giant request taking too much memory. Note this
configuration will affect both shuffle fetch and block manager remote block fetch.
For users who have enabled the external shuffle service, this feature only works
when the external shuffle service is at least version 2.3.0.
</td>
</tr>
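Since the new default is a size string, the property is typically set the same way. A minimal configuration sketch (the property name comes from the diff above; the `100m` value is only an illustration):

```
# spark-defaults.conf: spill remote blocks larger than ~100 MiB to disk
spark.maxRemoteBlockSizeFetchToMem   100m
```

The same property can also be passed per job via `--conf spark.maxRemoteBlockSizeFetchToMem=100m` on `spark-submit`.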
</table>

### Scheduling