Closed
2 changes: 1 addition & 1 deletion docs/sql-performance-tuning.md
@@ -267,7 +267,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
<td><code>spark.sql.adaptive.coalescePartitions.parallelismFirst</code></td>
<td>true</td>
<td>
- When true, Spark ignores the target size specified by <code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code> (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by <code>spark.sql.adaptive.coalescePartitions.minPartitionSize</code> (default 1MB), to maximize the parallelism. This is to avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect the target size specified by <code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code>.
+ When true, Spark ignores the target size specified by <code>spark.sql.adaptive.advisoryPartitionSizeInBytes</code> (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by <code>spark.sql.adaptive.coalescePartitions.minPartitionSize</code> (default 1MB), to maximize the parallelism. This is to avoid performance regressions when enabling adaptive query execution. It's recommended to set this config to true on a busy cluster to make resource utilization more efficient (not many small tasks).
</td>
<td>3.2.0</td>
</tr>
@@ -721,8 +721,9 @@ object SQLConf {
"shuffle partitions, but adaptively calculate the target size according to the default " +
"parallelism of the Spark cluster. The calculated size is usually smaller than the " +
"configured target size. This is to maximize the parallelism and avoid performance " +
- "regression when enabling adaptive query execution. It's recommended to set this config " +
Contributor:
@maryannxue is it really recommended?

cloud-fan (Contributor), Feb 1, 2024:
Maybe just say: "It's recommended to set this config to true on a busy cluster to make resource utilization more efficient (not many small tasks)."

Contributor (Author):
OK

Contributor:
This suggestion contains a mistake, right? Shouldn't it be set to false on a busy cluster? See #45437.

- "to false and respect the configured target size.")
+ "regressions when enabling adaptive query execution. It's recommended to set this " +
+ "config to true on a busy cluster to make resource utilization more efficient (not many " +
+ "small tasks).")
.version("3.2.0")
.booleanConf
.createWithDefault(true)
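For anyone who wants to compare the two behaviours debated in this thread, here is a minimal sketch of toggling the setting at runtime. It assumes a throwaway local `SparkSession` (the object name, app name, and `local[*]` master are illustrative and not part of this PR); the config keys are the ones named in the diff. Which value suits a busy cluster is exactly what the review comments dispute, so the sketch does not pick a side.

```scala
import org.apache.spark.sql.SparkSession

object ParallelismFirstSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session used only for illustration.
    val spark = SparkSession.builder()
      .appName("parallelism-first-sketch")
      .master("local[*]")
      .getOrCreate()

    // The setting only matters when AQE partition coalescing is enabled.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

    // false: coalescing respects the advisory target size.
    spark.conf.set("spark.sql.adaptive.coalescePartitions.parallelismFirst", "false")
    spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")

    // true (the default): the target size is ignored and only the minimum
    // partition size is respected. Uncomment to try that mode instead.
    // spark.conf.set("spark.sql.adaptive.coalescePartitions.parallelismFirst", "true")
    // spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "1MB")

    // Print the effective value so the active mode is visible in the logs.
    println(spark.conf.get("spark.sql.adaptive.coalescePartitions.parallelismFirst"))

    spark.stop()
  }
}
```

Running the same shuffle-heavy query under each value and comparing the number of post-shuffle tasks in the Spark UI is probably the quickest way to see which mode fits a given cluster.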