chore: Enable shuffle by default #881
// TODO we should no longer be disabling COALESCE_PARTITIONS_ENABLED
conf.set(SQLConf.COALESCE_PARTITIONS_ENABLED.key, "false")
Unrelated to this PR, but we should fix this in a separate PR
I removed it in https://github.com/apache/datafusion-comet/pull/553/files#r1730694991 in order to trigger a test case that fails in the current code.
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala
docs/source/user-guide/configs.md
@@ -50,8 +50,7 @@ Comet provides the following configuration settings.
| spark.comet.exec.memoryFraction | The fraction of memory from Comet memory overhead that the native memory manager can use for execution. The purpose of this config is to set aside memory for untracked data structures, as well as imprecise size estimation during memory acquisition. Default value is 0.7. | 0.7 |
| spark.comet.exec.project.enabled | Whether to enable project by default. | true |
| spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to compress shuffle data. Only zstd is supported. | zstd |
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. By default, this config is false. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | false |
| spark.comet.exec.shuffle.mode | The mode of Comet shuffle. This config is only effective if Comet shuffle is enabled. Available modes are 'native', 'jvm', and 'auto'. 'native' is for native shuffle which has best performance in general. 'jvm' is for jvm-based columnar shuffle which has higher coverage than native shuffle. 'auto' is for Comet to choose the best shuffle mode based on the query plan. By default, this config is 'auto'. | auto |
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. By default, this config is false. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | true |
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. By default, this config is false. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | true |
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. By default, this config is true. Note that this requires setting 'spark.shuffle.manager' to 'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 'spark.shuffle.manager' must be set before starting the Spark application and cannot be changed during the application. | true |
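For readers following along, here is a minimal sketch of the two settings the description above ties together, rendered as `spark-submit` style `--conf` flags. The keys and values come from the table row; the object name `CometShuffleSettings` and its members are hypothetical helpers for illustration and are not part of Comet.

```scala
// Hypothetical helper (not part of Comet): collects the settings named in the
// config description and renders them as `--conf key=value` flags.
object CometShuffleSettings {
  // 'spark.shuffle.manager' has to be supplied at application start, so it is
  // listed here as a launch-time setting rather than something set at runtime.
  val settings: Seq[(String, String)] = Seq(
    "spark.shuffle.manager" ->
      "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
    "spark.comet.exec.shuffle.enabled" -> "true"
  )

  // One `--conf key=value` string per setting, in declaration order.
  def submitFlags: Seq[String] =
    settings.map { case (key, value) => s"--conf $key=$value" }

  def main(args: Array[String]): Unit =
    submitFlags.foreach(println)
}
```

With the new default, the second flag becomes redundant, but the shuffle manager still has to be provided when the application is launched.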
@@ -189,7 +189,7 @@ object CometConf extends ShimCometConf {
"'spark.shuffle.manager' must be set before starting the Spark application and " +
"cannot be changed during the application.")
.booleanConf
.createWithDefault(false)
.createWithDefault(true)
We also need to fix the default value in the description.
We should probably automate adding the text stating the default value. Thanks for catching that.
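One possible shape for that automation, as a rough sketch only: keep the default out of the hand-written description and append it when the doc text is rendered, so the sentence and the default column cannot drift apart. `ConfDoc`, `rendered`, and `markdownRow` are hypothetical names for illustration; this is not how CometConf is currently structured.

```scala
// Hypothetical sketch (not CometConf's actual API): the description is written
// without any default, and the default sentence is derived from the value.
final case class ConfDoc[T](key: String, description: String, default: T) {
  // Description followed by an automatically generated default sentence.
  def rendered: String =
    s"${description.trim} By default, this config is $default."

  // One row in the same three-column layout used by configs.md.
  def markdownRow: String = s"| $key | $rendered | $default |"
}

object ConfDocDemo {
  def main(args: Array[String]): Unit = {
    val shuffleEnabled = ConfDoc(
      "spark.comet.exec.shuffle.enabled",
      "Whether to enable Comet native shuffle.",
      true)
    println(shuffleEnabled.markdownRow)
  }
}
```

Changing the default in one place would then update both the description and the table column.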
LGTM, thanks @andygrove. Some minor comments on the defaults.
* enable shuffle by default
* disable shuffle in CometTestBase
* format
* fix regressions
* fix
* fix more
* fix more
* fix regression
* fix regressions
* Revert refactor
* format
* update docs

(cherry picked from commit be10fee)
Which issue does this PR close?
N/A
Rationale for this change
We would like to enable shuffle by default.
What changes are included in this PR?
* Change the default of spark.comet.exec.shuffle.enabled from false to true
* Add an ORDER BY clause to some queries to make the tests deterministic
How are these changes tested?