-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Only allow incompatible cast expressions to run in comet if a config is enabled #362
Changes from 33 commits
d592417
4641540
39ac177
c7775a6
4e803cc
d255879
b2d3d2d
889c754
35bcbf9
5be20c6
a27f846
23847cd
4bf6c16
ed07913
5255d6c
0c8da56
22c6394
7c39b05
787d17b
1b833e2
7ad2a43
ee77b14
6d903c3
4c24e41
b8df40f
4b98bc2
6675c72
b67d571
83a70b7
e5226f0
6738544
a1bfdee
15c4989
8c40e17
be94495
581d92e
4374fc0
2e5e33b
128e763
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Compatibility Guide | ||
|
||
Comet aims to provide consistent results with the version of Apache Spark that is being used. | ||
|
||
This guide offers information about areas of functionality where there are known differences. | ||
|
||
## ANSI mode | ||
|
||
Comet currently ignores ANSI mode in most cases, and therefore can produce different results than Spark. By default, | ||
Comet will fall back to Spark if ANSI mode is enabled. To enable Comet to accelerate queries when ANSI mode is enabled, | ||
specify `spark.comet.ansi.enabled=true` in the Spark configuration. Comet's ANSI support is experimental and should not | ||
be used in production. | ||
|
||
There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. | ||
|
||
## Cast | ||
|
||
Cast operations in Comet fall into three levels of support: | ||
|
||
- **Compatible**: The results match Apache Spark | ||
- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs | ||
will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting | ||
`spark.comet.cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not | ||
recommended for production use. | ||
- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to | ||
Spark. | ||
|
||
The following table shows the current cast operations supported by Comet. Any cast that does not appear in this | ||
table (such as those involving complex types and timestamp_ntz, for example) are not supported by Comet. | ||
|
||
<!--CAST_TABLE--> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,7 +25,7 @@ Comet provides the following configuration settings. | |
|--------|-------------|---------------| | ||
| spark.comet.ansi.enabled | Comet does not respect ANSI mode in most cases and by default will not accelerate queries when ansi mode is enabled. Enable this setting to test Comet's experimental support for ANSI mode. This should not be used in production. | false | | ||
| spark.comet.batchSize | The columnar batch size, i.e., the maximum number of rows that a batch can contain. | 8192 | | ||
| spark.comet.cast.stringToTimestamp | Comet is not currently fully compatible with Spark when casting from String to Timestamp. | false | | ||
| spark.comet.cast.allowIncompatible | Comet is not currently fully compatible with Spark for all cast operations. Set this config to true to allow them anyway. See compatibility guide for more information. | true | | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this document that it defaults to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. Updated. |
||
| spark.comet.columnar.shuffle.async.enabled | Whether to enable asynchronous shuffle for Arrow-based shuffle. By default, this config is false. | false | | ||
| spark.comet.columnar.shuffle.async.max.thread.num | Maximum number of threads on an executor used for Comet async columnar shuffle. By default, this config is 100. This is the upper bound of total number of shuffle threads per executor. In other words, if the number of cores * the number of shuffle threads per task `spark.comet.columnar.shuffle.async.thread.num` is larger than this config. Comet will use this config as the number of shuffle threads per executor instead. | 100 | | ||
| spark.comet.columnar.shuffle.async.thread.num | Number of threads used for Comet async columnar shuffle per shuffle task. By default, this config is 3. Note that more threads means more memory requirement to buffer shuffle data before flushing to disk. Also, more threads may not always improve performance, and should be set based on the number of cores available. | 3 | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,6 +58,11 @@ under the License. | |
<groupId>org.scala-lang</groupId> | ||
<artifactId>scala-library</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.scala-lang</groupId> | ||
<artifactId>scala-reflect</artifactId> | ||
<scope>provided</scope> | ||
</dependency> | ||
Comment on lines
+61
to
+65
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't fully understand why this was needed, but I could not compile without this. I found this answer in https://stackoverflow.com/questions/37505380/java-lang-noclassdeffounderror-scala-reflect-api-typecreator |
||
<dependency> | ||
<groupId>com.google.protobuf</groupId> | ||
<artifactId>protobuf-java</artifactId> | ||
|
@@ -270,17 +275,13 @@ under the License. | |
<version>3.2.0</version> | ||
<executions> | ||
<execution> | ||
<id>generate-config-docs</id> | ||
<id>generate-user-guide-reference-docs</id> | ||
<phase>package</phase> | ||
<goals> | ||
<goal>java</goal> | ||
</goals> | ||
<configuration> | ||
<mainClass>org.apache.comet.CometConfGenerateDocs</mainClass> | ||
<arguments> | ||
<argument>docs/source/user-guide/configs-template.md</argument> | ||
<argument>docs/source/user-guide/configs.md</argument> | ||
</arguments> | ||
<mainClass>org.apache.comet.GenerateDocs</mainClass> | ||
<classpathScope>compile</classpathScope> | ||
</configuration> | ||
</execution> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code had to move to the spark module so that it can access the cast information