[FEA] Support aggregates when ANSI mode is enabled #5114
Labels: feature request (New feature or request), ? - Needs Triage (Need team to review and classify)

nartal1 added the "feature request" and "? - Needs Triage" labels on Mar 31, 2022
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue on Jun 17, 2024:

Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 more broadly): spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode for the failing window function tests. These may be revisited once error/overflow checking is available for ANSI mode in spark-rapids.

Signed-off-by: MithunR <mithunr@nvidia.com>
razajafri pushed a commit that referenced this issue on Jun 26, 2024:

* Disable ANSI mode for window function tests. Fixes #11019. Window function tests fail on Spark 4.0 because of #5114 (and #5120 broadly), because spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode for the failing window function tests. These may be revisited once error/overflow checking is available for ANSI mode in spark-rapids.
* Switch from @ansi_mode_disabled to @disable_ansi_mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
wjxiz1992 added a commit to nvliyuan/yuali-spark-rapids that referenced this issue on Jun 26, 2024:

* Optimize Expand+Aggregate in SQL with many count-distinct aggregates (Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>)
* Add GpuBucketingUtils shim to Spark 4.0.0 (NVIDIA#11092) (Signed-off-by: Raza Jafri <rjafri@nvidia.com>)
* Improve the diagnostics for 'conv' fallback explain (NVIDIA#11076): more user-friendly messages, and the bases are no longer shown as an empty string in the error message when the user input is not empty (Signed-off-by: Jihoon Son <ghoonson@gmail.com>, Co-authored-by: Gera Shegalov <gshegalov@nvidia.com>)
* Disable ANSI mode for window function tests [databricks] (NVIDIA#11073). Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly), since spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This disables ANSI mode for the failing window function tests; these may be revisited once error/overflow checking is available for ANSI mode in spark-rapids. Also switches from @ansi_mode_disabled to @disable_ansi_mode. (Signed-off-by: MithunR <mithunr@nvidia.com>)

Co-authored-by: Hongbin Ma (Mahone) <mahongbin@apache.org>, Raza Jafri <razajafri@users.noreply.github.com>, Jihoon Son <jihoonson@apache.org>, Gera Shegalov <gshegalov@nvidia.com>, MithunR <mithunr@nvidia.com>
SurajAralihalli pushed a commit to SurajAralihalli/spark-rapids that referenced this issue on Jul 12, 2024:

* Disable ANSI mode for window function tests. Fixes NVIDIA#11019. Window function tests fail on Spark 4.0 because of NVIDIA#5114 (and NVIDIA#5120 broadly), because spark-rapids does not support SUM, COUNT, and certain other aggregations in ANSI mode. This commit disables ANSI mode for the failing window function tests. These may be revisited once error/overflow checking is available for ANSI mode in spark-rapids.
* Switch from @ansi_mode_disabled to @disable_ansi_mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue on Jul 18, 2024:

Most of the remaining tests are broken because they exercise aggregations like SUM, COUNT, AVG, etc. in ANSI mode. NVIDIA#5114 sees to it that these aggregations fall back to the CPU.
mythrocks added a commit that referenced this issue on Jul 18, 2024:

* Fix hash-aggregate tests failing in ANSI mode. Fixes #11018. This commit fixes the tests in `hash_aggregate_test.py` to run correctly with ANSI mode enabled. This is essential for running the tests on Spark 4.0, where ANSI mode is on by default. The vast majority of the tests here exercise aggregations like `SUM`, `COUNT`, `AVG`, etc., which fall back to the CPU on account of #5114. Those tests have been marked with `@disable_ansi_mode` so that they run to completion correctly; they may be revisited after #5114 has been addressed. In cases where #5114 does not apply, the tests have been modified to run with ANSI mode both on and off.

Signed-off-by: MithunR <mithunr@nvidia.com>
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue on Sep 27, 2024:

Fixes NVIDIA#11015. Contributes to NVIDIA#11004. This commit addresses the tests that fail in parquet_test.py when run on Spark 4.

1. Some of the tests were failing as a result of NVIDIA#5114. Those tests have been disabled, at least until aggregations with ANSI mode enabled are supported.
2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless of ANSI mode, because it tests implicit type promotions where the read schema includes wider columns than the write schema. This will require new code; the test is disabled until NVIDIA#11512 is addressed.
3. `test_parquet_int32_downcast` had an erroneous setup phase that fails in ANSI mode. This has been corrected, and the test was refactored to run in both ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
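The `test_parquet_int32_downcast` fix above hinges on how ANSI mode changes narrowing casts. A sketch of the semantic difference in plain Python (an illustration, not Spark or spark-rapids code): in ANSI mode an overflowing downcast is an error, while legacy mode silently keeps the low bits.

```python
# Illustrative only: models Spark's INT -> TINYINT cast semantics
# under ANSI vs. legacy mode using plain Python integers.
def cast_int32_to_int8(value, ansi_enabled):
    if not (-2**31 <= value < 2**31):
        raise ValueError("not a valid int32")
    if -128 <= value <= 127:
        return value  # fits in a signed byte either way
    if ansi_enabled:
        # ANSI mode: overflow raises, mirroring Spark's CAST_OVERFLOW error.
        raise OverflowError(f"{value} does not fit in TINYINT")
    # Legacy mode: keep the low 8 bits, reinterpreted as signed.
    return ((value + 128) % 256) - 128

assert cast_int32_to_int8(100, ansi_enabled=True) == 100
assert cast_int32_to_int8(300, ansi_enabled=False) == 44  # silent truncation
```

A test setup that relied on the legacy truncation would therefore break as soon as ANSI mode becomes the default, which is why the setup phase had to be corrected.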
mythrocks added a commit that referenced this issue on Oct 8, 2024:

* Spark 4: Fix parquet_test.py. Fixes #11015 (Spark 4 failure) and #11531 (Databricks 14.3 failure). Contributes to #11004. This commit addresses the tests that fail in parquet_test.py when run on Spark 4.

1. Some of the tests were failing as a result of #5114. Those tests have been disabled, at least until aggregations with ANSI mode enabled are supported.
2. `test_parquet_check_schema_compatibility` fails on Spark 4 regardless of ANSI mode, because it tests implicit type promotions where the read schema includes wider columns than the write schema. This will require new code; the test is disabled until #11512 is addressed.
3. `test_parquet_int32_downcast` had an erroneous setup phase that fails in ANSI mode. This has been corrected, and the test was refactored to run in both ANSI and non-ANSI mode.

Signed-off-by: MithunR <mithunr@nvidia.com>
Is your feature request related to a problem? Please describe.

We currently fall back to the CPU for aggregates when ANSI mode is enabled (#3597).
This issue tracks enabling aggregates in ANSI mode.
While working on this, we have to look at the different Spark versions (3.1, 3.2, etc.) to make sure we enable each type only in the versions that support it.
For example, SUM (apache/spark@12abfe7917) and AVERAGE (apache/spark@8dc455bba8) for interval types were added in Spark 3.2.
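The reason ANSI mode is hard for aggregates can be sketched in plain Python (an illustration of the semantics, not the spark-rapids implementation): a legacy SUM over 64-bit longs wraps around silently on overflow, while an ANSI SUM must detect the overflow and raise.

```python
# Illustrative only: legacy vs. ANSI SUM semantics over 64-bit longs.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def ansi_sum(values):
    total = 0
    for v in values:
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            # ANSI mode: Spark raises an arithmetic overflow error here.
            raise OverflowError("long overflow")
    return total

def legacy_sum(values):
    # Legacy (non-ANSI) semantics: wrap around like Java long addition.
    total = 0
    for v in values:
        total = (total + v + 2**63) % 2**64 - 2**63
    return total

assert ansi_sum([1, 2, 3]) == 6
assert legacy_sum([INT64_MAX, 1]) == INT64_MIN  # silent wraparound
```

Supporting this on the GPU means every partial and final aggregation step needs the overflow check, which is why aggregates currently fall back to the CPU when ANSI mode is on.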