Refactor the aggregate API [databricks] #3910

abellina · 2021-10-25T15:19:44Z

Supersedes #3833
Closes #3194
Closes #3442

This refactor of the aggregate code does the following:

- Simplifies setupReferences to follow what Spark does. Spark has an inputAttributes val, which it passes to the aggregation iterators, that handles the special cases we were handling in our own code. I've replaced the (rather large) special casing in setupReferences with Spark's approach and added a couple of comments pointing at the corresponding Spark code to make it easier to follow.
- Refactors computeAggregate to treat both kinds of aggregate, reduction and group-by, the same way: there is always a pre-step and a post-step around the cuDF aggregate. An AggHelper class was introduced to encapsulate this logic and break up the code in computeAggregate.
- Changes the aggregate functions themselves: CudfAggregate is no longer an expression. cuDF aggregates are still invoked by ordinal, as before, but the ordinals now come from the AggHelper and are handled the same way for reductions and group-by. The attribute references were also cleaned up to try to address @revans2's concerns raised here: Remove the special reduction cast in hash aggregate #3833 (comment).
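To make the "pre-step, cuDF aggregate, post-step" structure concrete, here is a minimal, hypothetical Scala sketch. It does not use the plugin's or cuDF's real API: AggHelperSketch, AggOp and average are made-up names, and rows are plain Scala vectors rather than GPU columns. It only illustrates how one helper could run both a group-by and a reduction (modeled here as a group-by with no grouping columns), with each aggregation reading its input column by ordinal.

```scala
// Hypothetical, simplified model of the pre-step / aggregate / post-step flow.
// None of these names are the plugin's real API; rows are plain vectors.
object AggSketch {
  // Each row is a vector of doubles; column ordinals index into it.
  type Row = Vector[Double]

  // Hypothetical stand-in for one cuDF aggregation over the column at `inputOrdinal`.
  final case class AggOp(inputOrdinal: Int, reduce: Seq[Double] => Double)

  // Hypothetical helper: the same code path serves reductions and group-bys.
  final case class AggHelperSketch(
      groupingOrdinals: Seq[Int],
      preStep: Row => Row,
      aggs: Seq[AggOp],
      postStep: Row => Row) {

    def run(input: Seq[Row]): Seq[Row] = {
      val pre = input.map(preStep)
      def keyOf(row: Row): Row = groupingOrdinals.map(i => row(i)).toVector
      // With no grouping ordinals every key is empty, so all rows form one group.
      // (Spark's one-row result for a reduction over empty input is not modeled.)
      val keys = pre.map(keyOf).distinct // keeps first-seen order
      keys.map { key =>
        val group = pre.filter(row => keyOf(row) == key)
        // Each aggregation picks its input column purely by ordinal.
        val aggregated = aggs.map(op => op.reduce(group.map(row => row(op.inputOrdinal))))
        postStep(key ++ aggregated)
      }
    }
  }

  private val sum: Seq[Double] => Double = _.sum

  // Average as sum / count: the pre-step appends a 1.0 "count" column,
  // two sums run by ordinal, and the post-step divides.
  def average(groupBy: Boolean): AggHelperSketch =
    if (groupBy)
      AggHelperSketch(
        groupingOrdinals = Seq(0),
        preStep = r => Vector(r(0), r(1), 1.0),
        aggs = Seq(AggOp(1, sum), AggOp(2, sum)),
        postStep = r => Vector(r(0), r(1) / r(2)))
    else
      AggHelperSketch(
        groupingOrdinals = Seq.empty,
        preStep = r => Vector(r(1), 1.0),
        aggs = Seq(AggOp(0, sum), AggOp(1, sum)),
        postStep = r => Vector(r(0) / r(1)))
}
```

Splitting an average into a sum and a count, followed by a division in the post-step, is just one illustrative way to give each phase a job; it is not taken from the PR's code.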

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

abellina · 2021-10-25T15:21:25Z

build

abellina · 2021-10-25T16:31:12Z

Build failure appears (?) unrelated:

11:20:03  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: exec returned: 255
11:20:03  [ERROR] around Ant part ...<exec failonerror="true" dir="/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/target" executable="/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/scripts/binary-dedupe.sh"/>... @ 78:222 in /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/target/antrun/build-main.xml

Investigating...

abellina · 2021-10-25T16:55:08Z

Filed this one to track: #3911

abellina · 2021-10-25T20:44:49Z

build

revans2

Not 100% done yet, but overall it is looking good.

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala

abellina · 2021-10-29T03:54:21Z

Thanks @revans2 I believe I have addressed your comments.

abellina · 2021-10-29T03:54:28Z

build

abellina added 2 commits October 25, 2021 09:00

Remove the special reduction cast in hash aggregate

3c6c62b

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

Refactor the aggregate API

0ea0d56

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>

abellina mentioned this pull request Oct 25, 2021

Remove the special reduction cast in hash aggregate #3833

Closed

abellina changed the title from Refactor the aggregate API to Refactor the aggregate API [databricks] Oct 25, 2021

sameerz added the task label (Work required that improves the product but is not user facing) Oct 25, 2021

abellina added this to the Oct 18 - Oct 29 milestone Oct 25, 2021

abellina mentioned this pull request Oct 26, 2021

Add Std dev samp for windowing [databricks] #3869

Merged

revans2 reviewed Oct 27, 2021

abellina added 4 commits October 28, 2021 22:46

Address review comments

3fe1961

Remove TODOs as those came up in review

48d3424

Fix comment that I missed

258938b

Address comment on evaluateExpression

474647e

revans2 approved these changes Oct 29, 2021

abellina mentioned this pull request Oct 29, 2021

[FEA] Hash Aggregate Cleanup to make closer to spark #3

Closed

abellina merged commit 0d88779 into NVIDIA:branch-21.12 Oct 29, 2021

abellina deleted the agg/agg_function_refactor branch October 29, 2021 16:11