Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the aggregate API [databricks] #3910

Merged
merged 6 commits into from
Oct 29, 2021

Conversation

abellina
Copy link
Collaborator

Supersedes #3833
Closes #3194
Closes #3442

This refactor of the aggregate code does the following:

  1. Simplify setupReferences to follow what Spark does. Spark has an inputAttributes val that they pass to the iterators that handles the special cases we were handling in our code, so I've replaced the (rather large) special casing in setupReferences with Spark's approach, added a couple of code references in the code to make it easier to follow.

  2. Refactors computeAggregate by treating the aggregate component, reduction or group by the same. There's always a pre and post step around the cuDF aggregate. An AggHelper class was introduced to encapsulate and break up the code in computeAggregate.

  3. The aggregate functions themselves changed. CudfAggregate is no longer an expression. The invocation of cuDF aggregates is simply done by ordinals (as it was done before, but now the ordinals come from the AggHelper and are handled the same for reductions and group by. The attribute references were cleaned to try and address @revans2's concerns called out here: Remove the special reduction cast in hash aggregate #3833 (comment).

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina abellina changed the title Refactor the aggregate API Refactor the aggregate API [databricks] Oct 25, 2021
@abellina
Copy link
Collaborator Author

build

@sameerz sameerz added the task Work required that improves the product but is not user facing label Oct 25, 2021
@abellina abellina added this to the Oct 18 - Oct 29 milestone Oct 25, 2021
@abellina
Copy link
Collaborator Author

Build failure appears (?) unrelated:

11:20:03  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (create-parallel-world) on project rapids-4-spark_2.12: An Ant BuildException has occured: exec returned: 255
11:20:03  [ERROR] around Ant part ...<exec failonerror="true" dir="/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/target" executable="/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/scripts/binary-dedupe.sh"/>... @ 78:222 in /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-3066/dist/target/antrun/build-main.xml

Investigating..

@abellina
Copy link
Collaborator Author

Filed this one to track: #3911

@abellina
Copy link
Collaborator Author

build

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% done yet, but overall it is looking good.

@abellina
Copy link
Collaborator Author

Thanks @revans2 I believe I have addressed your comments.

@abellina
Copy link
Collaborator Author

build

@abellina abellina merged commit 0d88779 into NVIDIA:branch-21.12 Oct 29, 2021
@abellina abellina deleted the agg/agg_function_refactor branch October 29, 2021 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] simplify the casting in hash aggregate [FEA] Simplify the aggregation logic
3 participants