Add in support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec #2702

Merged: revans2 merged 1 commit into NVIDIA:branch-21.08 from arrays_in_brute_force_joins on Jun 11, 2021

Conversation

@revans2 revans2 commented Jun 10, 2021

While running the profiling tool on some large applications, I noticed that it would do a few big joins using either CartesianProductExec or BroadcastNestedLoopJoinExec, with an array and array_contains in the join condition. So I enabled arrays in the check, wrote some tests, and this sped it up massively.
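
To illustrate the kind of join described above, here is a minimal conceptual sketch in plain Python. It is not code from this PR or from the plugin: the row layout, the column names (`stage_id`, `job_ids`, `job_id`), and both helper functions are hypothetical, and null handling is ignored. The point it tries to show is that a condition like `array_contains(job_ids, job_id)` gives the planner no equality key to hash or sort on, so every pair of rows has to be tested, which is what a nested loop or cartesian join does.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def array_contains(arr: List[Any], value: Any) -> bool:
    """Simplified stand-in for SQL array_contains (ignores null semantics)."""
    return value in arr


def nested_loop_join(
    left: List[Row], right: List[Row], condition: Callable[[Row, Row], bool]
) -> List[Row]:
    """Brute-force inner join: test every (left, right) pair against the condition."""
    return [{**l, **r} for l in left for r in right if condition(l, r)]


# Hypothetical data: each stage row carries an array of the job ids that used it.
stages = [
    {"stage_id": 1, "job_ids": [10, 11]},
    {"stage_id": 2, "job_ids": [11]},
]
jobs = [{"job_id": 10}, {"job_id": 11}, {"job_id": 12}]

# No equality between columns, only membership in an array column.
joined = nested_loop_join(
    stages, jobs, lambda s, j: array_contains(s["job_ids"], j["job_id"])
)

for row in joined:
    print(row)
```

The work grows with the product of the two input sizes, which is presumably why running these joins on the GPU made such a difference for large applications.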

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2 revans2 added the SQL (part of the SQL/Dataframe plugin) and task (work required that improves the product but is not user facing) labels Jun 10, 2021
@revans2 revans2 added this to the June 7 - June 18 milestone Jun 10, 2021
@revans2 revans2 self-assigned this Jun 10, 2021

revans2 commented Jun 10, 2021

build

@revans2 revans2 changed the title Add in support for lists in BroadcastNestedLoopJoinExec and CartesianProductExec Add in support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec Jun 10, 2021

@gerashegalov gerashegalov left a comment

LGTM

@revans2 revans2 merged commit 44a48ca into NVIDIA:branch-21.08 Jun 11, 2021
@revans2 revans2 deleted the arrays_in_brute_force_joins branch June 11, 2021 19:00
abellina added a commit that referenced this pull request Jun 18, 2021
abellina added a commit to abellina/spark-rapids that referenced this pull request Jun 18, 2021
This reverts commit 44a48ca.

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina added a commit that referenced this pull request Jun 21, 2021
This reverts commit 44a48ca.

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
abellina added a commit to abellina/spark-rapids that referenced this pull request Jun 25, 2021
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Jul 21, 2021
…NVIDIA#2749)"

This reverts commit b3e7c4b.

This re-enables support for lists in Broadcast Nested Loop Joins, and
Cartesian joins, as was intended in 2702.

Signed-off-by: Mithun RK <mythrocks@gmail.com>
mythrocks added a commit that referenced this pull request Jul 23, 2021
…" (#2989)

This reverts commit b3e7c4b.

This re-enables support for lists in Broadcast Nested Loop Joins, and
Cartesian joins, as was intended in 2702.

Signed-off-by: Mithun RK <mythrocks@gmail.com>