Support non-literal index for GpuElementAt and GpuGetArrayItem[databricks] #4858

Merged
merged 6 commits into from
Mar 6, 2022

Conversation

firestarman
Collaborator

This PR adds support for non-literal indices to GpuElementAt and GpuGetArrayItem. A non-literal index is an index supplied as a column, so different rows can use different indices.

It also updates the relevant tests for them.
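
For readers unfamiliar with the two expressions, here is a minimal plain-Python model of what a per-row (column) index means. It is a sketch of the semantics as I understand Spark's documented behavior, not the plugin's GPU code: `element_at` is 1-based and accepts negative indices counted from the end, while array subscripting (`GetArrayItem`) is 0-based. The function names, the `ansi` flag, and the error messages are illustrative assumptions.

```python
from typing import Callable, List, Optional, Sequence


def element_at(arr: Optional[Sequence], idx: Optional[int], ansi: bool = False):
    """Model of element_at for one row: 1-based, negative counts from the end."""
    if arr is None or idx is None:
        return None  # a null array or a null index gives a null result
    if idx == 0:
        # Index 0 is never valid for element_at (message text is illustrative).
        raise IndexError("element_at indices start at 1")
    if abs(idx) > len(arr):
        if ansi:
            raise IndexError(f"index {idx} out of bounds for {len(arr)} elements")
        return None  # non-ANSI mode: out of bounds yields null
    # Python's own negative indexing matches "count from the end".
    return arr[idx - 1] if idx > 0 else arr[idx]


def get_array_item(arr: Optional[Sequence], idx: Optional[int], ansi: bool = False):
    """Model of arr[idx] for one row: 0-based, negative indices are out of bounds."""
    if arr is None or idx is None:
        return None
    if idx < 0 or idx >= len(arr):
        if ansi:
            raise IndexError(f"index {idx} out of bounds for {len(arr)} elements")
        return None
    return arr[idx]


def apply_per_row(fn: Callable, arrays: List, indices: List, ansi: bool = False) -> List:
    """Apply fn row by row, pairing each array with its own index."""
    return [fn(a, i, ansi) for a, i in zip(arrays, indices)]


arrays = [[10, 20, 30], [1, 2], None, [7]]
indices = [1, -1, 1, 5]  # one index per row instead of a single literal
print(apply_per_row(element_at, arrays, indices))      # [10, 2, None, None]
print(apply_per_row(get_array_item, arrays, indices))  # [20, None, None, None]
```

With a literal index every row would use the same position; with an index column each row is looked up independently, which is why the bounds checks discussed in the review have to be done per row.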

close #4814

Signed-off-by: Firestarman <firestarmanllc@gmail.com>

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

@firestarman
Collaborator Author

Blossom is down. I will trigger the pre-merge build again once it is back.

Member

@jlowe jlowe left a comment

Minor nits, otherwise LGTM.

@jlowe jlowe added this to the Feb 14 - Feb 25 milestone Feb 24, 2022
revans2
revans2 previously approved these changes Feb 24, 2022
Collaborator

@revans2 revans2 left a comment

Just some nits that I could live without.

}
withResource(hasLargerIndices) { _ =>
if(BoolUtils.isAnyValidTrue(hasLargerIndices)) {
throw new ArrayIndexOutOfBoundsException("Some indices are out of bound")
Collaborator

We are not using the RapidsErrorUtils. Is it worth finding the first entry that does not match after this point and getting the exact same error message?

Collaborator Author

@firestarman firestarman Feb 25, 2022

I think that would be OK, since at that point we are already about to throw an exception and abort.
The main drawback I can think of is GPU memory: we would need to hold the 'numElements' column until the whole check is done.
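
To make the trade-off concrete, here is a hedged plain-Python sketch of the two-step check being discussed, assuming `element_at`-style bounds (an index is out of bounds when its absolute value exceeds the array length). A cheap "any out of bounds" test runs first; only on the failure path is the first offending row located so that the message can name the exact index and length. The helper names and the message format are hypothetical, and `None` stands in for a null row.

```python
from typing import List, Optional, Tuple


def any_out_of_bounds(num_elements: List[Optional[int]],
                      indices: List[Optional[int]]) -> bool:
    """Cheap check: does any valid row have an index beyond its array length?"""
    return any(
        n is not None and i is not None and abs(i) > n
        for n, i in zip(num_elements, indices)
    )


def first_out_of_bounds(num_elements: List[Optional[int]],
                        indices: List[Optional[int]]) -> Optional[Tuple[int, int, int]]:
    """Return (row, index, num_elements) for the first offending row, else None."""
    for row, (n, i) in enumerate(zip(num_elements, indices)):
        if n is not None and i is not None and abs(i) > n:
            return row, i, n
    return None


def check_ansi(num_elements: List[Optional[int]],
               indices: List[Optional[int]]) -> None:
    """Raise with a precise message if any valid row is out of bounds."""
    if any_out_of_bounds(num_elements, indices):
        # The second pass runs only when we are about to fail anyway, but
        # num_elements has to stay alive until this point to build the message.
        _, i, n = first_out_of_bounds(num_elements, indices)
        raise IndexError(f"Invalid index: {i}, numElements: {n}")


check_ansi([3, None, 1], [1, 9, None])  # no valid row is out of bounds
```

The extra pass costs nothing on the success path. The price is the one named above: the lengths column cannot be released before the check finishes.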

// Check if any index is out of bound only when ansi mode is enabled.
// No exception should be raised if no valid entry (An entry is valid when both
// the array row and its index are not null), the same with what Spark does.
if(hasNegativeIndices && array.getNullCount != array.getRowCount) {
Collaborator

nit: in general I think we want to have a space after the if. There are several places where this is not happening, and what is more it is inconsistent in the patch.

Collaborator Author

updated

if(isAnyValidTrue(hasLargerIndicesCV)) {
// No need to check the validity of array column here, since the validity info
// is included in this `hasLargerIndicesCV`.
throw new ArrayIndexOutOfBoundsException(
Collaborator

Same here: is it worth making this the same in both code paths?

Collaborator Author

I added it.
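
As a side note on the code comment in this hunk, the following plain-Python sketch illustrates why no separate validity check on the array column is needed when the comparison itself propagates nulls. It is only a model under stated assumptions: `None` represents a null entry, and `has_larger_indices` and `is_any_valid_true` are stand-ins written for this example, not the plugin's actual helpers.

```python
from typing import List, Optional


def has_larger_indices(num_elements: List[Optional[int]],
                       indices: List[Optional[int]]) -> List[Optional[bool]]:
    """Row-wise comparison that yields None whenever either input is null."""
    return [
        None if n is None or i is None else abs(i) > n
        for n, i in zip(num_elements, indices)
    ]


def is_any_valid_true(flags: List[Optional[bool]]) -> bool:
    """True when at least one entry is both non-null and True."""
    return any(f is True for f in flags)


# Row 0 is valid and out of bounds; rows 1 and 2 are null on one side.
flags = has_larger_indices([3, None, 1], [5, 9, None])
print(flags)                     # [True, None, None]
print(is_any_valid_true(flags))  # True
```

Because a null array row or a null index already turns the flag into null, rows that Spark would not evaluate can never trigger the exception.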

@@ -906,6 +906,7 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False):

# Some array gens, but not all because of nesting
array_gens_sample = single_level_array_gens + nested_array_gens_sample
array_of_map_gen = ArrayGen(MapGen(StringGen(pattern='key_[0-9]', nullable=False), StringGen(), max_length=10), max_length=10)
Collaborator

nit: this name is a little generic. Nested types make it difficult to name things well. Perhaps array_map_string_string_gen. It is ugly, but there is less ambiguity. Also, if this is only used in one file, or in one place in the code, we don't need to create it in data_gen.py.

Collaborator Author

For now, this is only used in array_test.

Collaborator Author

updated

@revans2
Collaborator

revans2 commented Feb 24, 2022

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

build

jlowe
jlowe previously approved these changes Feb 25, 2022
@jlowe
Member

jlowe commented Feb 25, 2022

build

@sameerz sameerz added the feature request New feature or request label Feb 26, 2022
@firestarman
Collaborator Author

build

@firestarman firestarman changed the title Support non-literal index for GpuElementAt and GpuGetArrayItem Support non-literal index for GpuElementAt and GpuGetArrayItem[databricks] Feb 28, 2022
revans2
revans2 previously approved these changes Feb 28, 2022
@firestarman
Collaborator Author

build

@sameerz
Collaborator

sameerz commented Mar 4, 2022

build

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
@firestarman
Collaborator Author

firestarman commented Mar 5, 2022

resolved the conflicts

@firestarman
Collaborator Author

build

@sperlingxx
Collaborator

LGTM

@sperlingxx sperlingxx self-requested a review March 6, 2022 02:50
@firestarman firestarman merged commit 187e584 into NVIDIA:branch-22.04 Mar 6, 2022
@firestarman firestarman deleted the element_at branch March 6, 2022 02:59
Labels
feature request New feature or request
Development

Successfully merging this pull request may close these issues.

[FEA] Support element_at with non-literal index
5 participants