-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support non-literal index for GpuElementAt
and GpuGetArrayItem
[databricks]
#4858
Conversation
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build |
blossom is down, will trigger pre-merge again once it is back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits otherwise lgtm.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/BoolUtils.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nits that I could live without
} | ||
withResource(hasLargerIndices) { _ => | ||
if(BoolUtils.isAnyValidTrue(hasLargerIndices)) { | ||
throw new ArrayIndexOutOfBoundsException("Some indices are out of bound") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not using the RapidsErrorUtils. Is it worth finding the first entry that does not match after this point and getting the exact same error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be OK since we are on the way to throw an exception to abort.
The main drawback I can think of is GPU memory, one is we need to hold the 'numElements' column until the whole check is done.
// Check if any index is out of bound only when ansi mode is enabled. | ||
// No exception should be raised if no valid entry (An entry is valid when both | ||
// the array row and its index are not null), the same with what Spark does. | ||
if(hasNegativeIndices && array.getNullCount != array.getRowCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: in general I think we want to have a space after the if
. There are several places where this is not happening, and what is more it is inconsistent in the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
if(isAnyValidTrue(hasLargerIndicesCV)) { | ||
// No need to check the validity of array column here, since the validity info | ||
// is included in this `hasLargerIndicesCV`. | ||
throw new ArrayIndexOutOfBoundsException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here is it worth it to make this the same as in both code paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add it.
@@ -906,6 +906,7 @@ def gen_scalars_for_sql(data_gen, count, seed=0, force_no_nulls=False): | |||
|
|||
# Some array gens, but not all because of nesting | |||
array_gens_sample = single_level_array_gens + nested_array_gens_sample | |||
array_of_map_gen = ArrayGen(MapGen(StringGen(pattern='key_[0-9]', nullable=False), StringGen(), max_length=10), max_length=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this name is a little generic. Nested types makes it difficult to name things well. Perhaps array_map_string_string_gen
. It is ugly but there is less ambiguity. Also if this only used in one file, or one place in the code we don't need to create it in data_gen.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, this is only used in array_test
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
build |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build |
build |
build |
GpuElementAt
and GpuGetArrayItem
GpuElementAt
and GpuGetArrayItem
[databricks]
build |
1 similar comment
build |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
resolved the conflicts |
build |
LGTM |
This PR is to add the non-literal index (aka indices as a column, which can let different rows have different indices) support for
GpuElementAt
andGpuGetArrayItem
.It also updates the relevant tests for them.
close #4814
Signed-off-by: Firestarman firestarmanllc@gmail.com