Handle empty child columns in row_bit_count() #8791
`row_bit_count()` handles string and list inputs by computing the size of the offsets and the size of the elements in the underlying child column. When the child column is empty (e.g. when the input string/list column contains only nulls), `row_bit_count()` erroneously attempts to read the contents of the empty `offsets` and `child.data()`, leading to bad reads and crashes. This commit makes `row_bit_count()` treat empty child row spans as having size `0`. It also correctly handles empty child columns.
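The guard described above can be illustrated with a minimal host-side sketch. This is not the libcudf implementation: `StringsColumnModel` and `row_bits` are hypothetical names, the column is modeled with plain `std::vector`s instead of device memory, and the accounting (one validity bit, one 32-bit offset, and 8 bits per character) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for a strings column (not the cudf API).
struct StringsColumnModel {
  std::size_t num_rows;
  std::vector<bool> validity;         // empty means "no null mask"
  std::vector<std::int32_t> offsets;  // num_rows + 1 entries, or empty if there is no child data
  std::vector<char> chars;            // concatenated character data
};

constexpr std::size_t kBitsPerByte = 8;

// Returns the number of bits attributed to one row in this simplified model.
std::size_t row_bits(StringsColumnModel const& col, std::size_t row) {
  assert(row < col.num_rows);

  // Assumption: one validity bit per row when a null mask exists.
  std::size_t bits = col.validity.empty() ? 0 : 1;

  // The guard: an empty offsets child (e.g. an all-null column) contributes
  // nothing, and must not be dereferenced.
  if (col.offsets.empty()) { return bits; }

  assert(row + 1 < col.offsets.size());
  std::int32_t const begin = col.offsets[row];
  std::int32_t const end   = col.offsets[row + 1];

  bits += sizeof(std::int32_t) * kBitsPerByte;                   // one offset per row
  bits += static_cast<std::size_t>(end - begin) * kBitsPerByte;  // character payload
  return bits;
}
```

Without the early return, the all-null case would index into an empty `offsets` vector, which is the host-side analogue of the bad reads this PR addresses.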
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8791 +/- ##
================================================
- Coverage 10.50% 10.15% -0.35%
================================================
Files 116 116
Lines 18573 19623 +1050
================================================
+ Hits 1951 1993 +42
- Misses 16622 17630 +1008
rerun tests
@gpucibot merge
Thanks for the reviews,
Fixes #8775.
Addresses failures seen in NVIDIA/spark-rapids/issues/2723.