Handle empty child columns in row_bit_count() #8791

Merged

mythrocks
Contributor

@mythrocks mythrocks commented Jul 20, 2021

Fixes #8775.
Addresses failures seen in NVIDIA/spark-rapids/issues/2723.

`row_bit_count()` handles string and list inputs by computing the size of the offsets and the size of the elements in the underlying child column.
When the child column is empty (e.g. when the input string/list column contains only nulls), `row_bit_count()` erroneously attempts to read the contents of the empty `offsets` and `child.data()`, leading to bad reads and crashes.

This commit makes `row_bit_count()` treat empty child row spans as having size `0`. It also correctly handles empty child columns.
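
As a rough illustration of the guard described above, here is a minimal host-side sketch. It is not the libcudf implementation: `StringsColumnModel`, `row_bits`, and the per-row accounting (one validity bit, one 32-bit offset, eight bits per character) are assumptions made for this example only. The point it shows is that the offsets are never indexed when they are empty, and that an empty span contributes `0` bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for a strings column (not a libcudf type).
struct StringsColumnModel {
  std::size_t num_rows;               // logical number of rows
  std::vector<std::int32_t> offsets;  // num_rows + 1 entries, or empty if there is no child data
  std::vector<char> chars;            // character data; may be empty
  bool nullable;                      // whether each row carries a validity bit
};

constexpr std::int64_t bits_per_byte = 8;

// Bits attributed to one row under the assumed accounting described above.
std::int64_t row_bits(StringsColumnModel const& col, std::size_t row)
{
  assert(row < col.num_rows);
  std::int64_t bits = col.nullable ? 1 : 0;

  // Guard: an all-null column may have no offsets at all.
  // Reading offsets[row] here would be an out-of-bounds access.
  if (col.offsets.empty()) { return bits; }

  assert(col.offsets.size() == col.num_rows + 1);
  bits += static_cast<std::int64_t>(sizeof(std::int32_t)) * bits_per_byte;

  std::int32_t const begin = col.offsets[row];
  std::int32_t const end   = col.offsets[row + 1];

  // An empty span (begin == end) adds nothing; chars is never touched.
  if (end > begin) { bits += static_cast<std::int64_t>(end - begin) * bits_per_byte; }
  return bits;
}
```

With this model, an all-null nullable column with empty offsets yields 1 bit per row, and a non-nullable column holding `"ab"` and `""` yields 48 and 32 bits respectively.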

@mythrocks mythrocks added the bug, Spark, and non-breaking labels Jul 20, 2021
@mythrocks mythrocks requested a review from a team as a code owner July 20, 2021 05:02
@mythrocks mythrocks self-assigned this Jul 20, 2021
@mythrocks mythrocks requested a review from ttnghia July 20, 2021 05:02
@github-actions github-actions bot added the libcudf label Jul 20, 2021
@codecov

codecov bot commented Jul 20, 2021

Codecov Report

Merging #8791 (f500c45) into branch-21.08 (a770589) will decrease coverage by 0.34%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.08    #8791      +/-   ##
================================================
- Coverage         10.50%   10.15%   -0.35%     
================================================
  Files               116      116              
  Lines             18573    19623    +1050     
================================================
+ Hits               1951     1993      +42     
- Misses            16622    17630    +1008     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/io/hdf.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/orc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/_version.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/abc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/api/types.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/dlpack.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/feather.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/parquet.py | 0.00% <0.00%> (ø) |
... and 45 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Five review threads on cpp/tests/transform/row_bit_count_test.cu (outdated, resolved)
@mythrocks mythrocks requested a review from ttnghia July 20, 2021 19:08
@mythrocks
Contributor Author

rerun tests

@mythrocks
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7d14334 into rapidsai:branch-21.08 Jul 21, 2021
@mythrocks
Contributor Author

mythrocks commented Jul 21, 2021

Thanks for the reviews, folks. I just merged this.

@mythrocks mythrocks deleted the row-bit-count-empty-child-columns branch July 21, 2021 21:27
Labels
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Spark: Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Invalid __global__ read in row_bit_count()
3 participants