Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Segmented Min/Max Reduction on String Type #10447

Merged
merged 25 commits into from
Apr 29, 2022

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Mar 16, 2022

This PR adds min/max segmented reduction to string type.

Part of #10417

@isVoid isVoid added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. labels Mar 16, 2022
@isVoid isVoid changed the title Add Fixed Point Support to Segmented Reduction Support Min/Max to String, Decimal Type in Segmented Reduction Mar 17, 2022
@isVoid isVoid changed the base branch from branch-22.04 to branch-22.06 March 21, 2022 20:14
@isVoid isVoid changed the title Support Min/Max to String, Decimal Type in Segmented Reduction Support Segmented Min/Max Reduction String Type Apr 11, 2022
@isVoid isVoid changed the title Support Segmented Min/Max Reduction String Type Support Segmented Min/Max Reduction on String Type Apr 11, 2022
isVoid and others added 4 commits April 11, 2022 12:37
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
@bdice bdice added 3 - Ready for Review Ready for review by team non-breaking Non-breaking change and removed 2 - In Progress Currently a work in progress labels Apr 11, 2022
@isVoid isVoid removed the 3 - Ready for Review Ready for review by team label Apr 11, 2022
@isVoid isVoid requested review from vuule and nvdbaranec April 12, 2022 00:33
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few small suggestions.

cpp/src/reductions/simple_segmented.cuh Outdated Show resolved Hide resolved
cpp/src/reductions/simple_segmented.cuh Outdated Show resolved Hide resolved
cpp/tests/reductions/segmented_reduction_tests.cpp Outdated Show resolved Hide resolved
cpp/src/reductions/simple_segmented.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/utilities/element_argminmax.cuh Outdated Show resolved Hide resolved
@isVoid
Copy link
Contributor Author

isVoid commented Apr 15, 2022

@vuule All these are great review comments! I wouldn't consider them as optional :)

Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cpp/include/cudf/detail/utilities/element_argminmax.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/reduction.cuh Show resolved Hide resolved
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
@bdice
Copy link
Contributor

bdice commented Apr 20, 2022

rerun tests

@bdice bdice added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Apr 26, 2022
Copy link
Contributor Author

@isVoid isVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else looks fine to me except one below.

offsets.begin() + 1,
null_handling,
stream,
mr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A super minor nit: we can determine if segmented_null_mask is a temporary memory based on result->null_count() and choose to use default resource v.s. supplied resource here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore or punt to next one @bdice.

Copy link
Contributor

@bdice bdice Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already using mr as efficiently (minimally) as possible. If result->null_count() == 0, we return the mr-allocated bitmask. Else, we do an inplace_bitmask_and that alters the mr-allocated bitmask before returning it.

The use of mr in inplace_bitmask_and, which calls inplace_bitmask_binop, is a little misleading. The allocations there are always temporary and are not part of the return value. I think it may be possible to refactor the inplace functions to remove mr as an argument, and require the use of the default memory resource to avoid an implication that it is used for the returned data (it is not, since it acts in-place on bitmasks passed in that were from some other allocator).

Copy link
Contributor Author

@isVoid isVoid Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the gather result contains null, its null_mask is allocated with mr. The inplace_bitmask_and uses this as the target null mask, in this situation, the segmented_null_mask doesn't need to use the supplied mr because this null mask is acting only as an operand in the inplace_nullmask_and, not as the result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that might be the case. The segmented_null_mask_reduction could use result->null_count() == 0 ? mr : rmm::mr::get_current_device_resource(). I'm not sure if I would recommend using a conditional mr for the sake of clarity, but it seems that it could be done.

@bdice
Copy link
Contributor

bdice commented Apr 29, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 9112907 into rapidsai:branch-22.06 Apr 29, 2022
rapids-bot bot pushed a commit that referenced this pull request May 18, 2022
This PR adds support to min/max segmented reduction to fixed point type. Together with #10447, this PR closes #10417 

Besides, this PR refactors `segmented_reduce` to accept output iterators instead of allocating the result column from within.

Authors:
  - Michael Wang (https://github.com/isVoid)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - David Wendt (https://github.com/davidwendt)

URL: #10794
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants