-
Notifications
You must be signed in to change notification settings - Fork 924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fully support nested types in cudf::contains
#10656
Fully support nested types in cudf::contains
#10656
Conversation
and keep vanilla element comparator public
This comment was marked as off-topic.
This comment was marked as off-topic.
cpp/src/search/contains_nested.cu
Outdated
null_equality::EQUAL, | ||
nan_equality::ALL_EQUAL, | ||
stream, | ||
mr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why this detail function returns a device_uvector
instead of a column
?
Will the result ever be larger than what a column supports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That detail::contains(table_view, table_view)
function was initially refactored out of semi-join (#11100). Now, it is also used in several other places but the result is only used for temporary computation and never be returned to the user. Thus, device_uvector
is enough at this time.
It is still unclear whether we will expose this function to the public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also say that (especially for internal APIs like this, but occasionally even otherwise) I prefer using a strongly-typed object like a device_uvector
rather than the type-erased (and nullable) column
. We have more information available and convey it. Even for public APIs I like uvectors when we don't need type erasure or nullability because the returned values are clearly defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This discussion is also related to #11356.
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I have only very minor suggestions at this point. Thank you!
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
Signed-off-by: Nghia Truong <nghiatruong.vn@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@gpucibot merge |
This extends the
cudf::contains
API to support nested types (lists + structs) with arbitrarily nested levels. As such,cudf::contains
will work with literally any type of input data.In addition, this fixes null handling of
cudf::contains
with structs column + struct scalar input when the structs column contains null rows at the top level while the scalar key is valid but all nulls at children levels.Closes: #8965
Depends on:
cudf::contains
, renaming and switching parameters role #10802cudf::contains
#10997contains
with template key types NVIDIA/cuCollections#172pair_contains
instatic_map
andstatic_multimap
NVIDIA/cuCollections#173left_semi_anti_join
,cudf::contains
, and set operations #11037