Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add overloads for row comparator that automatically handle comparisons between rows of a table with a scalar #10892

Open
ttnghia opened this issue May 19, 2022 · 6 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@ttnghia
Copy link
Contributor

ttnghia commented May 19, 2022

Sometimes, we want to compare rows of a table with a single element given as a scalar. In order to do that, at the caller site, we have to convert the scalar into a column of one row, then convert that column into a table, then pass both tables into the comparator. This is repetitive and tedious.

We should add overloads for the row comparator that can automatically do this. The caller just needs to pass in the input table and input scalar.

@ttnghia ttnghia added feature request New feature or request Needs Triage Need team to review and classify labels May 19, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented May 19, 2022

Reference: #10883 (comment)

@bdice bdice added the libcudf Affects libcudf (C++/CUDA) code. label May 19, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented May 19, 2022

This issue depends on #10893

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@GregoryKimball GregoryKimball removed the Needs Triage Need team to review and classify label Jun 29, 2022
@bdice
Copy link
Contributor

bdice commented Aug 16, 2022

This should be worked on. Without a comparator for tables/columns and scalars, lifetime management of a column derived from the scalar gets very awkward.

// If the input search key is a (nested type) scalar, a new column is materialized from that
// scalar before a `table_view` is generated from it. As such, the new created column will also be
// returned to keep the result `table_view` valid.
[[maybe_unused]] auto const [keys_tview, unused_column] =
[&]() -> std::pair<table_view, std::unique_ptr<column>> {
if constexpr (std::is_same_v<SearchKeyType, cudf::scalar>) {
auto tmp_column = make_column_from_scalar(search_keys, 1, stream);
auto const keys_tview = tmp_column->view();
return {table_view{{keys_tview}}, std::move(tmp_column)};
} else {
return {table_view{{search_keys}}, nullptr};
}
}();
auto const child_tview = table_view{{child}};
auto const has_nulls = has_nested_nulls(child_tview) || has_nested_nulls(keys_tview);
auto const comparator =
cudf::experimental::row::equality::two_table_comparator(child_tview, keys_tview, stream);
auto const d_comp = comparator.equal_to(nullate::DYNAMIC{has_nulls});

I commented here:
https://github.com/rapidsai/cudf/pull/10548/files#r946174668

Upon further investigation: the current state is quite yucky, with lifetime management for one code path but not the other. We should be able to handle this a different way, by making a comparator that accepts a scalar, owns the temporary column, and accepts one index (for the column/table input). Centralizing lifetime management in the comparator has been part of our existing designs, so that's probably good to continue.
There aren't a lot of alternatives I can see without improving this at the comparator level, but it's pretty gross in the current state.

@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 16, 2022

Note that this issue is relevant not just for equality comparison but for lexicographic comparison. Usage of that case can be binary searching or many other comparison cases involving scalar.

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

No branches or pull requests

4 participants