
Define C API and implement induced subgraph #2854

Merged

Conversation

ChuckHastings
Collaborator

Defines the C API and provides implementation of induced subgraph.

Closes #2530
Closes #2532
Closes #2533

@ChuckHastings ChuckHastings requested review from a team as code owners October 26, 2022 18:39
@ChuckHastings ChuckHastings self-assigned this Oct 26, 2022
@ChuckHastings ChuckHastings added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 26, 2022
@ChuckHastings ChuckHastings added this to the 22.12 milestone Oct 26, 2022
d_vertices.begin(),
[local_vertex_first] __device__(vertex_t v) { return v - local_vertex_first; });

d_local_values.resize(local_vertex_last - local_vertex_first, handle.get_stream());
Contributor

Not a major issue, but couldn't static_cast<size_t>(thrust::distance(local_vertex_first, local_vertex_last)) be used?

Collaborator Author

I could, but these are integer types (vertex_t) so subtraction feels more natural to me in this case.

Contributor

Yeah... thrust::distance is for iterators, not for scalars.

Contributor

My bad, they are not pointers/iterators here.

Contributor

@seunghwak seunghwak left a comment

Looks good overall, but I have a few suggestions.

And I am still thinking about improving expand_sparse_offsets, so expect a few more comments.


namespace cugraph {
namespace detail {

Contributor

A short description (documentation) of this function would be helpful.

Here, d_vertices should cover the entire set of vertices (assigned to this GPU in multi-GPU), right?

Collaborator Author

This is the implementation file. Documentation for this function is included in the header cugraph/detail/shuffle_wrappers.hpp.

// returns thrust::nullopt if the destination vertex has
// a property of 0, return the edge if the destination
// vertex has a property of 1
//
Contributor

This comment looks outdated.

Collaborator Author

Copy/paste error, corrected in next push.

// vertex has a property of 1
//
return_type __device__ operator()(
thrust::tuple<vertex_t, size_t> src, vertex_t dst, weight_t wgt, property_t sv, property_t dv)
Contributor

src=>tagged_src might be clearer.

Collaborator Author

Fixed in next push.

// returns thrust::nullopt if the destination vertex has
// a property of 0, return the edge if the destination
// vertex has a property of 1
//
Contributor

This comment looks outdated.

Collaborator Author

Fixed in next push.

// a property of 0, return the edge if the destination
// vertex has a property of 1
//
return_type __device__ operator()(thrust::tuple<vertex_t, size_t> src,
Contributor

src=>tagged_src might be clearer.

Collaborator Author

Fixed in next push.

handle.get_stream());
thrust::sort(handle.get_thrust_policy(),
thrust::make_zip_iterator(graph_ids_v.begin(), dst_subgraph_vertices_v.begin()),
thrust::make_zip_iterator(graph_ids_v.end(), dst_subgraph_vertices_v.end()));
Contributor

This copy might be unnecessary.

We may do something like the following.

raft::device_span<size_t const> dst_subgraph_offsets{};
raft::device_span<vertex_t const> dst_subgraph_vertices{};
std::conditional_t<multi_gpu, rmm::device_uvector<size_t>, byte_t /* dummy */> tmp_dst_subgraph_offsets{};
std::conditional_t<multi_gpu, rmm::device_uvector<vertex_t>, byte_t /* dummy */> tmp_dst_subgraph_vertices{};
if constexpr (multi_gpu) {
  // update tmp_dst_subgraph_offsets|vertices as before;
  dst_subgraph_offsets = raft::device_span<size_t const>(tmp_dst_subgraph_offsets.data(), tmp_dst_subgraph_offsets.size());
  dst_subgraph_vertices = raft::device_span<vertex_t const>(tmp_dst_subgraph_vertices.data(), tmp_dst_subgraph_vertices.size());
}
else {
  dst_subgraph_offsets = raft::device_span<size_t const>(subgraph_offsets.data(), subgraph_offsets.size());
  dst_subgraph_vertices = raft::device_span<vertex_t const>(subgraph_vertices.data(), subgraph_vertices.size());
}

Contributor

If we do this, the above deleted input parameter check is still relevant.

Contributor

And graph_ids_v is relevant only in the multi-GPU path, so its declaration may be better moved inside the if constexpr (multi_gpu) {} block.

Collaborator Author

I started with something like your suggestion. I could go back to that, if we prefer that approach.

During testing, I discovered that the MG path does not require the input subgraph specifications to be sorted, because we have to sort the data after we shuffle. By adding this copy/sort we can actually drop the requirement that the input be pre-sorted. I dropped the check from the expensive checks, although I just observed that I did not update the documentation in graph_functions.hpp.

If we think that this extra memory/sorting cost isn't worth it in SG, I can go back to requiring the input to be sorted and just use the provided input in SG.

Collaborator Author

Based on the work below, I think the graph_ids_v will still be required for populating the frontier.

Contributor

I see, I will let you make the call. This might be more of a strategic decision than a purely technical one. It really depends on whether we want to sacrifice a little performance/memory footprint in the SG path for better usability in both the SG & MG paths.

Collaborator Author

I took out the requirement (from the documentation) that the data be sorted and made both paths sort. We can always revisit this later if we decide the sorting overhead needs to be removed in the SG case.

// fill the edge list buffer (to be returned) for each vertex in the aggregate subgraph vertex
// list (use the offsets computed in the Phase 1)
thrust::for_each(
// FIXME: Shouldn't there be a dummy property equivalent here?
Contributor
Collaborator Author

Thanks for the reference. I guess I was just missing the .view(). Fixed in next push.

graph_view,
vertex_frontier.bucket(0),
// edge_src_dummy_property_t{},
// edge_dst_dummy_property_t{},
Contributor

edge_src_dummy_property_t{}.view()

: std::nullopt;
vertex_frontier.bucket(0).insert(pair_begin + h_subgraph_offsets[bucket_idx],
pair_begin + h_subgraph_offsets[bucket_idx + 1]);
});
Contributor

A concern with this code is that, if we are extracting thousands of subgraphs, this will create a long chain of kernel launches with little work in a single kernel.

Actually, we can use the same mechanism to create graph_ids_v above, and create a zip iterator of subgraph vertices and graph indices, insert everything in a single kernel call.

Collaborator Author

I can do that and defer releasing the graph_ids_v memory until after we've populated the frontier. Will look at that for the next push.

template <typename vertex_t, typename edge_t>
rmm::device_uvector<vertex_t> expand_sparse_offsets(raft::device_span<edge_t const> offsets,
vertex_t base_vertex_id,
rmm::cuda_stream_view stream_view)
Contributor

Can we use typename vertex_t here?

auto graph_ids_v = cugraph::detail::expand_sparse_offsets(
  raft::device_span<size_t const>(d_subgraph_edge_offsets.data(),
                                  d_subgraph_edge_offsets.size()),
  vertex_t{0},
  handle_->get_stream());

Here, d_subgraph_edge_offsets.size() can theoretically overflow vertex_t, right? The number of subgraphs to extract can be more than the number of vertices in the graph.

And offsets.back() isn't really the number of edges in this context but the total number of vertices in all the subgraphs, and this can overflow edge_t as well (it may not happen in most cases, but we really can't say it will never happen).

Would it be better to use idx_t or size_t in place of vertex_t, and offset_t in place of edge_t?

Collaborator Author

This is actually just a label in the templated function... I used vertex_t and edge_t out of habit more than anything else. idx_t and offset_t would definitely be better labels for this function.

Changed in the next push. Also updated some variable names.

{
edge_t num_edges{0};

if (offsets.size() > 0) {
Contributor

So offsets.size() should be at least 0 + 1, so I guess we'd better assume offsets.size() > 0 (add assert(offsets.size() > 0)?).

Collaborator Author

There was, during my debugging, a case in MG where the input was empty on some of the partitions; I added this check to make sure that didn't result in a crash. I will change it to an assert.

Contributor

Even if the input is empty (i.e., the number of subgraphs is 0), offsets.size() should be 0 + 1 = 1 and > 0, right? In this case offsets should be [0], if I am not mistaken.

Collaborator Author

If it is specified correctly :-). I believe it was a bug in my test code that resulted in the error.

offsets.begin() + 1,
offsets.end(),
[d_vertices = vertices.data(), n_vertices = vertices.size()] __device__(auto offset) {
if (offset < n_vertices) { atomicAdd(&d_vertices[offset], vertex_t{1}); }
Contributor

Shouldn't (offset < n_vertices) always be true? I guess this check is unnecessary.

Collaborator Author

If the last subgraph list is empty, we end up with multiple entries in the array where offset == n_vertices. This check is to catch that condition and not update out-of-bounds in the vertices array.

Contributor

Oh, yes, you're right.

@ChuckHastings
Collaborator Author

rerun tests

@ChuckHastings
Collaborator Author

rerun tests

@ChuckHastings
Collaborator Author

rerun tests

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.12@856e3ba). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.12    #2854   +/-   ##
===============================================
  Coverage                ?   62.61%           
===============================================
  Files                   ?      114           
  Lines                   ?     6395           
  Branches                ?        0           
===============================================
  Hits                    ?     4004           
  Misses                  ?     2391           
  Partials                ?        0           


@ChuckHastings
Collaborator Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit cc2048d into rapidsai:branch-22.12 Nov 3, 2022
rapids-bot bot pushed a commit that referenced this pull request Nov 4, 2022
Add capability to C API to create an SG graph from an existing CSR.

Dependent on PR #2854 

Closes #2508

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Joseph Nke (https://github.com/jnke2016)
  - Naim (https://github.com/naimnv)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #2856
@ChuckHastings ChuckHastings deleted the define_capi_induced_subgraph branch December 2, 2022 18:35