nx-cugraph: add weakly connected components #4071

eriknw · 2023-12-28T11:19:12Z

This doesn't currently work because plc.weakly_connected_components only works on symmetric graphs (so it isn't actually performing WCC, is it?):

RuntimeError: non-success value returned from cugraph_weakly_connected_components: CUGRAPH_UNKNOWN_ERROR cuGraph failure at file=[...]/cugraph/cpp/src/components/weakly_connected_components_impl.cuh line=283: Invalid input argument: input graph should be symmetric for weakly connected components.

These are high-priority algorithms for nx-cugraph, because they are widely used by networkx dependents.

ChuckHastings · 2024-01-03T23:10:53Z

So, I suppose this comes down to your definition of weakly connected components. NetworkX defines them as you describe:

components is what we have implemented and finds the connected components of an undirected graph
weakly_connected_components operates only on a directed graph but computes as if the graph were undirected
strongly_connected_components operates only on a directed graph and actively considers direction

A quick scan of the literature suggests that while this is perhaps a good labeling (matches Knuth, always a plus in my mind), it's hardly universal.

Our implementation at the C++ level computes components as defined by NetworkX. To get weakly_connected_components as defined by NetworkX, you would need to symmetrize the edge list. This will give you exactly the correct answer. If you look at the NetworkX implementation, all it does is traverse the incoming and outgoing edges instead of just one direction. By symmetrizing the graph we get all of the edges going in both directions, so we can do it in one pass.
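To illustrate the point about symmetrizing, here is a minimal pure-Python sketch (not the cuGraph or NetworkX implementation; the helper names `symmetrize` and `connected_components` are hypothetical). It runs a traversal that only follows edges in their stored direction, and shows that it yields weakly connected components only once the edge list has been symmetrized:

```python
from collections import defaultdict, deque


def symmetrize(edges):
    """Return the edge list with every edge present in both directions."""
    return sorted({(u, v) for u, v in edges} | {(v, u) for u, v in edges})


def connected_components(nodes, edges):
    """Group nodes by BFS that follows edges in their stored direction only."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return components


nodes = [0, 1, 2, 3, 4]
directed_edges = [(0, 1), (2, 1), (3, 4)]

# Symmetrized input: nodes 0, 1, 2 end up in one weakly connected component.
wcc = connected_components(nodes, symmetrize(directed_edges))

# Unsymmetrized input: node 2 is never reached from node 0, so it is split off.
not_wcc = connected_components(nodes, directed_edges)
```

With the symmetrized input the result is `[{0, 1, 2}, {3, 4}]`; without it, the same traversal returns `[{0, 1}, {2}, {3, 4}]`, which is not the WCC answer.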

We don't really have an efficient way of traversing the edges backwards from whatever orientation we have. Our CSR/DCSR representation only efficiently reflects one direction.
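A small sketch of why a CSR layout favors one direction, assuming a plain source-major CSR built from Python lists (this is illustrative only and says nothing about cuGraph's actual data structures; all function names are hypothetical). Outgoing neighbors are a single contiguous slice, while incoming neighbors require scanning every row:

```python
def build_csr(num_nodes, edges):
    """Build source-major CSR arrays (offsets, indices) from an edge list."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns per-source counts into row start positions.
    for i in range(num_nodes):
        offsets[i + 1] += offsets[i]
    indices = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return offsets, indices


def out_neighbors(offsets, indices, u):
    """Cheap: the row for u is one contiguous slice."""
    return indices[offsets[u]:offsets[u + 1]]


def in_neighbors(offsets, indices, v):
    """Expensive: every row must be scanned to find edges that point at v."""
    return [u for u in range(len(offsets) - 1)
            if v in indices[offsets[u]:offsets[u + 1]]]


edges = [(0, 1), (2, 1), (3, 4)]
offsets, indices = build_csr(5, edges)
```

Here node 1 has no outgoing edges, so `out_neighbors(offsets, indices, 1)` is empty, yet finding its two incoming neighbors (0 and 2) takes a pass over the whole structure.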

We have several options we can pursue:

1. You could symmetrize the input edge list prior to creating the graph (less work for me, but probably not a great long term solution)
2. We have a C++ function that will do this efficiently. We could expose this function via the C API
3. We could add a parameter to the graph construction to let the graph construction call know that we should symmetrize the input prior to constructing the graph.
4. We could add logic to the C API that would symmetrize the graph locally for just the WCC call. If you were to call our WCC algorithm with a directed graph we would execute this logic. The downside of this is that we would double the memory used, since we would presumably have to maintain the unsymmetrized graph as well as the symmetrized graph. This seems like a bad option.

I'm leaning toward option 3, but am open to other options.

eriknw · 2024-01-05T14:09:37Z

Thanks for the thoughtful and helpful reply @ChuckHastings!

I went ahead with option 1 where we symmetrize in Python before creating the PLC graph. This is probably good enough for a while. Nevertheless, I did it in a way that will let us easily switch to options 2 or 3 if/when available.

I chose the keyword argument symmetrize="union" (used by WCC) and symmetrize="intersection" (will probably use for another algorithm soon).
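As a rough illustration of what the two modes could mean on an edge list, here is a sketch under an assumption: "union" keeps an edge if it exists in either direction, and "intersection" keeps it only if it exists in both. The helper `symmetrize_edges` is hypothetical and is not the nx-cugraph code:

```python
def symmetrize_edges(edges, how):
    """Hypothetical helper: symmetrize a directed edge list.

    Assumed semantics (not taken from nx-cugraph):
      "union"        -> keep (u, v) and (v, u) if either direction exists
      "intersection" -> keep (u, v) and (v, u) only if both directions exist
    """
    forward = set(edges)
    reverse = {(v, u) for u, v in forward}
    if how == "union":
        return sorted(forward | reverse)
    if how == "intersection":
        return sorted(forward & reverse)
    raise ValueError(f"unknown symmetrize mode: {how!r}")


edges = [(0, 1), (1, 0), (1, 2)]
union_edges = symmetrize_edges(edges, "union")
intersection_edges = symmetrize_edges(edges, "intersection")
```

Under these assumed semantics, the one-way edge `(1, 2)` gains its reverse in "union" mode and is dropped entirely in "intersection" mode.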

Related topic: is SCC on your radar to do any time soon? I'm less familiar with SCC algorithms and literature, but maybe https://doi.org/10.14778/2733085.2733089

ChuckHastings · 2024-01-05T16:22:21Z

> Related topic: is SCC on your radar to do any time soon? I'm less familiar with SCC algorithms and literature, but maybe https://doi.org/10.14778/2733085.2733089

We have a low-priority activity exploring SCC. Constructing a good algorithm for the GPU is complicated, as the best serial algorithms use depth-first search, which doesn't parallelize well. I wouldn't expect a good implementation this year unless we find some customer that urgently wants it.
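For context on the DFS dependence, here is a sketch of Kosaraju's algorithm, one classic serial SCC approach (Tarjan's is another). It is not cuGraph code and is only meant to show how the second pass depends on the finishing order produced by a depth-first first pass:

```python
def strongly_connected_components(num_nodes, edges):
    """Kosaraju's algorithm: DFS finishing order, then a pass on the reversed graph."""
    adj = [[] for _ in range(num_nodes)]
    radj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS that records vertices in order of finishing time.
    visited = [False] * num_nodes
    order = []
    for root in range(num_nodes):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, iter(adj[root]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                # All neighbors explored: this vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph, starting from the latest-finished vertices.
    label = [-1] * num_nodes
    components = []
    for root in reversed(order):
        if label[root] != -1:
            continue
        label[root] = len(components)
        comp, stack = [root], [root]
        while stack:
            node = stack.pop()
            for nxt in radj[node]:
                if label[nxt] == -1:
                    label[nxt] = label[root]
                    comp.append(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


# A 3-cycle (0 -> 1 -> 2 -> 0) with a one-way edge out to node 3.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
sccs = strongly_connected_components(4, edges)
```

The finishing order from pass 1 is inherently sequential, which is the part that is hard to map onto a GPU.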

rlratzel

LGTM. I did point out something this PR might need from another PR, which might make it worth waiting for and updating.

python/nx-cugraph/nx_cugraph/algorithms/components/connected.py

rlratzel

LGTM. I had one question below which need not hold up approval.

rlratzel · 2024-01-16T21:03:59Z

python/nx-cugraph/nx_cugraph/algorithms/components/strongly_connected.py

+def number_strongly_connected_components(G):
+    G = _to_directed_graph(G)
+    if G.src_indices.size == 0:
+        return len(G)


Just curious, no action needed: is this expected to always return 0 in this case? If so, is there a reason calling len() is preferred over just returning 0?

eriknw · 2024-01-16

Maybe I should use G.number_of_edges() instead of G.src_indices.size (but for some reason the latter is easier for me to remember and reason about). Anyway, if the number of edges is zero, then the number of components is the number of nodes, hence we can't simply return 0.

I may update to use number_of_edges in lots of places for clarity in a different PR. I agree this shouldn't hold up this PR.

Oh I see, number_of_edges actually does a lot more work. If we want to know if there are exactly 0 edges, G.src_indices.size works great.
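The edgeless shortcut discussed in this thread can be sketched as follows. This is a standalone illustration with a hypothetical function name and plain lists standing in for the graph's arrays, not the code under review:

```python
def count_components_edgeless_shortcut(num_nodes, src_indices):
    """Return the component count for an edgeless graph, else None.

    Hypothetical stand-in for the early return in the reviewed function.
    """
    if len(src_indices) == 0:
        # No edges: every node is isolated, so each node is its own component.
        return num_nodes
    # A real implementation would run the full algorithm here.
    return None


three_isolated_nodes = count_components_edgeless_shortcut(3, [])
empty_graph = count_components_edgeless_shortcut(0, [])
```

The result is 0 only when the graph has no nodes at all; three isolated nodes give three components, which is why the early return uses the node count.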

rlratzel · 2024-01-16T22:10:26Z

/merge

NetworkX tests are somewhat underspecified regarding how to handle self-loops for these algorithms. Also, I'm not sure if transitivity is supposed to work on directed graphs. Once #4071 is merged, it should be easy to add `is_bipartite` function (and maybe others?).

Authors:
- Erik Welch (https://github.com/eriknw)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)

URL: #4093

nx-cugraph: add weakly connected components (PLC needs updated!)

1879f63

eriknw added the DO NOT MERGE (Hold off on merging; see PR for details) and python labels Dec 28, 2023

eriknw requested a review from a team as a code owner December 28, 2023 11:19

eriknw added the improvement (Improvement / enhancement to an existing function) label Dec 28, 2023

Update WCC to symmetrize before creating PLC graph

c1d1a40

eriknw removed the DO NOT MERGE (Hold off on merging; see PR for details) label Jan 5, 2024

Merge branch 'branch-24.02' into wcc

bdf66dc

eriknw changed the title ~~nx-cugraph: add weakly connected components (PLC needs updated!)~~ nx-cugraph: add weakly connected components Jan 5, 2024

rlratzel approved these changes Jan 8, 2024

python/nx-cugraph/nx_cugraph/algorithms/components/connected.py (outdated, resolved)

rlratzel assigned eriknw Jan 8, 2024

rlratzel added the non-breaking (Non-breaking change) label Jan 8, 2024

eriknw added 4 commits January 11, 2024 15:20

Merge branch 'branch-24.02' into wcc

be2eb80

Add version_added to wcc algos

3754094

Merge branch 'branch-24.02' into wcc

9f3cd27

Add strongly connected components (via legacy API)

d2d61e5

This was referenced Jan 15, 2024

nx-cugraph: PLC now handles isolated nodes; clean up our workarounds #4092

Merged

nx-cugraph: add triangles and clustering algorithms #4093

Merged

Merge branch 'branch-24.02' into wcc

213610b

rlratzel approved these changes Jan 16, 2024

rapids-bot bot merged commit 8672534 into rapidsai:branch-24.02 Jan 17, 2024
98 checks passed
