Add MG python implementation of Leiden #3566
Merged: rapids-bot merged 186 commits into rapidsai:branch-23.06 from jnke2016:branch-23.06_fea-mg_leiden (May 26, 2023)
Commits
7903391
Renames variables
063821a
intermediate commit: Primitives call to compute some components of Le…
6791a65
Adds leiden to louvain mapping.
1a043be
Adds leiden to louvain mapping.
fa086ab
Returns leiden to louvain mapping from refined partition
b461d0b
Return renumbering map from graph_contraction
4ac3f47
Preserve original louvain partition of refined graph
40a99f4
Merge branch 'branch-22.12' of github.com:rapidsai/cugraph into leide…
2b4f37d
Fix errors that resulted in merging new changes from 22.12
46db0ca
Add view_concat for edge_minor_property_view_t, change transform_redu…
8ef174b
Isolate refine_clustering in a separate file to make compilation fast…
43b6605
Merge branch 'branch-22.12' of github.com:rapidsai/cugraph into leide…
4f5d3ea
Use per_v_transform_reduce_dst_key_aggregated_outgoing_e with kv_stor…
5057a83
Merge branch 'rapidsai:branch-22.12' into leiden-on-22-12
naimnv ec765b8
MIS to find a non-conflicting set of moves
90b8371
Fix merge issues
4c3875f
Fix merge issues
ba9233e
Merge branch 'leiden-on-22-12' of github.com:naimnv/cugraph-forked in…
b436333
Removes duplicate code and unused functors
fb4f080
Merge branch 'rapidsai:branch-22.12' into leiden-on-22-12
naimnv 040f40e
Removes duplicate code
4770fe1
Renames function
6733030
Rename function
18cfa9c
Update leiden assignment according to MIS chosen moves
60d4c67
Change break condition to compute MIS and assign louvain partition to…
865df23
Adds MG instances of refine, mis and leiden, fixes import issues
346b60c
Adds missing SG MG specialization for lookup_primitive_values_for_key…
a8135b9
Fixes mis implementation
16fb1fc
Use env local nvcc and specific version of gcc/g++
8440d17
Initial implementation of the Leiden C API
ChuckHastings 1f638ba
Debug problem related to weight change
7fb7e10
Debug problem related to weight change
5cefd78
Debug problem related to weight change
f1f5c95
Debug problem related to weight change
b37b669
Debug problem related to weight change
ce09b7c
Debug problem related to weight change
45a295a
Debug problem related to weight change
a648f8e
Debug problem related to weight change
7a130f1
Merge branch 'branch-23.04' into define_leiden_c_api
ChuckHastings 87b9a5f
pre-commit
ChuckHastings 1f28045
Debug problem related to MIS
4455c3f
Debug problem related to MIS/refine loop
449628c
Debug problem related to MIS/refine loop
e9aefcc
Debug problem related to MIS/refine loop
73225a8
TODO: MIS implementaion needs to be rechecked
fae443e
TODO: MIS implementaion needs to be rechecked
02575df
Merge Leiden with 23.04
b501da9
Fix merge issues
f17dbb7
TODO: Fix MIS implementaion
6206c0a
Merge branch 'leiden-on-23.04' into leiden-on-22-12
baba5c7
Fix merge issue
0aa06e4
Re-implement MIS
f8b3633
Cleanup MIS code
6b42b71
TODO: Check MIS implementation
9768abe
TODO: Check MIS implementation
fa91923
TODO: Check MIS implementation
fbd0120
TODO: Check MIS implementation
b56362b
TODO: Check MIS implementation
3246d4e
TODO: Check MIS implementation
5d57768
TODO: Check MIS implementation
84b9947
TODO: Check MIS implementation
9baaa88
Change fraction of vertices slected in each iteration of MIS loop
632af8b
remove unused import
jnke2016 50f0442
remove unused import
jnke2016 ae5e6c0
Merge remote-tracking branch 'upstream/define_leiden_c_api' into bran…
jnke2016 3b6ab3e
add plc implementation of Leiden
jnke2016 ae22cdc
add python implementation of Leiden
jnke2016 b6b20d4
fix style
jnke2016 fa41d51
Debug max reduction
7f90281
Merge branch 'branch-23.04' of github.com:rapidsai/cugraph into leide…
ec7bedd
Debug max reduction
562022b
Debug MIS with int ranks
c554705
Debug MIS with int ranks
6699a8d
Debug max reduction
d57d62a
Debug MIS with int ranks
6318b3d
With a working version of MIS
3582131
fetch latest changes
jnke2016 8950e1a
drop mg leiden
jnke2016 11f8329
fix style
jnke2016 c262b1d
fix style
jnke2016 28f1471
clean code to make a PR
10be7a7
Resize and shrink device vectors in refine.cuh
7ca5696
Resize and shrink device vectors in leiden_impl.cuh
44660c8
Remove debug statements
4dc7ffe
Remove unused variable
9f1ed0c
Change variable name, modify comments, remove unused file
ad33955
Fix copyright for Leiden and maximal independent set
5baa124
Merge branch 'branch-23.04' of github.com:rapidsai/cugraph into leide…
e5f8aca
Fix Leiden PR issues
3e4fcb2
Pass value for missing parameter #2980
5074360
Merge branch 'branch-23.04' of github.com:rapidsai/cugraph into leide…
1b266c2
Merge branch 'branch-23.04' of github.com:rapidsai/cugraph into leide…
3b6f81c
Merge branch 'leiden-on-22-12' of github.com:naimnv/cugraph-forked in…
35f3472
Merge branch 'branch-23.04' of github.com:rapidsai/cugraph into leide…
ae455ab
Add missing placeholer return value #2980
52d15cb
Add missing template parameter to function call #2980
868b113
Merge remote-tracking branch 'upstream/branch-23.04' into branch-23.0…
8afcd99
Merge remote-tracking branch 'upstream/leiden-on-22-12' into branch-2…
1ea3f62
remove legacy 'leiden' and replace it with the one leveraging the CAPI
fbcb767
fix typo
5b2bc92
add more tests
0315c50
remove temporary import
b910b4d
enable leiden in C API
ChuckHastings 82c49e7
update leiden test results
c875bba
update test
38277dc
return original results if renumbering did not occur
a9418ea
fix style
41c7137
Merge remote-tracking branch 'upstream/branch-23.04' into branch-23.0…
cbf26b0
fix style
57c55cb
fix copyright
31e6adf
Unit test MG MIS
ca17c54
fetch latest change
e2b7560
fetch latest change
4caab42
fetch latest change
696d51c
update type annotation
e22fa87
update tests
dd0a286
fix style
58c36ea
Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into mnmg_mis
b4e93ad
Add MG leiden test structure
6ccc2ba
Merge branch 'branch-23.06' of github.com:rapidsai/cugraph into mnmg_mis
4ad2275
collect_values_for_keys test
0f44c46
collect_values_for_keys test
e207d0b
update docstrings
6070c7a
fix style
4bd6d38
fix typo
dde35ea
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
7febd79
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
1fd04e5
Debug branch for MG Leiden
98bf51a
Clean mis, leiden mg test
15cb8c8
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
802548c
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
797823d
add mg implementation of leiden
e53df38
add MG Leiden tests
b5fabb8
Merge remote-tracking branch 'upstream/branch-23.04_fea-plc_leiden' i…
4e730e0
Clean mis, leiden mg test, debug mg leiden with random moves
7c610e9
MG Leiden and MG MIS
582eeb3
MG Leiden, MG MIS
528fd9f
MG Leiden, MG MIS
8b90bd5
Copyright fix
4471aef
Merge remote-tracking branch 'upstream/mnmg_mis' into branch-23.06_fe…
6764476
Removes debug stuffs from refine_impl.cuh
9809f42
Removes tests using local files
318357b
Remove print statement
86074ef
Remove test with lcoal file
ab7ad07
Address some PR comments
6c9aedb
Use thrust random generator inside device code
cfca807
Use thrust random generator inside device code
3d52822
enable mg leiden in the CAPI
2769233
add mg leiden
a3128a6
Use raft random gernerator instead of thrust
8e2ffe5
Add RNG state parameter to Leiden
f3356b1
Add doc string for theta
9e59a1d
Change pylibcugraph api due to chagne in c-api
810d2f4
Pull upstream changes
575bde4
Rename compute_mis to maximal_independent_set
ee7ad8a
style fix
67b33ae
Replace for_each with transform_if
6a630fa
Make the cluster ids consecutive
7a3019e
Update c_api code for leiden
e2c33ae
update future extraction
fea752d
remove outdated comment
2a0f857
update type annotation
68080b1
remove unsued import
9071f00
Merge remote-tracking branch 'upstream/mnmg_mis' into branch-23.06_fe…
c3fa04d
update plc leiden call at the python layer as it now support a random…
1f36df7
update docstrings
ca408c5
update dask 'future' extraction and add type annotation
972c514
remove unused import
a9ebb2c
fix style
e0f1217
Fix sg rng_state initialization
788517e
Fix mg leiden c-api test
b12b619
Rename sg leiden cpp test and cosmetic fix for CMakeLists.txt
afbe156
Expose theta to c and python api
22f2df6
Merge remote-tracking branch 'upstream/mnmg_mis' into branch-23.06_fe…
4737002
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
837870f
fix circular import error
2b0fb14
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
25a5c11
update docstrings
c2c6a04
fix style
14f04eb
update leiden tests
39a87eb
update leiden API
f6c303b
add comments
b09f410
fix style
f49c423
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
aae8fb7
remove unused code
1d98ee3
Merge branch 'branch-23.06' into branch-23.06_fea-mg_leiden
BradReesWork
@@ -0,0 +1,189 @@
# Copyright (c) 2022-2023, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from __future__ import annotations

from dask.distributed import wait, default_client
import cugraph.dask.comms.comms as Comms
import dask_cudf
import dask
from dask import delayed
import cudf

from pylibcugraph import ResourceHandle
from pylibcugraph import leiden as pylibcugraph_leiden
import numpy
import cupy as cp
from typing import Tuple, TYPE_CHECKING

if TYPE_CHECKING:
    from cugraph import Graph


def convert_to_cudf(result: cp.ndarray) -> Tuple[cudf.DataFrame, float]:
    """
    Creates a cudf DataFrame from cupy arrays from pylibcugraph wrapper
    """
    cupy_vertex, cupy_partition, modularity = result
    df = cudf.DataFrame()
    df["vertex"] = cupy_vertex
    df["partition"] = cupy_partition

    return df, modularity


def _call_plc_leiden(
    sID: bytes,
    mg_graph_x,
    max_iter: int,
    resolution: float,
    random_state: int,
    theta: float,
    do_expensive_check: bool,
) -> Tuple[cp.ndarray, cp.ndarray, float]:
    return pylibcugraph_leiden(
        resource_handle=ResourceHandle(Comms.get_handle(sID).getHandle()),
        random_state=random_state,
        graph=mg_graph_x,
        max_level=max_iter,
        resolution=resolution,
        theta=theta,
        do_expensive_check=do_expensive_check,
    )


def leiden(
    input_graph: Graph,
    max_iter: int = 100,
    resolution: float = 1.0,
    random_state: int = None,
    theta: float = 1.0,
) -> Tuple[dask_cudf.DataFrame, float]:
    """
    Compute the modularity-optimizing partition of the input graph using the
    Leiden method

    Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden:
    guaranteeing well-connected communities. Scientific reports, 9(1), 5233.
    doi: 10.1038/s41598-019-41695-z

    Parameters
    ----------
    input_graph : cugraph.Graph
        The graph descriptor should contain the connectivity information
        and weights. The adjacency list will be computed if not already
        present.
        The current implementation only supports undirected graphs.

    max_iter : integer, optional (default=100)
        This controls the maximum number of levels/iterations of the Leiden
        algorithm. When specified the algorithm will terminate after no more
        than the specified number of iterations. No error occurs when the
        algorithm terminates early in this manner.

    resolution : float, optional (default=1.0)
        Called gamma in the modularity formula, this changes the size
        of the communities. Higher resolutions lead to more, smaller
        communities; lower resolutions lead to fewer, larger communities.
        Defaults to 1.

    random_state : int, optional (default=None)
        Random state to use when generating samples. Optional argument,
        defaults to a hash of process id, time, and hostname.

    theta : float, optional (default=1.0)
        Called theta in the Leiden algorithm, this is used to scale
        modularity gain in the Leiden refinement phase, to compute
        the probability of joining a random Leiden community.

    Returns
    -------
    parts : dask_cudf.DataFrame
        GPU data frame of size V containing two columns: the vertex id and
        the partition id it is assigned to.

        ddf['vertex'] : cudf.Series
            Contains the vertex identifiers
        ddf['partition'] : cudf.Series
            Contains the partition assigned to the vertices

    modularity_score : float
        a floating point number containing the global modularity score of the
        partitioning.

    Examples
    --------
    >>> from cugraph.experimental.datasets import karate
    >>> G = karate.get_graph(fetch=True)
    >>> parts, modularity_score = cugraph.leiden(G)

    """

    if input_graph.is_directed():
        raise ValueError("input graph must be undirected")

    # Return a client if one has started
    client = default_client()

    do_expensive_check = False

    result = [
        client.submit(
            _call_plc_leiden,
            Comms.get_session_id(),
            input_graph._plc_graph[w],
            max_iter,
            resolution,
            random_state,
            theta,
            do_expensive_check,
            workers=[w],
            allow_other_workers=False,
        )
        for w in Comms.get_workers()
    ]

    wait(result)

    part_mod_score = [client.submit(convert_to_cudf, r) for r in result]
    wait(part_mod_score)

    vertex_dtype = input_graph.edgelist.edgelist_df.dtypes[0]
    empty_df = cudf.DataFrame(
        {
            "vertex": numpy.empty(shape=0, dtype=vertex_dtype),
            "partition": numpy.empty(shape=0, dtype="int32"),
        }
    )

    part_mod_score = [delayed(lambda x: x, nout=2)(r) for r in part_mod_score]

    ddf = dask_cudf.from_delayed(
        [r[0] for r in part_mod_score], meta=empty_df, verify_meta=False
    ).persist()

    mod_score = dask.array.from_delayed(
        part_mod_score[0][1], shape=(1,), dtype=float
    ).compute()

    wait(ddf)
    wait(mod_score)

    wait([r.release() for r in part_mod_score])

    if input_graph.renumbered:
        ddf = input_graph.unrenumber(ddf, "vertex")

    return ddf, mod_score
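For readers of the docstring, here is a rough pure-Python sketch of what the `resolution` (gamma) and `theta` parameters mean. It is not the cuGraph implementation: `modularity` and `join_probabilities` are hypothetical helper names, the modularity formula is the standard resolution-scaled one, and the `exp(gain / theta)` weighting follows the randomized refinement step described in the Traag et al. paper cited above. Whether the GPU kernels compute exactly this is an assumption.

```python
import math
from collections import defaultdict


def modularity(edges, partition, resolution=1.0):
    """Resolution-scaled modularity of an undirected weighted graph.

    edges: iterable of (u, v, w), each undirected edge listed once.
    partition: dict mapping vertex -> community id.
    """
    total_weight = 0.0             # m: sum of all edge weights
    internal = defaultdict(float)  # weight of edges inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        total_weight += w
        degree[partition[u]] += w
        degree[partition[v]] += w
        if partition[u] == partition[v]:
            internal[partition[u]] += w
    # Q = sum_c [ L_c / m - gamma * (d_c / 2m)^2 ]
    return sum(
        internal[c] / total_weight
        - resolution * (degree[c] / (2.0 * total_weight)) ** 2
        for c in degree
    )


def join_probabilities(gains, theta=1.0):
    """Probability of joining each candidate community (hypothetical helper).

    gains: dict mapping community id -> modularity gain of moving there.
    Only candidates with a non-negative gain are eligible; each is weighted
    by exp(gain / theta), so a small theta behaves almost greedily and a
    large theta approaches a uniform choice.
    """
    weights = {c: math.exp(g / theta) for c, g in gains.items() if g >= 0}
    norm = sum(weights.values())
    if norm == 0:
        return {}
    return {c: w / norm for c, w in weights.items()}


# Two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
score_default = modularity(edges, partition)                   # gamma = 1
score_high_res = modularity(edges, partition, resolution=2.0)  # gamma = 2
probs_greedy = join_probabilities({"a": 0.0, "b": 0.1, "c": -0.2}, theta=0.01)
probs_flat = join_probabilities({"a": 0.0, "b": 0.1, "c": -0.2}, theta=100.0)
```

Raising `resolution` penalizes large communities more heavily, which is why the same two-community split scores lower at gamma = 2. Community `"c"` never appears in the probabilities because its gain is negative.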
Can you also provide the value of parts so doctest (and users) can verify?
parts is a dataframe of length proportional to the number of vertices (in this case 33 for the karate dataset). That's pretty long.
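One possible middle ground for the point above, sketched with the standard `doctest` module only: rather than printing the whole result, the docstring example can assert a few compact properties (row count, column names, score range). `toy_partition` is a hypothetical stand-in that returns plain Python rows, not a cuGraph call, and the vertex count of 10 is arbitrary.

```python
import doctest


def toy_partition(num_vertices):
    """Hypothetical stand-in for a clustering call that returns a long result.

    >>> parts, score = toy_partition(10)
    >>> len(parts)
    10
    >>> sorted(parts[0])
    ['partition', 'vertex']
    >>> 0.0 <= score <= 1.0
    True
    """
    parts = [{"vertex": v, "partition": v % 2} for v in range(num_vertices)]
    return parts, 0.5


def run_docstring_examples():
    """Run the doctest examples above and return the (failed, attempted) summary."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(toy_partition, globs={"toy_partition": toy_partition}):
        runner.run(test)
    return runner.summarize(verbose=False)


summary = run_docstring_examples()
```

This keeps the example short enough for a docstring while still giving doctest something to check.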