first commit, isolates and betweenness #2
Conversation
Isolates passes all tests; still need to pass a few betweenness tests.
Nice -- does it work? :} I guess we should get a way to test these ideas on CI.
Can you explain a little more about why all the isolates functions need to be included in the file to make it work?
What happens to the doc_strings in these files? Do they get pulled into IPython's `isolates?` help output? That's pretty fancy so maybe not. It looks like it just copies the docs from networkx. Maybe we could do that programmatically to avoid long term maintenance. No need to touch anything now, I'm just rambling about the future and long term implications, etc.
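For what it's worth, copying the docstrings over programmatically could be as small as reassigning `__doc__` on the parallel versions. A minimal sketch (the function name and layout are illustrative, not the repo's actual structure):

```python
# Hypothetical sketch: reuse the networkx docstring on the parallel
# implementation so IPython's `number_of_isolates?` shows the original docs.
import networkx as nx

def number_of_isolates(G):
    ...  # parallel implementation goes here

number_of_isolates.__doc__ = nx.number_of_isolates.__doc__
```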
Agreed! I think figuring out how to run the networkx test suite with the parallel backend (and adding this as CI for this repo) should be the top priority. Maybe we can look to the dispatching docs and/or ...
I'm looking into getting CI tests set up for this repo.
I made some changes on the nx_parallel repo to turn on CI testing there. It tests Python 3.10 and 3.11 on Linux, Windows and macOS. That means you will need to pull from this branch in your repo down to your local repo. I'm not expecting any conflicts so hopefully that will be easy. :}
There is something funky with the betweenness centrality implementation:

```python
In [1]: import nx_parallel as nxp

In [2]: import networkx as nx

In [3]: G = nx.DiGraph()

In [4]: nx.add_path(G, [0, 1, 2])

In [5]: GP = nxp.ParallelGraph(G)

In [6]: nx.betweenness_centrality(GP)
Out[6]: {0: 0.0, 1: 1.0, 2: 0.0}

In [7]: nx.betweenness_centrality(G)
Out[7]: {0: 0.0, 1: 0.5, 2: 0.0}
```
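For reference: on this directed three-node path, node 1 lies on the single shortest path 0→2, and networkx rescales directed betweenness by 1/((n-1)(n-2)) = 1/2, which gives the 0.5 above. The parallel output matches the unnormalized values, which may mean the rescaling step is being skipped. A quick check (not part of the session above):

```python
# Unnormalized betweenness for the same graph; networkx returns the raw
# shortest-path counts here, which match what the parallel backend produced.
nx.betweenness_centrality(G, normalized=False)
# {0: 0.0, 1: 1.0, 2: 0.0}
```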
I'm finishing up parallelizing closeness_vitality and the functions in tournament.py with TODOs that say "easily parallelizable"... I just realized, though, that those functions do not all have the @nx._dispatch decorator, so I'm not able to use my implementations because the dispatcher doesn't dispatch to what I made. I'm thinking I could either 1) stick to parallelizing functions that already have the dispatch decorator, or 2) add the @nx._dispatch decorator to the functions I want in order to get around this. Any thoughts?
For deciding about adding the @nx._dispatch decorator: I think you should add the decorator to the functions you want to parallelize. You might also consider splitting your nx_parallel PR into a part that adds support for functions that do have the decorator and a part for those that don't.
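For context, option 2) amounts to a change inside networkx itself, roughly like the sketch below. This is illustrative only: the decorator's exact import path and signature have shifted between networkx versions, and the closeness_vitality signature is copied from networkx's docs.

```python
# Hypothetical sketch of option 2: marking an existing networkx function as
# dispatchable so a backend such as nx_parallel can supply its own version.
import networkx as nx

@nx._dispatch
def closeness_vitality(G, node=None, weight=None, wiener_index=None):
    ...  # existing networkx implementation stays unchanged
```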
Alright, I messed around a bit... I think I will stick to making a commit that works with PR #6688. I've been able to set up and run nx_parallel with the PR. The functions I want to parallelize have all had decorators added in the PR, so I don't think I need to make my own local changes to the networkx repo. The only issue is that, since many more functions now have the dispatch decorator, I have to include their implementations (or else I get the error saying "not implemented by parallel", as discussed earlier with Mridul). But it should be fine; I don't see an immediate workaround.
Just to help me understand this -- it is giving that error when code from one of your implementations calls another function that has the dispatch decorator? Could it just be that you are passing a ParallelGraph into those functions instead of a NetworkX graph? Is there a way to unwrap the networkx graph enough to send it to the other functions while not messing up the parallel nature of what is being done?
Yup, it is only giving the error when I call another function with the dispatch decorator. There are no errors for functions I don't use. I think that trying to unwrap the graph is a good idea (maybe some extra overhead, but probably not too much). I'll try that and fiddle with pytest.
A quick way to unwrap is to use ...
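One possible shape for that kind of unwrapping inside a backend function, shown only as an illustrative sketch: it leans on the `__wrapped__` attribute and the `PG.G` wrapper attribute that both come up later in this thread, so treat it as an assumption rather than the exact suggestion made here.

```python
# Hypothetical sketch: a backend function receives the ParallelGraph wrapper,
# pulls out the plain networkx graph, and calls the undecorated networkx
# implementation directly to avoid re-entering the dispatcher.
import networkx as nx

def is_isolate(PG, n):
    G = PG.G  # assumed attribute holding the wrapped networkx graph
    return nx.is_isolate.__wrapped__(G, n)
```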
…tality and tournament

- Decided to just make things work with PR #6688 (had all the functions I needed marked with the dispatch decorator)
- More graph types and small interface changes
- Parallel implementations of closeness_vitality + tournament (I am a bit ahead of schedule)
- Made networkx tests into my own tests for nx_parallel (same directories as in networkx; can be easily run with pytest for CI)
- Ended up having to include all the functions, but didn't have to reimplement (see isolates or tournament for example)
- Added utils/chunk.py (a sketch of such a helper follows below)
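A minimal chunking helper of the kind utils/chunk.py could contain might look like this; the actual helper in the repo may use a different signature or semantics.

```python
# Hypothetical sketch: split an iterable into tuples of at most n items,
# so work can be handed out to worker processes chunk by chunk.
from itertools import islice

def chunks(iterable, n):
    it = iter(iterable)
    while chunk := tuple(islice(it, n)):
        yield chunk
```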
- Fixed betweenness tests, had some small errors in them
- Minor changes to graph class constructors
- Changed betweenness_centrality implementation, passes all tests
An even easier way to unwrap -- one we found out more about at the SciPy meeting -- is to call the underlying networkx function through its __wrapped__ attribute. Can you try this?
@20kavishs I have created a PR on your repo, 20kavishs#1, which uses the ...
…able, and cleanly annotated base classes to permit easy iteration
Redid betweenness without the convert function. Tried to use __wrapped__, but it only worked for isolates, so for consistency I kept everything the same. The errors from using __wrapped__ were because various methods were "not implemented by parallel".
Passes all tests
I think this version of nx_parallel is slow due to copying the networkx graph when we instantiate the ParallelGraph instance. Indeed we often end up converting back to networkx.Graph again so it is really not worth it. Can we try making a slimmer version of this:
Something like:

```python
from multiprocessing import cpu_count

import networkx as nx
from joblib import Parallel, delayed

# `chunks` is the helper from utils/chunk.py


class ParallelGraph:
    def __init__(self, input_graph):
        self.G = input_graph

    __networkx_plugin__ = "parallel"


def number_of_isolates(PG):
    # Call the undecorated networkx function on the wrapped graph, then
    # count the isolates in parallel, one chunk per job.
    isolates_list = list(nx.isolates.__wrapped__(PG.G))
    num_chunks = max(len(isolates_list) // cpu_count(), 1)
    isolate_chunks = chunks(isolates_list, num_chunks)
    results = Parallel(n_jobs=-1)(delayed(len)(chunk) for chunk in isolate_chunks)
    return sum(results)
```

Also note that this will only work on the main branch of networkx after PR 6688 was merged. Let's try to get this implemented and timed. It should reduce overhead by a lot.
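For illustration, usage of the slimmer wrapper would look the same from the caller's side, assuming nx_parallel registers the backend entry point so dispatching finds it (names follow the snippet above):

```python
import networkx as nx
import nx_parallel as nxp

G = nx.path_graph(1000)
PG = nxp.ParallelGraph(G)   # just holds a reference, no copy of G
nx.number_of_isolates(PG)   # dispatched to the parallel implementation
```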
Added originalGraph to parallel classes, added heatmaps + their code in the timing folder

WIP for heatmap
Basic structure
Added isolates
parallelized number_of_isolates, copied over other functions from isolate.py because they also had a dispatch decorator and that was the only way to make things work
Added betweenness centrality
Tried multiple methods, used the fastest implementation (will put more details in blog post)
Still need to pass some more tests