
first commit, isolates and betweenness #2

Merged 27 commits into networkx:main on Sep 11, 2023

Conversation

@20kavishs (Contributor)

  • Basic structure

  • Added isolates
    Parallelized number_of_isolates; copied over the other functions from isolate.py because they also had a dispatch decorator and that was the only way to make things work

  • Added betweenness centrality
    Tried multiple methods and used the fastest implementation (more details will go in a blog post)
    Still need to pass some more tests

isolates passes all tests
still need to pass a few betweenness tests
@dschult (Member) left a comment


Nice -- does it work? :} I guess we should get a way to test these ideas on CI.

Can you explain a little more about why all the isolates functions need to be included in the file to make it work?

What happens to the docstrings in these files? Do they get pulled into IPython's help for isolates?? That's pretty fancy, so maybe not. It looks like it just copies the docs from networkx. Maybe we could do that programmatically to avoid long-term maintenance. No need to touch anything now; I'm just rambling about the future and long-term implications, etc.

(Two resolved review threads on nx_parallel/algorithms/centrality/betweenness.py, now outdated.)
@rossbar (Collaborator) commented Jun 27, 2023

> Nice -- does it work? :} I guess we should get a way to test these ideas on CI.

Agreed! I think figuring out how to run the networkx test suite with the parallel backend (and adding this as CI for this repo) should be the top priority. Maybe we can look to the dispatching docs and/or python-graphblas for inspiration.
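
For reference, the dispatching docs at the time described running the full networkx test suite against a backend by setting an environment variable, something along these lines (the exact variable name here is an assumption, so double-check against the docs):

NETWORKX_GRAPH_CONVERT=parallel pytest --pyargs networkx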

@dschult (Member) commented Jun 28, 2023

I'm looking into getting CI tests set up for this repo.
I notice that the entry_points entry for nx_parallel is "parallel". Should that be "nx_parallel"? The name "parallel" is quite general, and there may be other parallel backends in the future. But I don't know the implications of decisions like this. Where does this name get used once we add it as an entry_point? (I'm new at entry_point stuff :)
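
For context on where the name gets used: the entry point is what networkx enumerates to discover backends. A minimal sketch, where the group name "networkx.plugins" is an assumption based on the __networkx_plugin__ attribute used later in this PR:

from importlib.metadata import entry_points

# Each entry point's name is the backend's registered name ("parallel"
# as things stand) and its value is the package that provides it.
for ep in entry_points(group="networkx.plugins"):
    print(ep.name, "->", ep.value)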

@dschult (Member) commented Jun 28, 2023

I made some changes on the nx_parallel repo to turn on CI testing there. It tests Python 3.10 and 3.11 on Linux, Windows, and macOS.

That means you will need to pull this branch from your GitHub repo down to your local repo. I'm not expecting any conflicts, so hopefully that will be easy. :}

@MridulS (Member) commented Jul 1, 2023

There is something funky with the betweenness centrality implementation:

In [1]: import nx_parallel as nxp

In [2]: import networkx as nx

In [3]: G = nx.DiGraph()

In [4]: nx.add_path(G, [0, 1, 2])

In [5]: GP = nxp.ParallelGraph(G)

In [6]: nx.betweenness_centrality(GP)
Out[6]: {0: 0.0, 1: 1.0, 2: 0.0}

In [7]: nx.betweenness_centrality(G)
Out[7]: {0: 0.0, 1: 0.5, 2: 0.0}

@20kavishs (Contributor, Author)

I'm finishing up parallelizing closeness_vitality and the functions in tournament.py that have TODOs saying "easily parallelizable"... I just realized, though, that those functions do not all have the @nx._dispatch decorator, so I'm not able to use my implementations because the dispatcher doesn't dispatch to what I made.

I'm thinking I could either 1) just stick to parallelizing functions that have the dispatch decorator, or 2) add the @nx._dispatch decorator to the functions I want, to get around this.

Any thoughts?
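
For reference, option 2 would mean decorating the upstream function in networkx itself, roughly like this sketch (the decorator spelling follows the usage in this thread; the body is elided):

import networkx as nx

@nx._dispatch
def closeness_vitality(G, node=None, weight=None, wiener_index=None):
    ...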

@dschult (Member) commented Jul 7, 2023

For deciding about adding @nx._dispatch, can you look at networkx#6688, which adds lots of functions to the set that can be dispatched. At the moment, the dispatch feature is implemented for a fairly small set of functions because we were just rolling it out. But with #6688 we are wrapping many of the functions.

So, I think you should add the @nx._dispatch decorator to the functions you are adding for nx_parallel. That means you will have two connected PRs -- one in nx_parallel and one in networkx. That said, if the functions you are looking at already show up in #6688, you won't need a separate PR. But you will need a way to test against that PR's version of networkx rather than a release or the main branch. I would suggest starting with a local copy of both repos -- either check out #6688 locally, or clone @eriknw's repo locally. Then get a config where you can use your nx_parallel along with that PR. Another approach would be to open your own local networkx repo and make a branch that copies the relevant parts of #6688 and adds any other functions you want to have supported. Then you've got a local branch (and potential PR) to complement #6688 that you can use locally.

You might also consider splitting your nx_parallel PR into a part that adds support for functions that do have the @nx._dispatch decorator and a separate branch/PR for functions that don't have it yet.

@20kavishs (Contributor, Author) commented Jul 8, 2023

Alright, I messed around a bit... I think I will stick to making a commit that works with PR #6688. I've been able to set things up and have nx_parallel running with the PR. The functions I want to parallelize have all had decorators added in the PR, so I don't think I need to make my own local changes to the networkx repo.

The only issue is that since many more functions now have the dispatch decorator, I have to include their implementations (or else I get the "not implemented by parallel" error discussed earlier with Mridul). But it should be fine; I don't see an immediate workaround.

@dschult (Member) commented Jul 8, 2023

Just to help me understand this -- it is giving that error when code from one of your implementations calls another function that has @nx._dispatch. That is, it doesn't give an error for functions your function doesn't use, right?

Could it just be that you are passing a ParallelGraph into those functions instead of a NetworkX graph? Is there a way to unwrap the networkx graph enough to send it to the other functions while not messing up the parallel nature of what is being done?

@20kavishs (Contributor, Author)

Yup, it is only giving the error when I call another function with the dispatch decorator. There are no errors for functions I don’t use.

I think that trying to unwrap the graph is a good idea (maybe some extra overhead, but probably not too much). I'll try that and fiddle with pytest.

@MridulS (Member) commented Jul 8, 2023

A quick way to unwrap is to use ParallelGraph.to_networkx(), or you can add an attribute that stores the graph at __init__.
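
A minimal sketch of the second option (the names here are illustrative, not necessarily the PR's actual ones):

class ParallelGraph:
    __networkx_plugin__ = "parallel"

    def __init__(self, graph):
        # Keep a handle on the original networkx graph so backend code
        # can unwrap it when calling plain networkx functions.
        self.graph = graph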

…tality and tournament

- Decided to just make things work with PR #6688 (it had all the functions I needed marked with the dispatch decorator)
- More graph types and small interface changes
- Parallel implementations of closeness_vitality + tournament (I am a bit ahead of schedule)
- Made networkx tests into my own tests for nx_parallel (same directories as in networkx; can easily be run with pytest for CI)
- Ended up having to include all the functions, but didn't have to reimplement them (see isolates or tournament for example)
- Added utils/chunk.py
- Fixed betweenness tests, which had some small errors in them
- Minor changes to graph class constructors
- Changed betweenness_centrality implementation; passes all tests
@dschult (Member) commented Jul 15, 2023

Can you try an even easier way to unwrap, which we found out more about at the SciPy meeting: instead of calling nx.function, call nx.function.__wrapped__. This gets around the trouble of not being able to call a networkx function that has the @nx._dispatch decorator from backend code.

Can you try this?
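
For concreteness, a minimal sketch of the trick, assuming isolates is among the decorated functions:

import networkx as nx

G = nx.empty_graph(3)  # three isolated nodes, no edges
# __wrapped__ is the original, undecorated implementation, so calling it
# bypasses the dispatcher and accepts a plain networkx graph.
print(list(nx.isolates.__wrapped__(G)))  # [0, 1, 2]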

@MridulS (Member) commented Jul 15, 2023

@20kavishs I have created a PR on your repo, 20kavishs#1, which uses the __wrapped__ attribute to access the networkx functions. This will fail with the main branch of networkx, as the internal functions are not decorated just yet, but it should work (I tried it locally). Maybe we should keep a fork of the networkx repo to iterate quickly on this repo.

dPys added a commit to dPys/nx_parallel that referenced this pull request Jul 16, 2023
…able, and cleanly annotated base classes to permit easy iteration
Redid betweenness without the convert function

Tried to use __wrapped__, but it only worked for isolates... for consistency I kept everything the same

Errors from using __wrapped__ were because various methods were "not implemented by parallel"
@dschult (Member) commented Aug 8, 2023

I think this version of nx_parallel is slow because we copy the networkx graph when we instantiate the ParallelGraph instance. Indeed, we often end up converting back to a networkx.Graph again, so the copy is really not worth it.

Can we try making a slimmer version of this:

  • The ParallelGraph class can just have the __networkx_plugin__ attribute, and in __init__ it can store the input graph as an attribute.
  • Only implement the outer-level functions we want to parallelize. No need to copy any functions in their entirety from networkx.
  • Whenever we call a networkx function that needs access to G, give the __wrapped__ version of the function an input of PG.G.

Something like:

import networkx as nx
from joblib import Parallel, delayed
from multiprocessing import cpu_count

from nx_parallel.utils.chunk import chunks  # helper added in utils/chunk.py (import path assumed)

class ParallelGraph:
    __networkx_plugin__ = "parallel"

    def __init__(self, input_graph):
        self.G = input_graph  # keep a reference to the networkx graph; no copy

def number_of_isolates(PG):
    # __wrapped__ is the undecorated networkx implementation, so it
    # accepts the plain networkx graph stored on the wrapper.
    isolates_list = list(nx.isolates.__wrapped__(PG.G))
    num_chunks = max(len(isolates_list) // cpu_count(), 1)
    isolate_chunks = chunks(isolates_list, num_chunks)
    # Count each chunk in a separate job and sum the partial counts.
    results = Parallel(n_jobs=-1)(delayed(len)(chunk) for chunk in isolate_chunks)
    return sum(results)

Also note that this will only work on the main branch of networkx after PR #6688 was merged. Let's try to get this implemented and timed; it should reduce overhead by a lot.
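
For completeness, the chunks() helper used above (from the utils/chunk.py mentioned in the commit list) might look roughly like this; the actual signature in the PR may differ:

import itertools

def chunks(iterable, n):
    # Yield successive tuples of at most n items from iterable.
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk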

@dschult merged commit 241fbac into networkx:main on Sep 11, 2023
@jarrodmillman added the "type: Enhancement" (New feature or request) label on Oct 13, 2023
@jarrodmillman added this to the 0.1 milestone on Oct 13, 2023