
ENH : Added get_chunks to betweenness_centrality and create_iterables in utils #29

Merged: 17 commits into networkx:main, Mar 29, 2024

Conversation

Schefflera-Arboricola
Member

@Schefflera-Arboricola Schefflera-Arboricola commented Jan 6, 2024

Please use "Squash and merge" when merging this PR.

  1. Added a get_chunks kwarg to betweenness_centrality:
    • description: a user-defined function for creating node_chunks, instead of the current default of partitioning G.nodes into equal-sized chunks (the default partitioning runs when get_chunks is "chunks", its default value). To understand the motivation for this, see the additional pytest added.
    • also added tests (specific to nx-parallel) for the additional kwargs
  2. Added create_iterables (ref. PR [WIP]: Refactor-- consolidate and simplify #7)
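As a rough illustration of the idea (the exact signature and calling convention here are assumptions for the sketch, not the merged nx-parallel API), a user-defined get_chunks can be any callable that takes an iterable of nodes and returns an iterable of node chunks:

```python
def get_chunks(nodes):
    """Hypothetical user-defined chunking function: instead of the
    default equal partition of G.nodes, split the nodes into
    fixed-size chunks (chunk size 2 here, purely for illustration)."""
    nodes = list(nodes)
    return [nodes[i : i + 2] for i in range(0, len(nodes), 2)]


# Each node ends up in exactly one chunk:
print(get_chunks(range(5)))  # → [[0, 1], [2, 3], [4]]
```

The parallel implementation would then hand each chunk to a separate worker, so how the user shapes the chunks directly controls how the work is balanced.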

@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as draft January 6, 2024 12:20
@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as ready for review January 6, 2024 12:39
@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as draft January 15, 2024 14:00
@Schefflera-Arboricola Schefflera-Arboricola changed the title ENH : Adding joblib related kwargs to functions ENH : Adding get_chunk kwargs to functions Jan 19, 2024
@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as ready for review January 19, 2024 20:09
@Schefflera-Arboricola Schefflera-Arboricola changed the title ENH : Adding get_chunk kwargs to functions ENH : Adding get_chunks kwarg to functions Jan 19, 2024
Member

@dschult dschult left a comment


I like this approach. It is general (and works for nodes, edges and isolated_nodes) and allows the user to override the choice of node chunks. I'm not sure how much difference it makes in betweenness centrality to be able to split up the "hard" nodes, but in general it will make a difference if you can split them up. And betweenness centrality could have some nodes take a lot longer than others. So this is a good one to start with.

I don't know of a way to test that this is working. Maybe you should have a test that computes it using the built-in default chunking and also computes it with a user-defined method. They could be "smoke-tests" to see if using that option breaks it so badly that smoke starts coming out. If smoke doesn't come out, it might still be wrong. But you have at least tested that there are no fundamental problems. Have you thought about tests for this?
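One cheap invariant such a smoke test could check, independent of timing (this helper is a hypothetical sketch, not code from the PR): whatever shape the user's chunks take, a valid chunking must cover every node exactly once.

```python
from itertools import chain


def check_chunking(nodes, chunks):
    """Smoke-test helper: a valid chunking is a partition of the
    node set -- every node appears in exactly one chunk."""
    flat = sorted(chain.from_iterable(chunks))
    assert flat == sorted(nodes), "chunks must partition the node set"


nodes = list(range(10))
check_chunking(nodes, [nodes[:5], nodes[5:]])  # default-style equal split
check_chunking(nodes, [[9], nodes[:9]])        # custom split isolating a "hard" node
```

Beyond that, comparing the centrality values produced with the default chunking, a user-defined chunking, and the sequential implementation against each other is about as close to a correctness test as one can get here.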

@dschult dschult added the type: Enhancement New feature or request label Feb 20, 2024
@dschult
Member

dschult commented Feb 20, 2024

This PR has style issues. Next time you get to it: I think it's just one line with a trailing space at the end.

@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as draft February 23, 2024 16:52
@Schefflera-Arboricola Schefflera-Arboricola changed the title ENH : Adding get_chunks kwarg to functions ENH : Added get_chunks to betweenness_centrality and create_iterables in utils Mar 12, 2024
@dschult
Member

dschult commented Mar 13, 2024

It might be assuming too much to assert that the time should be less when chunking is used.
Of course, asserting that could allow you to track down why it is slower, I suppose.

But should it always be slower? How might get_chunks slow it down?

@Schefflera-Arboricola
Copy link
Member Author

> It might be assuming too much to assert that the time should be less when chunking is used. Of course, asserting that could allow you to track down why it is slower, I suppose.
>
> But should it always be slower? How might get_chunks slow it down?

Here I compared the running time of the parallel implementation of betweenness_centrality with the default chunking and with a custom chunking (based on the edge_betweenness_centrality values). I ran it locally on 8 cores, while the GitHub workflow tests run on 2 cores, and this test fails on some machines and passes on others.

[Screenshot: timing comparison, 2024-03-13]

This could be a nice example of why get_chunks might be useful as an option, letting a user do custom chunking when they have a really big graph. But I don't think we should include it as a test, because it takes a lot of time and doesn't pass on all machines.
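A sketch of the idea behind that kind of custom chunking (the cost-balancing strategy below is an illustration of the approach, not the exact code used in the comparison above): given per-node cost estimates, for example derived from edge_betweenness_centrality values, assign the most expensive nodes first, each to the currently lightest chunk.

```python
import heapq


def weighted_chunks(costs, n_chunks):
    """Greedy load balancing: visit nodes from most to least expensive
    and append each one to the chunk with the smallest total cost so far."""
    heap = [(0.0, i) for i in range(n_chunks)]  # (total cost, chunk index)
    heapq.heapify(heap)
    chunks = [[] for _ in range(n_chunks)]
    for node in sorted(costs, key=costs.get, reverse=True):
        load, i = heapq.heappop(heap)
        chunks[i].append(node)
        heapq.heappush(heap, (load + costs[node], i))
    return chunks


# One very expensive node gets a chunk to itself:
costs = {"a": 4.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0}
print(weighted_chunks(costs, 2))  # → [['a'], ['b', 'c', 'd', 'e']]
```

With the default equal-sized split, the chunk holding the expensive node would dominate the wall-clock time; a cost-aware split like this is exactly the kind of thing get_chunks lets a user plug in.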

@Schefflera-Arboricola Schefflera-Arboricola marked this pull request as ready for review March 13, 2024 13:37
@dschult
Member

dschult commented Mar 13, 2024

That makes good sense to me.
Timing depends on so many things... it's hard to do well.
It's nice to have parameters to tweak for each situation, and get_chunks is a good one to make available. :)

Member

@dschult dschult left a comment


I think this is ready to be merged. So I'm approving it.

Are there any more changes planned for this PR?

@Schefflera-Arboricola
Member Author

> Are there any more changes planned for this PR?

No, I think it's ready.

Thanks @dschult !

@dschult dschult merged commit 21d4158 into networkx:main Mar 29, 2024
11 checks passed
@jarrodmillman jarrodmillman added this to the 0.1 milestone Mar 29, 2024
@jarrodmillman jarrodmillman modified the milestones: 0.1, 0.2 May 9, 2024