-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisiting nxp algorithms #63
Conversation
return nxp.chunks(_nodes, num_in_chunk) | ||
|
||
get_chunks_funcs = get_functions_with_get_chunks() | ||
ignore_funcs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the functions that require some argument(s) other than G
to be passed in the function call are ignored.
Should the get_chunk
for these be tested in test_get_chunks.py
or in a separate test
folder in the respective modules to which these functions belong?
I think it would be better to have all the get_chunk
smoke tests in one place. But, I don't know how many functions in networkx would fall in this category(i.e. algos that require args other than just G
). LMK what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that the other functions are using get_chunks correctly (which is I think what the smoke tests are doing... right?).
From a theoretical perspective, if the smoke tests are testing that the nxp functions are using get_chunks correctly, that should probably be in a test module for that function.
And how many functions in networkx have parameters other than G is less important than how many functions supported by nx-parallel do. I suspect that most functions do have some input other than G.
Do we need to test all the functions' get_chunk parameter? I guess it is a check of whether each function is using chunks correctly.
Manually listing functions that can (or can't) be run with G as the only parameter doesn't scale nicely to all networkx functions. Another option is to set up test files for each module -- similar to networkx. But that adds a lot of testing files. Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter. That might not be practical, and I haven't thought much about it. This is tricky... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be ok to have all the smoke tests for all the functions' get_chunks
in one file, right now, bcoz we don't have a lot of functions in ignore_func
, and having so many test files with just one smoke test(all of which can be summed up in a loop like this) seems unnecessary and redundant. But this is definitely not scalable, so, we can handle this differently once we have a bunch of functions in nx-parallel that require args other than just G
. And we can have separate tests
folders if necessary, like we currently have for betweenness_centrality
(which not just tests get_chunks
but also shows how custom chunking can be useful). LMK what you think of this.
And should we add the smoke tests for the functions in ignore_func
in this file only or in separate tests
folders in respective modules?
What do you think about having tests of the functionality of get_chunks as a function? Those would be separate from any tests that the other functions are using get_chunks correctly (which is I think what the smoke tests are doing... right?).
I think we can test get_chunks
separately for all functions in separate tests folders, once we have other additional tests for nx-parallel functions. But right now get_chunks
smoke test is the only additional test for all nx-parallel functions.
Yet another option is to find some way to run the networkx test suite with all calls to nx-parallel setting a specified value for the get_chunks parameter.
But, then this wouldn't check our default chunking.
@@ -1,6 +1,4 @@ | |||
import time | |||
import random | |||
import types | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially created a separate pull request for updating the timing script (PR #61), as I believed it to be an independent PR. This decision was also influenced by the mention in my SciPyCon poster (to be presented on 10th July'24) stating "refer the timing folder to know more about speedups." At the time, having it as a separate PR made sense.
However, with the recent updates and enhancements to the tournament algorithms in this PR, it has become apparent that PR #61 is now dependent on these changes.
Could you please advise whether I should continue working on the PR #61 branch or start updating the timing script in the current PR's branch?
Please LMK what would be the most convenient approach for you to review these timely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to keep the timing PR as a separate PR. It is true that there will likely be merge conflicts. But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do). I think it is better to embrace rebasing/merging with conflicts than to try to avoid it. It gives more reviewable PRs. But it is definitely more work for making the PRs.
The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But large PRs are hard to review and #63 is already very big. (maybe #63 should be split up, but I don't know -- it depends how easy that is to do).
All the commits in this PR correspond to the checklist points in the first comment of this PR. So, I think reviewing this PR commit-wise would be easier than reviewing it in one go.
The changes in this module of this PR don't seem difficult to rebase/merge when the time comes, so I would just keep the PRs separate based on this module at least...
I can work on updating the timing script PR separately and the merge conflicts won't be that difficult for me to resolve, so then this PR will need to be merged before the timing script PR. My main concern was just to have the updated heatmaps in the repo before SciPyCon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The renaming looks good.
I have some suggestions for the __init__
method of ParallelGraph.
Otherwise it seems fine.
get_chunks
to all algorithms(in my gsoc proposal this was an "expanding" goal, not a "revisiting" goal, but it seemed more apt to include it here)get_chunks
together(rn ignores funcs that require args other than G)is_reachable
andtournament_is_strongly_connected
intournament.py
(can be experimented with and enhanced further but probably should leave that for a different PR)pip install
s intest.yml
in gh wfDispatcher
class toBackendInterface
(ref. comment) and updated it so that runningnxp.ParallelGraph()
ornxp.ParallelGraph([(1, 2), (2, 3)])
doesn't give any error and added__str__
so thatParallelGraph with n nodes and m edges
is printed instead of<nx_parallel.interface.ParallelGraph object at 0x10495e6d0>
when aParallelGraph
is passed inprint()
(This is in experimental section of the proposal)