
ENH : Adding backend_info entry point #27

Merged: 26 commits merged into networkx:main, Jan 28, 2024

Conversation

Schefflera-Arboricola
Member

@Schefflera-Arboricola Schefflera-Arboricola commented Jan 1, 2024

[UPDATED]

  1. Updated the docs of all functions and classes, fixed the __init__.py files, and tried to make the code consistent in a few places.
  2. Added a get_info() function in utils.backend (to show the nx-parallel docs on the main NetworkX docs webpage). [get_info() is in utils.backend instead of nx_parallel/__init__.py because extracting the docstrings of all the algorithms requires importing all of them, and doing that in __init__.py gives a circular import error.]
  3. Removed python -m pytest --doctest-modules --pyargs nx_parallel from the workflow tests, because it errors when there are no examples in the docstrings to test.

Assumption (syntax): function docstrings should follow this layout:

def betweenness_centrality(
    G, k=None, normalized=True, weight=None, endpoints=False, seed=None
):
    """[FIRST PARAGRAPH, DISPLAYED ON THE MAIN NETWORKX DOCS AS THE FUNCTION DESCRIPTION]
    The parallel computation is implemented by dividing the
    nodes into chunks and computing betweenness centrality for each chunk concurrently.

    Parameters
    ------------ [EVERYTHING BELOW THIS LINE AND BEFORE THE NETWORKX LINK WILL BE DISPLAYED IN THE ADDITIONAL PARAMETERS SECTION ON THE MAIN NETWORKX DOCS]
    get_chunks : function (default = None)
        A function that takes in nodes as input and returns node_chunks
    parameter 2 : int
        desc ....
    .
    .
    .

    networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
    """

@Schefflera-Arboricola changed the title from "Updating all functions" to "ENH : Updating all functions" Jan 2, 2024
@dschult added the "type: Enhancement" label Jan 2, 2024
Member

@dschult dschult left a comment


This looks nice! Thanks!
Here are some comments -- but mostly questions...

  • perhaps we should raise an exception for an invalid n_jobs. My thinking is that if they specified an input that doesn't make sense, they probably meant something else.
  • Having n_jobs=-2 means 1 CPU is left untouched. That off-by-one value might cause confusion. Maybe -1 should mean all-but-one. That would match indexing where a[:-1] returns all but the last element.
  • The case n_jobs==0 is currently not valid. What do you think about making that mean use-all-cpus? With this and the previous comment the docstring would say "if n_jobs <= 0, then n_cpus + n_jobs cpus are used."
  • I guess the doc_string for n_jobs should say what n_cpus means.
  • you removed some examples from the doc_strings. I'm fine with that, but would like to write down the criteria for what part of the NX function doc_string we will keep for the nx-parallel doc_string. What criteria did you use? (this will be flexible for a while until we figure out what is best, but probably good to start discussing it).

:)

@Schefflera-Arboricola
Member Author

Schefflera-Arboricola commented Jan 2, 2024

perhaps we should raise an exception for an invalid n_jobs. My thinking is that if they specified an input that doesn't make sense, they probably meant something else.

I initially thought of raising an exception, but in joblib, when you pass n_jobs > os.cpu_count(), there are no warnings or errors (I don't know exactly how joblib handles this case), so I thought that maybe that is the norm.

  • Having n_jobs=-2 means 1 CPU is left untouched. That off-by-one value might cause confusion. Maybe -1 should mean all-but-one. That would match indexing where a[:-1] returns all but the last element.
  • The case n_jobs==0 is currently not valid. What do you think about making that mean use-all-cpus? With this and the previous comment the docstring would say "if n_jobs <= 0, then n_cpus + n_jobs cpus are used."

Here also I was trying to follow joblib's n_jobs. In sklearn, n_jobs=-1 likewise means using all the CPUs. But I know it makes more sense to make 0, or all the cores, the default. In this issue (joblib/joblib#1504) they mention they cannot make such a change because it's a big project and it would impact a lot of people. But we can make 0 the default if we want. :)
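
For concreteness, a minimal sketch of the convention discussed above ("if n_jobs <= 0, then n_cpus + n_jobs CPUs are used"), plus the suggested exception for values that still don't make sense. This is only an illustration, not code from this PR; note that joblib itself maps a negative n_jobs to n_cpus + 1 + n_jobs instead:

import os


def normalize_n_jobs(n_jobs=0):
    # Proposed convention: 0 means use all CPUs, -1 all but one, and so on.
    n_cpus = os.cpu_count()
    effective = n_cpus + n_jobs if n_jobs <= 0 else n_jobs
    if not 1 <= effective <= n_cpus:
        raise ValueError(f"n_jobs={n_jobs} is not valid on a machine with {n_cpus} CPUs")
    return effective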

you removed some examples from the doc_strings. I'm fine with that, but would like to write down the criteria for what part of the NX function doc_string we will keep for the nx-parallel doc_string. What criteria did you use? (this will be flexible for a while until we figure out what is best, but probably good to start discussing it).

I didn't remove a lot (only the examples related to networkx rather than nx-parallel, and the See Also section; I'll add nx-parallel-specific examples soon), as I wanted to discuss it with you and @MridulS. I don't think we should remove the whole Parameters section, but we can make it shorter (by removing extra information about a parameter, like weight in all_pairs_bellman_ford_path), and also remove things like the formula explanations (as in betweenness_centrality), because those will never change, I think.
Please let me know your thoughts on this.

Thanks!

@Schefflera-Arboricola changed the title from "ENH : Updating all functions" to "[ENH, MAINT] : Updating all functions" Jan 3, 2024
def all_pairs_bellman_ford_path(G, weight="weight"):
"""Compute shortest paths between all nodes in a weighted graph.
def all_pairs_bellman_ford_path(G, weight="weight", n_jobs=-1):
"""Parallelly computes shortest paths between all nodes in a weighted graph.
Member


You are right that we need to find better wording here. The NetworkX wording obviously doesn't include parallel. But "parallelly" is almost never used for computing. (I could be wrong here, but I haven't seen it. The definition of the word exists, but it doesn't seem to be used in computing. Please let me know if I'm wrong with this.)
Other possible words/phrases:

    """Computes in parallel shortest paths between all nodes in a weighted graph.
    """Computes, in parallel, shortest paths between all nodes in a weighted graph.
    """Computes shortest paths between all nodes in a weighted graph in parallel
    """Computes shortest paths between all nodes in a weighted graph (parallel version)
    """Parallel version of networkx.all_pairs_bellman_ford_path
    """Parallel algorithm for shortest_paths between all nodes in a weighted graph

Thoughts?

Member


Here’s another idea for wording:
Parallel computation of …..

@Schefflera-Arboricola
Member Author

I made a separate PR for adding n_jobs, because if we want n_jobs to work the way it does in joblib, we will have to add more of joblib.Parallel's kwargs, since what n_jobs means depends on which joblib backend we are using ("loky", "multiprocessing", "threading", etc.).

@Schefflera-Arboricola
Member Author

Schefflera-Arboricola commented Jan 6, 2024

Also, I have updated the docstrings and removed a lot from them. Please let me know if you think I have removed too much somewhere.

Thanks!

Also, this failed workflow test seems unrelated to the changes I've made.

@Schefflera-Arboricola
Member Author

I had a question and a suggestion related to the documentation issue:

Question :
Is it even possible to automatically transfer the docstrings of NetworkX functions to nx-parallel, i.e. we write only the docs related to the parallel implementation and nx-parallel examples, and all the other docs are imported automatically from the same function in the main NetworkX repo? Are there any tools that allow this kind of docstring transfer?

Suggestion :
Another approach I could come up with to solve the documentation issue is to first display all the main things (the function header, a description of the parallel implementation, a one-line summary of what the function does/returns, and nx-parallel examples) and then, below that, have a drop-down for everything else (parameters, notes, references, etc.). This would probably be more user-friendly, but there would be redundancy on the developer side. Please let me know your thoughts on this.

@Schefflera-Arboricola changed the title from "[ENH, MAINT] : Updating all functions" to "[DOCS, MAINT] : Updating all functions" Jan 8, 2024
@dschult
Member

dschult commented Jan 9, 2024

I understand your comment about making n_job changes in a separate PR. We will have to think about it more.

I don't think you have removed too much -- but we do need to figure out what we will use the doc_strings for before we know how much to remove or keep.

Python allows the assignment and reassignment of doc_strings on functions within a single Python session. The doc_string is held as a string in the __doc__ attribute of the function. But that doesn't make it easy to do: it would be quite a task just to create a system that went through the NX functions, found which are also in nx-parallel, and then tried to match the two.
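
As a tiny illustration of that __doc__ reassignment (the cross-package copy is hypothetical and assumes both packages expose betweenness_centrality at the top level):

import networkx as nx
import nx_parallel as nxp

# A docstring is just the function's __doc__ attribute, so it can be reassigned
# within a session ...
nxp.betweenness_centrality.__doc__ = nx.betweenness_centrality.__doc__
# ... but doing this systematically would still require matching every
# nx-parallel function to its NetworkX counterpart, which is the hard part.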

We should figure out who we are aiming for with the doc_strings. Do we want them in the code, where people reading the code will see them? Do we want them at the Python interpreter level, so people in IPython can type func? to see the doc_string? Or do we want this to be a website of documentation for nx-parallel that people look up with their browser?

I think an early step is to figure out how to add a box with a few lines to the NX documentation, like cugraph does for this: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html

@Schefflera-Arboricola changed the title from "[DOCS, MAINT] : Updating all functions" to "[ENH, DOCS, MAINT] : Updating all functions" Jan 11, 2024
@Schefflera-Arboricola
Member Author

In the recent commit, the docstrings of all functions are extracted; from each docstring the Parallel Computation, Additional Parameters, and Examples sections are pulled out and stored in a dictionary (with the function name as key), which is returned at the end by the get_funcs_info function. I've also made the URL to the function header appear on the NetworkX page (as "Learn more"; see PR #7219 for details).
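
For context, a sketch of how a consumer such as the NetworkX docs build could pick up this dictionary through a backend_info entry point; the entry-point group and name here are assumptions based on this PR's title, not verified configuration:

from importlib.metadata import entry_points

# Load each registered backend_info entry point (Python 3.10+ API) and call it
# to get that backend's info dictionary, e.g. nx_parallel's get_info().
for ep in entry_points(group="networkx.backend_info"):
    info = ep.load()()
    print(ep.name, "->", info["project"], sorted(info["functions"]))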

@Schefflera-Arboricola
Member Author

I've made this PR independent of PR #7219 and updated the first comment to indicate all the changes made.

Thanks!

@Schefflera-Arboricola changed the title from "[ENH, DOCS, MAINT] : Updating all functions" to "ENH : Adding backend_info entry point" Jan 21, 2024
@MridulS
Member

MridulS commented Jan 28, 2024

Alright, let's try this out with nx main docs.

@MridulS merged commit 4852fc8 into networkx:main Jan 28, 2024
7 of 11 checks passed
@jarrodmillman added this to the 0.1 milestone Jan 28, 2024
@MridulS
Member

MridulS commented Jan 28, 2024

Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.

I'll debug this later this week.

@Schefflera-Arboricola
Member Author

Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.

The problem might be from the line for param in sorted(extra_parameters): (an error like TypeError: 'NoneType' object is not iterable), because both graphblas and nx-cugraph don't include the extra_parameters and extra_docstring keys in a function's dictionary if they don't have any content. But in nx-parallel the default value for all of a function's keys is None, and while building the backend docs only the existence of the key is checked, not whether its value is None.

So, to solve this we can either check whether the values given for the keys (extra_parameters and extra_docstring) are None (like here in the renaming PR, where I also check all the keys for None values), or we can modify the get_info function to remove all the key-value pairs whose value is None.
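
A sketch of the second option, assuming get_info() builds a per-function dictionary that may contain None values (the helper name here is illustrative, not from the PR):

def _strip_none_values(funcs_info):
    # Drop key-value pairs whose value is None so that the NetworkX docs build
    # only sees keys that actually have content, matching graphblas/nx-cugraph.
    return {
        func: {key: val for key, val in d.items() if val is not None}
        for func, d in funcs_info.items()
    }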

@Schefflera-Arboricola
Member Author

Running this script shows what get_info returns:

import nx_parallel as nxp

# Print every top-level key returned by get_info(), expanding the "functions"
# dictionary to show the per-function entries.
info = nxp.get_info()

for key in info:
    if key != "functions":
        print(key)
        print(info[key])
        print()
    else:
        print(key)
        for func in info[key]:
            print(func)
            print(info[key][func])
            print()

Output:

backend_name
parallel

project
nx-parallel

package
nx_parallel

url
https://github.com/networkx/nx-parallel

short_summary
Parallel backend for NetworkX algorithms

functions
betweenness_centrality
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and computing betweenness centrality for each chunk concurrently.', 'extra_parameters': None}

local_efficiency
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and then computing and adding global efficiencies of all node\nin all chunks, in parallel, and then adding all these sums and dividing by the\ntotal number of nodes at the end.', 'extra_parameters': None}

number_of_isolates
{'extra_docstring': 'The parallel computation is implemented by dividing the list\nof isolated nodes into chunks and then finding the length of each chunk in parallel\nand then adding all the lengths at the end.', 'extra_parameters': None}

is_reachable
{'extra_docstring': 'The function parallelizes the calculation of two\nneighborhoods of vertices in `G` and checks closure conditions for each\nneighborhood subset in parallel.', 'extra_parameters': None}

tournament_is_strongly_connected
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and then checking whether each node is reachable from each\nother node in parallel.', 'extra_parameters': None}

closeness_vitality
{'extra_docstring': 'The parallel computation is implemented only when the node\nis not specified. The closeness vitality for each node is computed concurrently.', 'extra_parameters': None}

all_pairs_bellman_ford_path
{'extra_docstring': 'The parallel computation is implemented by computing the\nshortest paths for each node concurrently.', 'extra_parameters': None}

@Schefflera-Arboricola
Member Author

...... So, to solve this we can either check whether the values given for the keys (extra_parameters and extra_docstring) are None (like here in the renaming PR, where I also check all the keys for None values), or we can modify the get_info function to remove all the key-value pairs whose value is None.

I don't think this is the issue, because after re-running the tests under networkx/networkx#7219, the nx-parallel note was still not appearing in the documentation artifact.

https://output.circle-artifacts.com/output/job/806b45be-b4a6-4b59-8c74-d950b0de934c/artifacts/0/doc/build/html/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#betweenness-centrality

@Schefflera-Arboricola
Member Author

Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.

#55 resolves this

@jarrodmillman modified the milestone: 0.1 → 0.2, May 9, 2024