DOC: Adding CONTRIBUTING.md, updating readme, etc. (#46)
* docs-initial commit

* updated docs

* updated doc

* updated CONTRIBUTING.md

* added a few notes

* Updated CONTRIBUTING.md

* updated readme

* added a few pts
Schefflera-Arboricola authored Mar 13, 2024
1 parent 5bfc422 commit 4368caf
Showing 5 changed files with 191 additions and 67 deletions.
124 changes: 124 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,124 @@
# Welcome to nx-parallel!

Hi, thanks for stopping by!

This project is part of the larger NetworkX project. If you're interested in contributing to nx-parallel, first go through [NetworkX's contributing guide](https://github.com/networkx/networkx/blob/main/CONTRIBUTING.rst) for general guidelines on contributing, setting up the development environment, and adding tests and docs.

## Setting up the development environment

To set up the local development environment:

- Fork this repository.
- Clone the forked repository locally.

```.sh
git clone git@github.com:<your_username>/nx-parallel.git
```

- Create a fresh virtual environment. The example below uses `venv`; a conda/mamba environment works as well ([learn more](https://github.com/networkx/networkx/blob/main/CONTRIBUTING.rst#development-workflow))

```.sh
# Creating a virtual environment
python -m venv nxp-dev

# Activating the venv
source nxp-dev/bin/activate
```

- Install the dependencies using the following command

```.sh
pip install -e ".[developer]"
```

- Install pre-commit actions that will run the linters before making a commit

```.sh
pre-commit install
```

- Create a new branch for your changes using

```.sh
git checkout -b <branch_name>
```

- Stage your changes, run `pre-commit`, re-stage any files the hooks modified, then commit, push, and create a PR

```.sh
git add .
pre-commit
git add .
git commit -m "Your commit message"
git push origin <branch_name>
```

## Testing nx-parallel

The following command runs all the tests in networkx with a `ParallelGraph` object. For algorithms that are not implemented in nx-parallel, it falls back to networkx's sequential implementations. This ensures that the parallel implementations follow the same API as networkx's.

```.sh
PYTHONPATH=. \
NETWORKX_TEST_BACKEND=parallel \
NETWORKX_FALLBACK_TO_NX=True \
pytest --pyargs networkx "$@"
```

For running additional tests:

```.sh
pytest nx_parallel
```

To add tests **specific to nx_parallel**, follow the way test folders are structured in networkx and add your test(s) accordingly.

## Documentation syntax

To display a small note about nx-parallel's implementation at the end of the main NetworkX documentation, we use the `backend_info` [entry_point](https://packaging.python.org/en/latest/specifications/entry-points/#entry-points) (in the `pyproject.toml` file). The [`get_info` function](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/utils/backend.py#L8) parses the docstrings of the algorithms in nx-parallel and displays the nx-parallel-specific documentation on NetworkX's main docs, in the "Additional Backend implementations" box, as shown in the screenshot below.

![backend_box_ss](https://github.com/networkx/nx-parallel/blob/main/assets/images/backend_box_ss.png)

Here is how the docstring should be formatted in nx-parallel:

```.py
def betweenness_centrality(
    G, k=None, normalized=True, weight=None, endpoints=False, seed=None, get_chunks="chunks"
):
    """[FIRST PARA DISPLAYED ON MAIN NETWORKX DOCS AS FUNC DESC]
    The parallel computation is implemented by dividing the
    nodes into chunks and computing betweenness centrality for each chunk concurrently.
    Parameters
    ------------ [EVERYTHING BELOW THIS LINE AND BEFORE THE NETWORKX LINK WILL BE DISPLAYED IN ADDITIONAL PARAMETER'S SECTION ON NETWORKX MAIN DOCS]
    get_chunks : function (default = "chunks")
        A function that takes in nodes as input and returns node_chunks
    parameter 2 : int
    ....
    .
    .
    .
    networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
    """
```
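To make the three parts of this layout concrete, here is a minimal sketch that splits such a docstring into the description and the additional-parameters text. It is not the real `get_info` logic: `split_backend_docstring` is a hypothetical helper, and it assumes a `Parameters` heading followed by one underline row and a final line starting with `networkx.`.

```python
import inspect


def split_backend_docstring(doc):
    """Hypothetical helper: return (description, additional_parameters_text)."""
    lines = inspect.cleandoc(doc).splitlines()
    # Index of the "Parameters" heading, or None if the docstring has none.
    start = next((i for i, ln in enumerate(lines) if ln.strip() == "Parameters"), None)
    # Index of the trailing "networkx.<func> : <url>" line, or end of docstring.
    end = next((i for i, ln in enumerate(lines) if ln.startswith("networkx.")), len(lines))
    desc_end = end if start is None else start
    description = " ".join(ln.strip() for ln in lines[:desc_end] if ln.strip())
    # Skip the heading and its underline row (assumed to be exactly one line).
    params = "" if start is None else "\n".join(lines[start + 2 : end]).strip()
    return description, params


EXAMPLE = """The parallel computation is implemented by dividing the
nodes into chunks and computing betweenness centrality for each chunk concurrently.

Parameters
----------
get_chunks : function (default = "chunks")
    A function that takes in nodes as input and returns node_chunks

networkx.betweenness_centrality : https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
"""

description, params = split_backend_docstring(EXAMPLE)
```

Here `description` holds the first paragraph and `params` holds everything between the underline row and the NetworkX link.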

## Chunking

In parallel computing, "chunking" refers to dividing a large task into smaller chunks that can be processed simultaneously by multiple computing units, such as CPU cores or distributed computing nodes. Multiple workers can then work on different pieces at the same time, and in the case of nx-parallel this usually speeds up the overall process.

By default, nx-parallel first determines the number of available CPU cores and then sets the number of nodes (or edges, or items of any other iterator) per chunk by dividing the total number of nodes by the number of available CPU cores (ref. [chunk.py](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/utils/chunk.py)). The user can override this default by passing a custom `get_chunks` function to the algorithm as a kwarg. While adding a new algorithm, you can change this default chunking if necessary (ref. [PR](https://github.com/networkx/nx-parallel/pull/33)). Also, once [the `config` PR](https://github.com/networkx/networkx/pull/7225) is merged in networkx and `config` is added to nx-parallel, users will be able to control the number of CPU cores they want to use, and the chunking will be done accordingly.
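The idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `chunk.py`: `chunk_evenly` and `halves` are hypothetical names, and the exact rounding used by nx-parallel may differ.

```python
import itertools
import os


def chunk_evenly(items, n_chunks):
    """Hypothetical helper: yield at most `n_chunks` tuples of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    items = list(items)
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    it = iter(items)
    while chunk := tuple(itertools.islice(it, size)):
        yield chunk


def halves(nodes):
    """Hypothetical custom `get_chunks`: takes nodes, returns two node chunks."""
    nodes = list(nodes)
    mid = len(nodes) // 2
    return [nodes[:mid], nodes[mid:]]


n_cores = os.cpu_count() or 1  # cpu_count() may return None
default_like_chunks = list(chunk_evenly(range(10), n_cores))
custom_chunks = halves(range(10))
# Assumed usage, following the `get_chunks` kwarg described above:
# nxp.betweenness_centrality(G, get_chunks=halves)
```

A custom function like `halves` is useful when the work per node is uneven and equal-sized chunks would leave some cores idle.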

## General guidelines on adding a new algorithm

- To get started with adding a new algorithm, refer to the existing implementations in nx-parallel and to [joblib's documentation on embarrassingly parallel `for` loops](https://joblib.readthedocs.io/en/latest/parallel.html).
- The algorithm you are considering adding to nx-parallel should already be in the main networkx repository and should have the `_dispatchable` decorator. If it doesn't, consider adding a sequential implementation to networkx first.
- Checklist for adding a new function:
  - [ ] Add the parallel implementation (make sure the API doesn't break); the file structure should be the same as that in networkx.
  - [ ] Add the function to the `Dispatcher` class in [interface.py](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/interface.py) (take care of the `name` parameter in `_dispatchable` (ref. [docs](https://networkx.org/documentation/latest/reference/generated/networkx.utils.backends._dispatchable.html#dispatchable)))
  - [ ] Update the `__init__.py` files accordingly
  - [ ] Write the docstring following the above format
  - [ ] Run the [timing script](https://github.com/networkx/nx-parallel/blob/main/timing/timing_individual_function.py) to get the performance heatmap
  - [ ] Add additional test(s), if any
  - [ ] Add benchmark(s) for the new function (ref. the README in the benchmarks folder for more details)
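The general shape of a chunked parallel implementation (split the nodes, map a sequential function over the chunks, combine the partial results) can be sketched as follows. This is only an illustration: it swaps in the standard library's `concurrent.futures` for joblib, works on a plain adjacency dict instead of a NetworkX graph, and the function names are hypothetical. Threads will not speed up pure-Python work, so treat it as a structural sketch only.

```python
from concurrent.futures import ThreadPoolExecutor


def isolates_in_chunk(adj, chunk):
    """Sequential work for one chunk: count nodes with no neighbours."""
    return sum(1 for node in chunk if not adj[node])


def number_of_isolates_chunked(adj, n_workers=2):
    """Count isolated nodes in an adjacency dict, one task per chunk."""
    nodes = list(adj)
    size = max(1, -(-len(nodes) // n_workers))  # ceiling division
    chunks = [nodes[i : i + size] for i in range(0, len(nodes), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Map step: each chunk is processed independently of the others.
        partial_counts = list(pool.map(lambda chunk: isolates_in_chunk(adj, chunk), chunks))
    # Combine step: the partial results are simply added up.
    return sum(partial_counts)


adj = {0: {1}, 1: {0}, 2: set(), 3: set(), 4: set()}
total = number_of_isolates_chunked(adj, n_workers=2)
```

Algorithms whose per-node work is independent, as here, are the easiest candidates; the result of the chunked version should always match the sequential one.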

Happy contributing! 🎉
108 changes: 51 additions & 57 deletions README.md
@@ -1,78 +1,72 @@
## nx-parallel
# nx-parallel

nx-parallel is a NetworkX backend that uses joblib for parallelization. This project aims to provide parallelized implementations of various NetworkX functions to improve performance.

## Features

nx-parallel provides parallelized implementations for the following NetworkX functions:

- [betweeness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/centrality/betweenness.py#L17)
- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#L12)
- [number_of_isolates](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#L9)
- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/shortest_paths/weighted.py#L9)
- [is_reachable](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L11)
- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#L103)
- [closeness_vitality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#L9)

![alt text](timing/heatmap_all_functions.png)

See the `/timing` folder for more heatmaps and code for heatmap generation!

### Development install

To setup a local development:

- Fork this repository.
- Clone the forked repository locally.

```
git clone git@github.com:<your_username>/networkx.git
## Algorithms in nx-parallel

- [betweenness_centrality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/betweenness.py#15)
- [square_clustering](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/cluster.py#10)
- [local_efficiency](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/efficiency_measures.py#9)
- [number_of_isolates](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/isolate.py#8)
- [node_redundancy](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/redundancy.py#11)
- [is_reachable](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#10)
- [tournament_is_strongly_connected](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/tournament.py#54)
- [closeness_vitality](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/vitality.py#9)
- [all_pairs_bellman_ford_path](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/weighted.py#16)
- [johnson](https://github.com/networkx/nx-parallel/blob/main/nx_parallel/algorithms/weighted.py#59)

<details>
<summary>Script used to generate the above list</summary>

```.py
import nx_parallel as nxp
d = nxp.get_info()
for func in d.get("functions", {}):
    print(f"- [{func}]({d['functions'][func]['url']})")
```

- Create a fresh conda/mamba virtualenv and install the dependencies
</details>

```
pip install -e ".[developer]"
```
## Backend usage

- Install pre-commit actions that will run the linters before making a commit
```.py
import networkx as nx
import nx_parallel as nxp

```
pre-commit install
```
G = nx.path_graph(4)
H = nxp.ParallelGraph(G)

## Usage
# method 1 : passing ParallelGraph object in networkx function
nx.betweenness_centrality(H)

Here's an example of how to use nx-parallel:
# method 2 : using the 'backend' kwarg
nx.betweenness_centrality(G, backend="parallel")

```python
import networkx as nx
import nx_parallel
# method 3 : using nx-parallel implementation with networkx object
nxp.betweenness_centrality(G)

# method 4 : using nx-parallel implementation with ParallelGraph object
nxp.betweenness_centrality(H)

G = nx.path_graph(4)
H = nx_parallel.ParallelGraph(G)
nx.betweenness_centrality(H)
# output : {0: 0.0, 1: 0.6666666666666666, 2: 0.6666666666666666, 3: 0.0}
```

## Testing

To run tests for the project, use the following command:
### Notes

```
PYTHONPATH=. \
NETWORKX_TEST_BACKEND=parallel \
NETWORKX_FALLBACK_TO_NX=True \
pytest --pyargs networkx "$@"
```
1. Some functions in networkx have the same name but different implementations. To avoid these name conflicts, they are differentiated at dispatch time by the `name` parameter in `_dispatchable` (ref. [docs](https://networkx.org/documentation/latest/reference/generated/networkx.utils.backends._dispatchable.html#dispatchable)). So, it is recommended to use either the full path of the implementation or the `name` parameter. For example:

## Contributing
```.py
# using full path
nx.algorithms.connectivity.connectivity.all_pairs_node_connectivity(H)
nx.algorithms.approximation.connectivity.all_pairs_node_connectivity(H)

We'd love to have you contribute to nx-parallel! Here are some guidelines on how to do that:
# using `name` parameter
nx.all_pairs_node_connectivity(H) # runs the parallel implementation in `connectivity/connectivity`
nx.approximate_all_pairs_node_connectivity(H) # runs the parallel implementation in `approximation/connectivity`
```

- **Issues:** Feel free to open issues for any problems you face, or for new features you'd like to see implemented.
- **Pull requests:** If you'd like to implement a feature or fix a bug yourself, we'd be happy to review a pull request. Please make sure to explain the changes you made in the pull request description.
2. Right now there isn't much difference between `nx.Graph` and `nxp.ParallelGraph`, so `method 3` works fine, but it is not recommended because that might not be the case in the future.

## Additional Information
Feel free to contribute to nx-parallel. You can find the contributing guidelines [here](https://github.com/networkx/nx-parallel/blob/main/CONTRIBUTING.md). If you'd like to implement a feature or fix a bug, we'd be happy to review a pull request. Please make sure to explain the changes you made in the pull request description. And feel free to open issues for any problems you face, or for new features you'd like to see implemented.

This project is part of the larger NetworkX project. If you're interested in contributing to NetworkX, you can find more information in the [NetworkX contributing guidelines](https://github.com/networkx/networkx/blob/main/CONTRIBUTING.rst).
Thank you :)
Binary file added assets/images/backend_box_ss.png
20 changes: 13 additions & 7 deletions benchmarks/README.md
@@ -1,14 +1,20 @@
# Benchmarks

These asv benchmarks are useful not only for seeing how the parallel implementations improve across commits, but also for comparing nx-parallel implementations with the networkx implementations, by switching the x-axis parameter to `backends`. There are also heatmaps in the `/timing` folder that show the speedups of the parallel implementations over the networkx implementations of the same function.

## Preview benchmarks locally

1. clone this repo
2. `cd benchmarks`
3. If you are working on a different branch then update the `branches` in the `asv.conf.json` file.
4. `asv run` will run the benchmarks on the last commit
1. clone this repo and set up the development environment (ref. [README](https://github.com/networkx/nx-parallel?tab=readme-ov-file#development-install))
2. run `pip install asv`
3. navigate using `cd benchmarks`
4. If you are working on a different branch then update the value of `branches` in the `asv.conf.json` file.
5. `asv run` will run the benchmarks on the last commit
- or use `asv continuous base_commit_hash test_commit_hash` to run the benchmark to compare two commits
- or `asv run -b <benchmark_file_name> -k <benchmark_name>` to run a particular benchmark.
- or `asv run -b <benchmark_file_name> -k <benchmark_name>` to run a particular benchmark in a file.
- or `asv run -b BenchmarkClassName.time_benchmark_func_name` to run a specific benchmark in a benchmark class.
- if you are running benchmarks for the first time, you will be asked to enter your machine information after this command.
5. `asv publish` will create a `html` folder with the results
6. `asv preview` will host the results locally at http://127.0.0.1:8080/
6. `asv publish` will create an `html` folder with the results.
7. `asv preview` will host the results locally at http://127.0.0.1:8080/
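For orientation, an asv benchmark is a plain Python class whose `time_*` methods are timed once per parameter value. The sketch below is a stand-in, not a benchmark from this repository: the class name, the parameter values and the dict-based workload are hypothetical, and a real benchmark would call the networkx or nx-parallel function inside the `time_*` method.

```python
class TimeIsolates:
    """Hypothetical asv benchmark class; names and sizes are illustrative."""

    # asv runs every `time_*` method once for each value in `params`.
    params = [10, 100, 1000]
    param_names = ["num_nodes"]

    def setup(self, num_nodes):
        # Built outside the timed region: even nodes have no neighbours.
        self.adj = {n: (set() if n % 2 == 0 else {n - 1}) for n in range(num_nodes)}

    def time_count_isolates(self, num_nodes):
        # Only the body of this method is timed; the return value is ignored.
        sum(1 for neighbours in self.adj.values() if not neighbours)
```

With the command form listed above, this one would be selected with `asv run -b TimeIsolates.time_count_isolates`.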

<hr>

6 changes: 3 additions & 3 deletions timing/timing_comparison.md
@@ -1,4 +1,4 @@
Timing Comparisons
# Timing Comparisons

---

@@ -10,11 +10,11 @@ RAM: 16 GB LPDDR4X at 3733 MHz

Code to generate heatmaps in timing_individual_function.py and timing_all_functions.py.

### All parallelized functions at this time:
## All parallelized functions at this time:

![alt text](heatmap_all_functions.png)

### Individual functions:
## Individual functions:

betweenness_centrality
![alt text](heatmap_betweenness_centrality_timing.png)
