Add floyd_warshall #42

Merged
merged 4 commits into from
Feb 2, 2023

eriknw (Member) commented Jan 31, 2023

CC @jim22k @SultanOrazbayev @LuisFelipeRamos

I made a few minor modifications to the algorithms from what we wrote together today. I'm happy to answer any questions. I tinkered around a little to make things faster.

We can probably get this to work with dask-graphblas too, which I think would be pretty interesting, because it can create a massive, distributed matrix. It may not be the best way to compute APSP, but a way is better than no way at all :)

See the original LAGraph version of Floyd-Warshall here:
https://github.com/GraphBLAS/LAGraph/blob/ed55a49ee7138d2b5a6c5eb4329ccd0bf9e4ac17/old/experimental_algorithm/LAGraph_FW.c

I'll try to benchmark and compare this with a NumPy implementation on a beefy machine with lots of memory.

SultanOrazbayev (Member) commented

A potential optimization is to loop over n, where n is rows with non-zero values.

codecov-commenter commented Jan 31, 2023

Codecov Report

Base: 72.91% // Head: 72.25% // Decreases project coverage by 0.66% ⚠️

Coverage data is based on head (b02851b) compared to base (140bea8).
Patch coverage: 38.29% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #42      +/-   ##
==========================================
- Coverage   72.91%   72.25%   -0.66%     
==========================================
  Files          70       72       +2     
  Lines        2573     2617      +44     
  Branches      475      479       +4     
==========================================
+ Hits         1876     1891      +15     
- Misses        528      557      +29     
  Partials      169      169              
Impacted Files                                             Coverage Δ
...blas_algorithms/algorithms/shortest_paths/dense.py      16.12% <16.12%> (ø)
graphblas_algorithms/nxapi/shortest_paths/dense.py         57.14% <57.14%> (ø)
...s_algorithms/algorithms/shortest_paths/__init__.py      100.00% <100.00%> (ø)
graphblas_algorithms/interface.py                          97.46% <100.00%> (+0.13%) ⬆️
...phblas_algorithms/nxapi/shortest_paths/__init__.py      100.00% <100.00%> (ø)

eriknw (Member Author) commented Jan 31, 2023

> A potential optimization is to loop over n, where n is rows with non-zero values.

Good suggestion. I added an optimization where we only iterate over vertices that have both a nonempty row and a nonempty column. I think this behaves correctly, but I would appreciate it if somebody could verify it.

I also introduced another temporary matrix to hold the outer product. We then drop the diagonal values from it.

All of this performs as well or better in my limited benchmarking. Our strategy of keeping things sparse is probably reasonable: if the graph has multiple connected components, the final result will not be dense, and this may happen frequently.

With the goal of "keeping things sparse", I wonder if there are any heuristics we could employ, such as iterating over vertices with small degrees first.
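
The pivot filtering and outer-product step described in this comment can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the function name `sparse_floyd_warshall` is hypothetical, a dict of dicts stands in for a sparse GraphBLAS matrix, and the graph is assumed to be directed with no negative cycles.

```python
import math


def sparse_floyd_warshall(edges):
    """Shortest path lengths stored sparsely as {u: {v: distance}}.

    `edges` is an iterable of (u, v, weight) tuples for a directed graph.
    Hypothetical helper for illustration; missing entries mean "no path".
    """
    D = {}
    for u, v, w in edges:
        if u == v:
            continue  # skip self-edges, similar in spirit to "offdiag"
        row = D.setdefault(u, {})
        row[v] = min(w, row.get(v, math.inf))

    # Only nodes with a nonempty row AND a nonempty column can be an
    # intermediate vertex on some path, so only those are used as pivots.
    has_out = {u for u, row in D.items() if row}
    has_in = {v for row in D.values() for v in row}
    pivots = sorted(has_out & has_in)

    for k in pivots:
        col = {u: row[k] for u, row in D.items() if k in row}  # column k
        row_k = dict(D.get(k, {}))  # row k (copied)

        # Temporary "outer product" of column k and row k under (min, +),
        # with the diagonal entries dropped.
        outer = {}
        for u, duk in col.items():
            for v, dkv in row_k.items():
                if u != v:
                    outer[u, v] = duk + dkv

        # Merge the temporary into D, keeping the smaller distance.
        for (u, v), candidate in outer.items():
            if candidate < D[u].get(v, math.inf):
                D[u][v] = candidate
    return D


# Two separate components: the result stays sparse.
distances = sparse_floyd_warshall(
    [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (3, 4, 1.0)]
)
```

In this small example only node 1 has both an incoming and an outgoing edge, so it is the only pivot, and no entries are ever created between the two components.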

    A, row_degrees, column_degrees = G.get_properties("offdiag row_degrees- column_degrees-")
    nonempty_nodes = binary.pair(row_degrees & column_degrees).new(name="nonempty_nodes")
else:
    A, nonempty_nodes = G.get_properties("offdiag degrees-")
eriknw (Member Author) commented:

Note that we use some shorthand notation here. "degrees-" does not include self-edges (i.e., diagonals), but "degrees+" does include self-edges.
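
To make the shorthand concrete, here is a small plain-Python sketch of the difference. It does not use the library: `row_degrees` is a hypothetical helper, and the choice to omit zero-degree nodes from the result is an assumption made to mimic a sparse vector.

```python
from collections import Counter


def row_degrees(edges, include_self_edges):
    """Out-degree per node for (u, v) edge pairs.

    Hypothetical helper. Nodes whose degree is 0 are left out of the
    result (an assumption, to resemble a sparse vector).
    """
    degrees = Counter()
    for u, v in edges:
        if u == v and not include_self_edges:
            continue  # "-" variant: self-edges are not counted
        degrees[u] += 1
    return dict(degrees)


edges = [(0, 0), (0, 1), (1, 1), (2, 0)]
degrees_plus = row_degrees(edges, include_self_edges=True)    # like "row_degrees+"
degrees_minus = row_degrees(edges, include_self_edges=False)  # like "row_degrees-"
```

Node 1 has only a self-edge, so it appears in `degrees_plus` but not in `degrees_minus`. Under the assumption above, such a node would not be selected as a pivot.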

Row = Matrix(dtype, nrows=1, ncols=n, name="Row")
Col = Matrix(dtype, nrows=n, ncols=1, name="Col")
Outer = Matrix(dtype, nrows=n, ncols=n, name="temp")
for i in nonempty_nodes:
Member

Looks good!

SultanOrazbayev (Member) commented

> With the goal of "keeping things sparse", I wonder if there are any heuristics we could employ, such as iterating over vertices with small degrees first.

Not sure about this; it could lead to gotchas if users don't expect sorting to happen. Sorting itself could also be an issue, e.g. when using distributed graphs.
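
For context on the ordering heuristic: Floyd-Warshall gives the same final distances for any pivot order, as long as every vertex is eventually used as a pivot, so reordering would affect only intermediate sparsity and cost. The plain-Python sketch below illustrates this; `floyd_warshall_with_order` is a hypothetical helper, not part of this PR, and it assumes a directed graph with no negative cycles.

```python
import math


def floyd_warshall_with_order(edges, pivot_order):
    """Sparse shortest path lengths {u: {v: distance}}, pivoting in a given order.

    Hypothetical helper for illustration. `edges` holds (u, v, weight) tuples.
    """
    D = {}
    for u, v, w in edges:
        if u != v:
            row = D.setdefault(u, {})
            row[v] = min(w, row.get(v, math.inf))

    for k in pivot_order:
        row_k = dict(D.get(k, {}))  # copy of row k, unchanged during this pivot
        for u, row in D.items():
            if u == k or k not in row:
                continue
            duk = row[k]
            for v, dkv in row_k.items():
                # Relax u -> k -> v, never storing diagonal entries.
                if v != u and duk + dkv < row.get(v, math.inf):
                    row[v] = duk + dkv
    return D


edges = [(0, 1, 4.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 7.0), (1, 3, 9.0)]
nodes = [0, 1, 2, 3]
out_degree = {n: sum(1 for u, _, _ in edges if u == n) for n in nodes}

natural = floyd_warshall_with_order(edges, nodes)
small_degree_first = floyd_warshall_with_order(
    edges, sorted(nodes, key=lambda n: out_degree[n])
)
same_result = natural == small_degree_first
```

Both orderings produce identical distances here. Whether a degree-based order actually reduces intermediate fill-in, and whether the sort is affordable on distributed graphs, is a separate question that this sketch does not answer.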

@eriknw eriknw merged commit 6dd93bd into python-graphblas:main Feb 2, 2023