Add floyd_warshall
#42
Conversation
A potential optimization is to loop only over n, where n is the set of rows with non-zero values.
Codecov Report
Base: 72.91% // Head: 72.25% // Decreases project coverage by -0.66%

@@ Coverage Diff @@
## main #42 +/- ##
==========================================
- Coverage 72.91% 72.25% -0.66%
==========================================
Files 70 72 +2
Lines 2573 2617 +44
Branches 475 479 +4
==========================================
+ Hits 1876 1891 +15
- Misses 528 557 +29
Partials 169 169
☔ View full report at Codecov.
Good suggestion. I added an optimization where we only iterate over vertices that have both a nonempty row and a nonempty column. I think this behaves correctly, but I would appreciate it if somebody could verify it. I also introduced another temporary matrix to hold the outer product, and we then drop the diagonal values from it. All of this performs as well as or better than before in my limited benchmarking. Our strategy of keeping things sparse is probably reasonable, because the graph may contain multiple connected components, in which case the final result may not be dense. With the goal of "keeping things sparse", I wonder whether there are any heuristics we could employ, such as iterating over vertices with small degrees first.
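For anyone who wants to experiment with the idea outside this PR, here is a minimal, self-contained sketch of the update loop described above, written with python-graphblas. It is not the PR's actual code: the function name `floyd_warshall_sketch` and the exact semiring/accumulator choices are illustrative assumptions, it assumes a weighted adjacency `Matrix` with no explicit diagonal entries, and `Vector.to_coo` may be `to_values` on older releases.

```python
from graphblas import Matrix, binary, monoid, select, semiring


def floyd_warshall_sketch(A):
    """All-pairs shortest paths for a weighted adjacency Matrix A (no self-loops)."""
    n = A.nrows
    D = A.dup(name="D")  # distance matrix, kept sparse throughout

    # Only pivot on vertices with at least one out-edge *and* one in-edge;
    # a vertex with an empty row or an empty column cannot shorten any path.
    row_mask = A.reduce_rowwise(monoid.any).new()
    col_mask = A.reduce_columnwise(monoid.any).new()
    nonempty = binary.pair(row_mask & col_mask).new(name="nonempty_nodes")

    Row = Matrix(A.dtype, nrows=1, ncols=n, name="Row")
    Col = Matrix(A.dtype, nrows=n, ncols=1, name="Col")
    Outer = Matrix(A.dtype, nrows=n, ncols=n, name="Outer")

    indices, _ = nonempty.to_coo()  # to_values() on older python-graphblas
    for i in indices:
        Row << D[[i], :]                       # 1 x n slice through pivot i
        Col << D[:, [i]]                       # n x 1 slice through pivot i
        Outer << semiring.min_plus(Col @ Row)  # candidate paths routed through i
        # Drop the diagonal so no self-distances are introduced, then fold the
        # improvements into D with an element-wise min accumulator.
        D(binary.min) << select.offdiag(Outer)
    return D
```

Missing entries in `D` stand in for infinite distances, which is what lets the whole computation stay sparse when the graph has several connected components.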
    A, row_degrees, column_degrees = G.get_properties("offdiag row_degrees- column_degrees-")
    nonempty_nodes = binary.pair(row_degrees & column_degrees).new(name="nonempty_nodes")
else:
    A, nonempty_nodes = G.get_properties("offdiag degrees-")
Note that we use some shorthand notation here. "degrees-"
does not include self-edges (i.e., diagonals), but "degrees+"
does include self-edges.
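For readers unfamiliar with that shorthand, here is a tiny hedged illustration using plain python-graphblas rather than the Graph property API from this PR. The example matrix and variable names are made up, and `Matrix.from_coo` may be `from_values` on older releases.

```python
from graphblas import Matrix, monoid, select, unary

# 3-node example with one self-loop at node 0.
A = Matrix.from_coo([0, 0, 1, 2], [0, 1, 2, 1], [1.0, 2.0, 3.0, 4.0], nrows=3, ncols=3)

ones = A.apply(unary.one).new()                        # 1 for every stored edge
deg_plus = ones.reduce_rowwise(monoid.plus).new()      # "degrees+": self-edge counted
no_diag = select.offdiag(ones).new()                   # drop the diagonal (self-edges)
deg_minus = no_diag.reduce_rowwise(monoid.plus).new()  # "degrees-": self-edge excluded
```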
Row = Matrix(dtype, nrows=1, ncols=n, name="Row")
Col = Matrix(dtype, nrows=n, ncols=1, name="Col")
Outer = Matrix(dtype, nrows=n, ncols=n, name="temp")
for i in nonempty_nodes:
Looks good!
Not sure about this; it could lead to gotchas if users don't expect sorting to happen (and sorting itself could be an issue, e.g. when using distributed graphs).
CC @jim22k @SultanOrazbayev @LuisFelipeRamos
I made a few minor modifications to the algorithms compared to what we wrote together today; I tinkered around a little to make things faster. I'm happy to answer any questions.
We can probably get this to work with dask-graphblas too, which I think would be pretty interesting, because it can create a massive, distributed matrix. It may not be the best way to compute APSP, but a way is better than no way at all :)

See the original LAGraph version of Floyd-Warshall here:
https://github.com/GraphBLAS/LAGraph/blob/ed55a49ee7138d2b5a6c5eb4329ccd0bf9e4ac17/old/experimental_algorithm/LAGraph_FW.c
I'll try to benchmark and compare this with a NumPy implementation on a beefy machine with lots of memory.
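For reference, the dense NumPy baseline for that comparison could look roughly like the sketch below. This is an illustrative assumption, not code from this PR: the function name `floyd_warshall_numpy` is hypothetical, and it assumes a dense (n, n) distance matrix with `np.inf` for missing edges and zeros on the diagonal.

```python
import numpy as np


def floyd_warshall_numpy(D):
    """In-place dense Floyd-Warshall on an (n, n) distance matrix."""
    n = D.shape[0]
    for k in range(n):
        # Relax all pairs through intermediate vertex k with one broadcast outer sum.
        np.minimum(D, D[:, k, None] + D[None, k, :], out=D)
    return D


# Usage sketch: build a dense matrix from the sparse adjacency, then compare results.
# D0 = np.full((n, n), np.inf); D0[rows, cols] = weights; np.fill_diagonal(D0, 0)
# D = floyd_warshall_numpy(D0)
```

The dense version does O(n^3) work regardless of sparsity, so it is mainly useful as a correctness and timing baseline on graphs small enough to fit a dense n x n array in memory.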