Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {all_pairs,single_source}_bellman_ford_path_length #44

Merged
merged 13 commits into from
Feb 17, 2023

Conversation

eriknw
Copy link
Member

@eriknw eriknw commented Feb 11, 2023

That is, add these to nxapi:

  • all_pairs_bellman_ford_path_length (only implemented in nxapi for now)
  • single_source_bellman_ford_path_length

I did a slightly different implementation of Bellman-Ford. It seems to run faster in my limited testing. This has not been heavily benchmarked and tuned, so there could be room for improvement.

This algorithm could probably be optimized further when the adjacency matrix is known to be iso-valued.

That is, add these to `nxapi`:
- `all_pairs_bellman_ford_path_length`
- `single_source_bellman_ford_path_length`
@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2023

Codecov Report

Base: 72.48% // Head: 68.87% // Decreases project coverage by -3.62% ⚠️

Coverage data is based on head (bbcdd07) compared to base (0b649b2).
Patch coverage: 32.46% of modified lines in pull request are covered.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
- Coverage   72.48%   68.87%   -3.62%     
==========================================
  Files          72       74       +2     
  Lines        2777     3023     +246     
  Branches      516      574      +58     
==========================================
+ Hits         2013     2082      +69     
- Misses        587      753     +166     
- Partials      177      188      +11     
Impacted Files Coverage Δ
graphblas_algorithms/nxapi/exception.py 26.08% <0.00%> (-2.49%) ⬇️
graphblas_algorithms/classes/_caching.py 41.46% <16.66%> (-2.29%) ⬇️
...s_algorithms/algorithms/shortest_paths/weighted.py 20.14% <20.14%> (ø)
...phblas_algorithms/nxapi/shortest_paths/weighted.py 24.39% <24.39%> (ø)
graphblas_algorithms/classes/graph.py 56.53% <38.80%> (-3.01%) ⬇️
graphblas_algorithms/classes/digraph.py 38.61% <45.45%> (-1.39%) ⬇️
graphblas_algorithms/algorithms/centrality/katz.py 91.17% <50.00%> (ø)
...lgorithms/algorithms/link_analysis/pagerank_alg.py 82.45% <50.00%> (-0.31%) ⬇️
graphblas_algorithms/nxapi/cluster.py 76.69% <64.28%> (-0.86%) ⬇️
graphblas_algorithms/algorithms/core.py 100.00% <100.00%> (ø)
... and 11 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Comment on lines 29 to 34
# Mask is True where cur not in d or cur < d
mask(cur.S, replace) << True # or: `mask << unary.one[bool](cur)`
mask(binary.second) << binary.lt(cur & d)

# Drop values from `cur` that didn't improve
cur(mask.V, replace) << cur
Copy link
Member Author

@eriknw eriknw Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, a very similar pattern to this also showed up in Floyd-Warshall

# Update Outer to only include off-diagonal values that will update D and P.
if is_directed:
Mask << indexunary.offdiag(Outer)
else:
Mask << indexunary.triu(Outer, 1)
Mask(binary.second) << binary.lt(Outer & D)
Outer(Mask.V, replace) << Outer

where we need to set a mask according to a comparison and set the mask to True for new values.

Just thought this was interesting. It's neat to see common patterns.

Comment on lines 12 to 15
def all_pairs_bellman_ford_path_length(G, weight="weight"):
# TODO: what if weight is a function?
# How should we implement and call `algorithms.all_pairs_bellman_ford_path_length`?
# Should we compute in chunks to expose more parallelism?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the best API for a backend implementation (i.e., to have in graphblas_algorithms.algorithms). Returning a Matrix seems risky.

Maybe bellman_ford_path_lengths(G, nodes=None), which will return a Matrix with distances for the specified nodes (default to all nodes). Then, nxapi.all_pairs_bellman_ford_path_length can compute in batches to get increased parallelism (assuming doing so is actually faster).

@eriknw
Copy link
Member Author

eriknw commented Feb 12, 2023

@LuisFelipeRamos, all_pairs_bellman_ford_path_length might be useful to you. Do you only need distances, or do you need more such as predecessors?

I like that we can compare these to scipy.sparse.csgraph algorithms. For SSSP, our Bellman-Ford is about the same speed as (or a little faster than) scipy's Dijkstra SSSP, which I think is impressive. I don't know what's up with scipy.sparse.csgraph.bellman_ford--it's super slow, like 15000x slower.

For all-pairs shortest path, our version with batching Bellman-Ford is 2-3x faster than scipy.sparse.csgraph.dijkstra! Batching gives us more parallelism, and we can do even better with larger batches.

My benchmarks aren't very exhaustive. They're also on my laptop (4 cores, 8 threads). I would expect GraphBLAS to do even better running on a beefier system (such as DGX that has 40 cores / 80 threads).

My variant of SSSP Bellman-Ford is about 3-4x faster than the "simple" GraphBLAS implementation that uses d(min) << min_plus(d @ A) and comparison to previous value. It's interesting that we're able to find optimizations for most algorithms in graphblas-algorithms with just a little bit of experimentation.

I don't know how our all-pairs shortest path with Bellman-Ford compares to Floyd-Warshall except that Bellman-ford is a generator and uses much less memory.

]


def all_pairs_bellman_ford_path_length(G, weight="weight", *, chunksize=128):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're adding a new keyword-only argument to nxapi. @jim22k, do you like chunksize for this?

We have a similar extra keyword-argument nchunks in nxapi.square_clustering that controls how much memory is used during calculation.

Neither one, nchunks or chunksize, changes the result; they're only used to control memory use and performance via available parallelism.

Comment on lines +16 to +19
# Use `offdiag` instead of `A`, b/c self-loops don't contribute to the result,
# and negative self-loops are easy negative cycles to avoid.
# We check if we hit a self-loop negative cycle at the end.
A, has_negative_diagonal = G.get_properties("offdiag has_negative_diagonal")
Copy link
Member Author

@eriknw eriknw Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet another change from the "simple" implementation: don't use diagonal values. I think "offdiag" is a property we'll use regularly, so I think we shouldn't be afraid to use it if it's useful.

The downside is possible extra memory use and some micro-benchmarks may perform worse 🤷‍♂️

(Also, added new properties has_negative_diagonal and has_negative_edges*).

@lframosferreira
Copy link

lframosferreira commented Feb 13, 2023 via email

@eriknw
Copy link
Member Author

eriknw commented Feb 13, 2023

Hi Erik! For my problem, I only need distances. I tested the Floyd-Warshall implementation with my graphs but it was very slow, probably because they are connected, as you said in our meeting. But that Bellman-Ford implementation sounds great! I'll try ASAP!

Cool! I'm eager to see how this turns out. Let us know if you want any help or if I should improve anything in this PR.

Did Floyd-Warshall finish, or was it too large?

@eriknw
Copy link
Member Author

eriknw commented Feb 13, 2023

Also, @LuisFelipeRamos, is your graph directed or undirected, and are the edges all the same value or different values?

@lframosferreira
Copy link

lframosferreira commented Feb 13, 2023 via email

@eriknw
Copy link
Member Author

eriknw commented Feb 14, 2023

They are undirected and unweighted.

Good to know! We should be able to further optimize for this case, which I think is pretty much level-BFS.

@eriknw
Copy link
Member Author

eriknw commented Feb 14, 2023

Good news @LuisFelipeRamos--I just doubled the performance of Bellman-Ford for unweighted graphs! (and also for graph that are iso-valued)

I do this by performing BFS that stores the level. That this is only twice as fast suggests to me that our Bellman-Ford implementation is probably pretty solid.

@lframosferreira
Copy link

That's great news!

@eriknw
Copy link
Member Author

eriknw commented Feb 15, 2023

Alright, I think this is good to go in.

I just updated chunksize="auto" to "all pairs" to automatically choose a reasonable chunksize.

@eriknw eriknw merged commit 1180e1c into python-graphblas:main Feb 17, 2023
@eriknw
Copy link
Member Author

eriknw commented Feb 17, 2023

This is in! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants