Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BFS/DFS traversal for directed, as well as undirected graphs #127

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

regexident
Copy link
Contributor

I needed traversal support for graph for a project and figured I might as well open a PR for it, once finished.

I noticed that you had already done work on MS-BFS before, yet later abandoned it? Could say a bit about why you abandoned that PR? When looking for existing benchmarks (to some for traversals) I noticed that that PR was the only place with benchmarks. Would it make sense to pull that stuff out in a PR of its own?

(I couldn't find a CONTRIBUTING file, nor any other contribution guidance with regards to what kind if implementations and level of optimizations you'd expect for PRs on this crate, given its declared focus on performance and scalability.)

@s1ck s1ck self-assigned this Oct 24, 2024
@s1ck
Copy link
Collaborator

s1ck commented Oct 24, 2024

Hey @regexident Thanks for the PR. I will have a look at it within the next two weeks.

@regexident
Copy link
Contributor Author

@s1ck thanks!

Disclaimer: I haven't yet had the time to thoroughly look into #107, but will try to find some time for that in the meanwhile. While I'm fairly experienced in the implementation of efficient low-level algorithms/data structures I haven't been following the latest graph algorithm developments for a while.

To give some context to this PR: I need a bunch of additional algorithms (traversal, clustering, centrality, …) for graph short-term-ish, for which I thus just created regexident/graph-nursery, where I plan to be working on those, with the goal to eventually upstream whatever you find meets your expectations and goals of graph.

@s1ck
Copy link
Collaborator

s1ck commented Nov 27, 2024

@regexident Hey! Sorry for the late reply, I just didn't find time, yet. Could you please rebase the PR, I got rid of clippy and rustc warnings that pollute the checks with errors. Everything that fails after that should be due to errors on your branch.

@s1ck
Copy link
Collaborator

s1ck commented Nov 27, 2024

@regexident I went over the code. I like it in general, however, adding single-threaded algorithms does not really fit into the idea of this project. We initially wrote it as a benchmark to compare with our Java implementation. The idea back then was to port our (parallel) Java implementations to Rust and compare the performance / memory allocations etc. Since there are other Rust libraries out there, that offer single threaded (text book) graph algos, we want to stick to adding parallel algorithms that scale nicely with available cores and graph sizes.

One option I can think of to bring this code in is having a separate crate for single threaded algos that sits on top of graph_builder, but that could just simply be its own project. Maybe @knutwalker has an opinion here?

If you are still interested in contributing a BFS, I suggest having a look into our parallel BFS implementation over in Java land. It is based on the idea of delta stepping, which we initially added in this project, ported it to java and generalized it for unweighted graphs. That would be a great addition to the project!

Since you asked about MSBFS .. there is no real reason that we closed the PR except .. time :) Also, the idea with MSBFS (which we also implemented in Java) is to get closer to the original C++ implementation by using proper SIMD and go beyond 64 concurrent searches (which is our limitation in Java due to long being the largest available primitive). We might pick this up again at some point, but let me know if you are interested and we can collaborate.

@knutwalker
Copy link
Collaborator

One option I can think of to bring this code in is having a separate crate for single threaded algos that sits on top of graph_builder, but that could just simply be its own project. Maybe @knutwalker has an opinion here?

That's an interesting idea. A "textbook/simple" algo implementation crate to explore the API and have a number of various implementations that don't need to adhere to the "high performace/scaling up" requirement.
They could also become part of graph_mate :)

@regexident
Copy link
Contributor Author

@regexident […] Could you please rebase the PR, I got rid of clippy and rustc warnings that pollute the checks with errors.

I opened another PR that addresses clippy warnings from the just-released Rust 1.83.0 over here: #129

Everything that fails after that should be due to errors on your branch.

Just pushed the additional optimizations adapted from my soft-fork regexident/graph-nursery on top of this branch, which should work fine now.

@regexident
Copy link
Contributor Author

@regexident I went over the code. I like it in general, however, adding single-threaded algorithms does not really fit into the idea of this project. We initially wrote it as a benchmark to compare with our Java implementation. The idea back then was to port our (parallel) Java implementations to Rust and compare the performance / memory allocations etc. Since there are other Rust libraries out there, that offer single threaded (text book) graph algos, we want to stick to adding parallel algorithms that scale nicely with available cores and graph sizes.

Agreed! The exact goal (and scope) of this project wasn't exactly clear to me, as the README merely hints at parallelism being a core goal:

Only the README section for graph_builder mentions concurrency/parallelism at all, afaict:

graph_builder provides implementations for directed and undirected graphs. Graphs can be created programatically or read from custom input formats in a type-safe way. The library uses rayon to parallelize all steps during graph creation. The implementation uses a Compressed-Sparse-Row (CSR) data structure which is tailored for fast and concurrent access to the graph topology.

The section on graph merely hints at parallelism being a goal:

graph provides graph algorithms which take graphs created using graph_builder as input. The algorithm implementations are designed to run efficiently on large-scale graphs with billions of nodes and edges.

That said I totally understand the risk of turning graph into a sort of "a jack of all trades", thus losing the focus of parallelism and massive scalability! 👍

One option I can think of to bring this code in is having a separate crate for single threaded algos that sits on top of graph_builder, but that could just simply be its own project. Maybe @knutwalker has an opinion here?

I guess I'll keep things as a soft-fork for now (over at regexident/graph-nursery) and if interest is there one can look into selectively absorbing code into the upstream repo. Either way, no pressure. 🙂

If you are still interested in contributing a BFS, I suggest having a look into our parallel BFS implementation over in Java land. It is based on the idea of delta stepping, which we initially added in this project, ported it to java and generalized it for unweighted graphs. That would be a great addition to the project!

Since you asked about MSBFS .. there is no real reason that we closed the PR except .. time :) Also, the idea with MSBFS (which we also implemented in Java) is to get closer to the original C++ implementation by using proper SIMD and go beyond 64 concurrent searches (which is our limitation in Java due to long being the largest available primitive). We might pick this up again at some point, but let me know if you are interested and we can collaborate.

I currently don't have any pressing need for parallelization for my particular use case (and very little spare time to work on it), but that may change some time in the future!

My thinking was basically that if I was going to write such algorithms for graph, then I might as well share them and check if there's interest in upstreaming, rather than just publishing them in a separate graph-algos crate that lacks discoverability.

One option I can think of to bring this code in is having a separate crate for single threaded algos that sits on top of graph_builder, but that could just simply be its own project. Maybe @knutwalker has an opinion here?

That's an interesting idea. A "textbook/simple" algo implementation crate to explore the API and have a number of various implementations that don't need to adhere to the "high performace/scaling up" requirement. They could also become part of graph_mate :)

Yeah. Right now the practical usefulness of graph is arguably somewhat limited due to the lack of often-needed graph functionality provided (on top of the CSR, which is awesome). As such one could argue that until graph gets parallel traversal having a single-threaded DFS/BFS is still infinitely better than having non at all.

But for something intended to eventually be replaced anyway it would probably indeed make sense to keep things in a separate crate (even if as part of the same repo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants