Add a few igraph algorithms to run via `scripts/bench.py` #54

eriknw · 2023-03-11T16:00:23Z

See #53 (this PR maybe closes it).

CC @szhorvat. I'm not at all familiar with using igraph, so I'm hoping you can take a look at this. What other algorithms do you think would be easy to compare against? I think we can compare some SSSP and BFS algorithms once #51 is in. Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from *.mtx file to scipy.sparse to networkx to igraph seems less than ideal.

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice. On the other hand, we can view PageRank and other centrality measures as ranking metrics (i.e., order them by score), so a fair comparison could be to run iterative versions of the algorithm until the same ranking is achieved.

szhorvat · 2023-03-12T13:54:17Z

Including @ntamas here you might have more feedback. I'll also take a look next week.

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

eriknw · 2023-03-13T15:47:22Z

I wanted to make it clear again that I asked about benchmarking against igraph only out of curiosity, and to better understand what performance is possible with GraphBLAS. This is not part of the pyOpenSci review.

Understood. We have your attention, so I'd like to use it 😉 . I think many people are curious about how igraph compares.

Part of the value proposition of python-graphblas is it provides high performance sparse data structures and operations on them. python-graphblas provides the foundation from which algorithms can be created; graphblas-algorithms is a collection of such algorithms. Users aren't limited to use graphblas-algorithms though--it is possible to explore and invent custom, bespoke, and hopefully fast algorithms that specifically suite ones needs. Some of our earliest (and slightly messy) examples of this are here: https://github.com/metagraph-dev/grblas-recipes/ (someday we'd like to clean up and update some of these to include in our documentation).

So, yeah, I appreciate that you aren't expecting us to benchmark against igraph as part of the review. Nevertheless, if we are able to do so even in a limited fashion, it can help support our claimed value proposition that GraphBLAS can be high-performance, and, hence, a useful tool for people.

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX. One should be skeptical of benchmarks in general including these, and I'm sure they don't tell the full story, but the results may be interesting nevertheless:

codecov-commenter · 2023-03-13T20:18:57Z

Codecov Report

Patch coverage has no change and project coverage change: -1.25 ⚠️

Comparison is base (6de1fd6) 69.16% compared to head (8e58900) 67.91%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #54      +/-   ##
==========================================
- Coverage   69.16%   67.91%   -1.25%     
==========================================
  Files          77       77              
  Lines        3152     3151       -1     
  Branches      602      597       -5     
==========================================
- Hits         2180     2140      -40     
- Misses        786      832      +46     
+ Partials      186      179       -7

Impacted Files	Coverage Δ
graphblas_algorithms/nxapi/_utils.py	`57.47% <0.00%> (ø)`

... and 7 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

szhorvat · 2023-03-14T11:14:14Z

scripts/bench.py

+        if backend == "networkx":
+            return G
+        if backend == "igraph":
+            # TODO: is there a better way for igraph to read MM files or scipy.sparse arrays?


Also, how is one supposed to load Matrix Market files into an igraph Graph? Going from *.mtx file to scipy.sparse to networkx to igraph seems less than ideal.

You can use igraph.Graph.Adjacency to convert from a SciPy sparse matrix.

There is no direct support for reading MM files in igraph.

Just what I was looking for, thanks!

szhorvat · 2023-03-14T11:18:31Z

scripts/igraph_impl.py

+    implementation="prpack",
+):
+    if personalization is not None:
+        raise NotImplementedError


Personalization is provided through the reset and reset_vertices parameters of personalized_pagerank()

https://python.igraph.org/en/stable/api/igraph.GraphBase.html#personalized_pagerank

szhorvat · 2023-03-14T11:20:46Z

scripts/igraph_impl.py

+    if nstart is not None:
+        raise NotImplementedError
+    if dangling is not None:
+        raise NotImplementedError


Just confirming that the behaviour is the same as with dangling=None in NetworkX.

szhorvat · 2023-03-14T11:23:14Z

scripts/igraph_impl.py

+    # TODO: check results when `count_zeros=False`
+    mode = "zero" if count_zeros else "nan"


This is correct.

szhorvat · 2023-03-14T12:15:46Z

I'm not sure the most fair way to compare PageRank. On one hand, comparing PageRank with default arguments is fair because that's how they will typically get used in practice.

Probably PageRank is just not the best type of computation for benchmarking ... The results from using the default arguments are still practically useful though.

I just checked what PRPACK does exactly. For graph with less than 128 vertices, it uses Gaussian elimination (which is exact, no tolerance comes into play). For larger graphs, it seems to use a variant of Gauss-Seidel iteration. igraph uses a hard-coded tolerance of $10^{-10}$ here, but tolerances are not comparable between different iterative algorithms anyway.

szhorvat · 2023-03-14T12:18:52Z

Here are results from my initial benchmarking comparing GraphBLAS and igraph using this PR running on a DGX.

Thanks for sharing these results! I assume these were run in parallel, which is fair, since this seems to be one of the major advantages of using GraphBLAS. But it would be interesting to see a serial comparison too.

In igraph, PRPACK can use some parallelism, but it's not very effective. This can be disabled by setting the OMP_NUM_THREADS=1 environment variable. Most other parts of igraph don't run in parallel, except when relying on a parallelized external library (e.g. BLAS).

eriknw · 2023-03-17T01:18:08Z

But it would be interesting to see a serial comparison too.

Agreed! I'll try to get new results this weekend.

This can be disabled by setting the OMP_NUM_THREADS=1 environment variable

Setting this environment variable works for python-graphblas as well, or one can do gb.ss.config["nthreads"] = 1.

eriknw added 2 commits March 11, 2023 08:23

Add a few igraph algorithms to run via scripts/bench.py

973fb1c

clustering in igraph

ec81bd1

eriknw added 2 commits March 13, 2023 08:54

Add --min-n command line option to scripts/bench.py

4bcc817

oops run pre-commit

8e58900

szhorvat reviewed Mar 14, 2023

View reviewed changes

Merge branch 'main' into bench_igraph

76ab622

eriknw mentioned this pull request May 8, 2023

Python-graphblas: high-performance sparse linear algebra for scalable graph analytics pyOpenSci/software-submission#81

Closed

24 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a few igraph algorithms to run via `scripts/bench.py` #54

Add a few igraph algorithms to run via `scripts/bench.py` #54

Uh oh!

eriknw commented Mar 11, 2023

Uh oh!

szhorvat commented Mar 12, 2023 •

edited

Loading

Uh oh!

eriknw commented Mar 13, 2023

Uh oh!

codecov-commenter commented Mar 13, 2023

Uh oh!

szhorvat Mar 14, 2023 •

edited

Loading

Uh oh!

eriknw Mar 17, 2023

Uh oh!

szhorvat Mar 14, 2023

Uh oh!

szhorvat Mar 14, 2023

Uh oh!

szhorvat Mar 14, 2023

Uh oh!

szhorvat commented Mar 14, 2023

Uh oh!

szhorvat commented Mar 14, 2023

Uh oh!

eriknw commented Mar 17, 2023

Uh oh!

Uh oh!

		# TODO: check results when `count_zeros=False`
		mode = "zero" if count_zeros else "nan"

Add a few igraph algorithms to run via scripts/bench.py #54

Are you sure you want to change the base?

Add a few igraph algorithms to run via scripts/bench.py #54

Uh oh!

Conversation

eriknw commented Mar 11, 2023

Uh oh!

szhorvat commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eriknw commented Mar 13, 2023

Uh oh!

codecov-commenter commented Mar 13, 2023

Codecov Report

Uh oh!

szhorvat Mar 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eriknw Mar 17, 2023

Choose a reason for hiding this comment

Uh oh!

szhorvat Mar 14, 2023

Choose a reason for hiding this comment

Uh oh!

szhorvat Mar 14, 2023

Choose a reason for hiding this comment

Uh oh!

szhorvat Mar 14, 2023

Choose a reason for hiding this comment

Uh oh!

szhorvat commented Mar 14, 2023

Uh oh!

szhorvat commented Mar 14, 2023

Uh oh!

eriknw commented Mar 17, 2023

Uh oh!

Uh oh!

Add a few igraph algorithms to run via `scripts/bench.py` #54

Add a few igraph algorithms to run via `scripts/bench.py` #54

szhorvat commented Mar 12, 2023 •

edited

Loading

szhorvat Mar 14, 2023 •

edited

Loading