Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PageRank #788

Merged
merged 33 commits into from
May 26, 2023
Merged

Add PageRank #788

merged 33 commits into from
May 26, 2023

Conversation

IvanIsCoding
Copy link
Collaborator

@IvanIsCoding IvanIsCoding commented Jan 23, 2023

Related to #315

Adds an implementation of the PageRank algorithm using sparse matrices. It uses the sprs crate combined with ndarray to implement a Power Method approach of finding the PageRank.

Also, we test this implementation against NetworkX's implementation of the PageRank. We accept all the arguments that NetworkX accepts: tolerance, max_iter, personalization, dangling, etc.

n.b: it's ready for review

@coveralls
Copy link

coveralls commented Jan 23, 2023

Pull Request Test Coverage Report for Build 5086986067

  • 129 of 129 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 96.603%

Totals Coverage Status
Change from base Build 5063931930: 0.06%
Covered Lines: 14704
Relevant Lines: 15221

💛 - Coveralls

@IvanIsCoding
Copy link
Collaborator Author

IvanIsCoding commented Jan 23, 2023

Everything should be working now

@IvanIsCoding IvanIsCoding marked this pull request as ready for review January 24, 2023 02:59
@IvanIsCoding IvanIsCoding mentioned this pull request Jan 25, 2023
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for writing this! I really like using sprs for this, it also avoids us having to figure out how to link against and leverage blas/lapack to compute the eigenvectors of the google matrix from an ndarray. I need to refresh my memory on the algorithm before I do a detailed review on the algorithm code. I just has some quick high level comments from a quick scan of the code.

releasenotes/notes/add-pagerank-bef0de7d46026071.yaml Outdated Show resolved Hide resolved
src/link_analysis.rs Show resolved Hide resolved
.github/workflows/wheels.yml Show resolved Hide resolved
src/link_analysis.rs Outdated Show resolved Hide resolved
@IvanIsCoding
Copy link
Collaborator Author

This is great, thanks for writing this! I really like using sprs for this, it also avoids us having to figure out how to link against and leverage blas/lapack to compute the eigenvectors of the google matrix from an ndarray. I need to refresh my memory on the algorithm before I do a detailed review on the algorithm code. I just has some quick high level comments from a quick scan of the code.

The algorithm is approximating the eigencector of the transition matrix. You might want to check NetworkX’s Python code directly because they have some quirks on how they handle dangling nodes etc.

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
@mtreinish mtreinish self-assigned this Feb 4, 2023
@mtreinish mtreinish added this to the 0.13.0 milestone May 10, 2023
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in review, this looks excellent to me. The code LGTM and nothing real stands out to me as being incorrect. Just a couple small inline suggestions and questions but other than I think this is ready to merge.

Cargo.lock Show resolved Hide resolved
///
/// :returns: a read-only dict-like object whose keys are the node indices and values are the
/// PageRank score for that node.
/// :rtype: CentralityMapping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to document that it will raise FailedToConverge if max_iter is reached? It's something people might want to catch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will add the notes for this, HITS, eigenvector centrality and all the centralities that cannot converge in a separate PR

src/link_analysis.rs Outdated Show resolved Hide resolved
Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the quick update

@mtreinish mtreinish merged commit afc3627 into Qiskit:main May 26, 2023
@IvanIsCoding IvanIsCoding deleted the pagerank branch May 28, 2023 22:46
@bionicles
Copy link

bionicles commented Jan 18, 2024

gonna put a pin in it and double check everything and maybe make a new issue about it, but somehow i'm getting totally uniform results from this where networkx is (a lot) slower but produces some results that make more sense

@IvanIsCoding
Copy link
Collaborator Author

IvanIsCoding commented Jan 18, 2024

gonna put a pin in it and double check everything and maybe make a new issue about it, but somehow i'm getting totally uniform results from this where networkx is (a lot) slower but produces some results that make more sense

Please open a new issue. If you could give the graph that triggers the difference that would be great as well.

We use the Power Method and NetworkX uses SVD decomposition to approximate the eigenvector so there might be some discrepancies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants