Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add community detection algorithms #604

Open
nwlandry opened this issue Oct 20, 2024 · 10 comments
Open

Add community detection algorithms #604

nwlandry opened this issue Oct 20, 2024 · 10 comments
Labels
new feature New feature or request

Comments

@nwlandry
Copy link
Collaborator

No description provided.

@nwlandry nwlandry added the new feature New feature or request label Oct 20, 2024
@thomasrobiglio
Copy link
Collaborator

Just to get the ball rolling, I have been looking at what the other libraries have about this.

HGX

  • Hy-MMSBM (hypergraph mixed-membership stochastic block model). They do the fitting part using EM.
  • Hy-SC (spectral clustering). I'd have to read the paper for the details.
  • Hypergraph-MT (mixed memberships, assortative structures), also here EM.
  • Hyperlink communities. Hierarchical clustering on hyperedges distances/similarities.

HypernetX

  • Modularity stuff. They have two methods that apply Louvain to maximize the extension of modularity to HGs (I think the one defined here--I have just skimmed the papers).

Which ones do we want to implement? Do we have a priority list or something of the kind?

@maximelucas
Copy link
Collaborator

I would go first for the most "basic" and standard one. And by standard I mean if not standard in hypergraphs yet, that its potential analogue in graphs is standard. Bonus points if it's not the slowest.

Any idea which one would fit? For example I like the mixed-membership ones, but maybe mixed-membership is already kind of more advanced?

@nwlandry
Copy link
Collaborator Author

I think Louvain-style algorithm would be simplest. In fact, I was hoping to implement the Kaminski et al. paper at one point: https://arxiv.org/abs/1810.04816. Maybe this is a good place to start?

@thomasrobiglio
Copy link
Collaborator

thomasrobiglio commented Oct 23, 2024

mmm yes... modularity and link communities (allows for mixed memberships on the nodes @maximelucas ) should be quite easy to implement and relatively fast... about the standard question, it depends on who you ask (aaaargh).

@thomasrobiglio
Copy link
Collaborator

but yeah, I guess we can have them... if no one calls dibs I can work on it.

@kaiser-dan
Copy link
Contributor

  • Hy-SC (spectral clustering). I'd have to read the paper for the details.

I've been working with this spectra for a while now and could contribute to using it for clustering. The paper suggests a heuristic for clustering that boils down to running K-means on the first s-many components of the eigenvectors. From the pyproject.toml file, it doesn't look like scikit-learn is a dependency, so I could either implement a basic version of K-means as part of the addition here or it could be added. It seems like a bit much to add for just K-means, though...

Let me know what yall think and I can maybe make the PR this weekend!

@thomasrobiglio
Copy link
Collaborator

Hey @kaiser-dan, this would be great! I have been procrastinating this for too long :/

idk whats the best way to proceed about the sklearn dependency... let's see what the others have to say

@nwlandry
Copy link
Collaborator Author

If k-means isn't too bad to implement, I say we include it as a small helper function in the same module and avoid the sklearn dependency. Thanks @kaiser-dan! This is a long-overdue issue.

@kaiser-dan
Copy link
Contributor

Okie I have started drafting some architecture to address this in the draft PR #665. I will work on it more this weekend, but I wanted to get your thoughts on it before I get too far in.

The short of it is that I suspect the numerous algorithms for community detection and their evaluation would necessitate a new module. I added that in the draft PR with the commdetect module (I'm not crazy about the name); I didn't find anything about setting up whole new modules in the contributing guide, but I realize a change like this would include significant contextual changes in the docs, tutorials, etc.

Do you all think I should keep developing this alongside the particular spectral_clustering algorithm, hopefully making our lives easier when we get to the other algorithms, or should I stick to a more minimal change? I am happy to do either!

@nwlandry
Copy link
Collaborator Author

I think that this sounds right. A module with corresponding utility functions (either in utils or in the module itself) sounds like the best path forward. Maybe I would call the module communities, but I'm not sure what other libraries do. Thanks for spearheading this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants