Jaccard Similarity Biased on Bipartite Graphs #8

Open
AlexMRuch opened this issue Aug 27, 2019 · 6 comments

@AlexMRuch

You should consider using the Standardized Co-incident Ratio (SCR) as the similarity measure instead of Jaccard similarity, since the latter is biased in large bipartite graphs whereas the former is not: https://yongrenshi.weebly.com/uploads/5/7/8/6/57861243/ssr_structuralsimilarity.pdf. It should be a simple switch.

@anvaka
Owner

anvaka commented Aug 29, 2019

Thank you, Alex. I'll check this out in more detail.

I briefly skimmed the article and was puzzled by their definition of the Jaccard index. They say:

Jij = d / (b + c + d)

Where

  • d - number of members shared between i and j
  • b - number of i's neighbors
  • c - number of j's neighbors.

I'd expect the denominator to be (b + c - d), not (b + c + d). Is it just a typo, or did they use an incorrect definition of Jaccard similarity?
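
To make the two denominators concrete, here is a minimal Python sketch. It compares two readings of b and c: total neighbor counts of i and j (the reading in the list above), and counts of neighbors exclusive to i and to j (a hypothetical reading, not confirmed from the article, under which d / (b + c + d) would equal the usual Jaccard index). The function names and toy sets are illustrative only and come from neither the article nor this repository.

```python
def jaccard_from_sets(neighbors_i, neighbors_j):
    """Standard Jaccard index: |A & B| / |A | B|."""
    union = neighbors_i | neighbors_j
    if not union:
        return 0.0
    return len(neighbors_i & neighbors_j) / len(union)


def jaccard_total_counts(b, c, d):
    """b, c are TOTAL neighbor counts of i and j; d is the shared count."""
    denom = b + c - d  # shared members are counted twice in b + c
    return d / denom if denom else 0.0


def jaccard_exclusive_counts(b, c, d):
    """ASSUMPTION: b, c count neighbors found ONLY in i or ONLY in j."""
    denom = b + c + d  # three disjoint groups add up to the union
    return d / denom if denom else 0.0


# Toy neighbor sets (hypothetical data).
i = {"u1", "u2", "u3", "u4"}
j = {"u3", "u4", "u5"}
d = len(i & j)

from_sets = jaccard_from_sets(i, j)
from_totals = jaccard_total_counts(len(i), len(j), d)
from_exclusive = jaccard_exclusive_counts(len(i - j), len(j - i), d)
print(from_sets, from_totals, from_exclusive)
```

All three calls describe the same ratio, so if the article's b and c are exclusive counts, the formula is not a typo; if they are total counts, the denominator needs the minus sign.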

@anvaka
Owner

anvaka commented Aug 29, 2019

Also, a quick follow-up question in case I'm missing it: how do you compute E(d), the number expected in an otherwise identical randomized network?

@AlexMRuch
Author

AlexMRuch commented Aug 29, 2019 via email

@anvaka
Owner

anvaka commented Aug 29, 2019

The difference is the denominator

This is very interesting. Any chance you can ELI5 this to me? Given the Reddit network of posts and comments, what should the E(d) function be? How do I compute it efficiently?
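
One possible way to think about E(d), offered as a sketch and not as the article's confirmed definition: assume the randomized network keeps the neighbor counts of i and j fixed and draws each neighbor set uniformly at random from the N members on the other side of the bipartite graph. Under that assumption the overlap is hypergeometric with mean n_i * n_j / N, which is O(1) per pair once the set sizes are known. The null model, the function name `expected_overlap`, and the toy data below are all assumptions made for illustration.

```python
def expected_overlap(n_i, n_j, n_total):
    """Mean overlap of two random sets of sizes n_i and n_j drawn from
    n_total members (hypergeometric mean). ASSUMED null model."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_i * n_j / n_total


# Hypothetical toy data: each key is a node, each value its neighbor set.
neighbors = {
    "a": {"u1", "u2", "u3", "u4"},
    "b": {"u3", "u4", "u5"},
}

# N is the number of distinct members on the other side of the graph.
n_total = len(set().union(*neighbors.values()))

observed = len(neighbors["a"] & neighbors["b"])
expected = expected_overlap(len(neighbors["a"]), len(neighbors["b"]), n_total)
print(observed, expected)
```

Only the set sizes and N are needed for the expectation, so the sizes can be precomputed once per node and reused for every pair.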

@AlexMRuch
Author

AlexMRuch commented Aug 29, 2019 via email

@no-identd

Any update on this? Would be interesting to see how this differs.
