Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix edge_insensitive homophily measure #4152

Merged
merged 6 commits into from
Mar 10, 2022
Merged

Conversation

jinjh0123
Copy link
Contributor

Regarding #3977

I found some gaps between the definition of the feat homophily measure in the paper and the implementation of it in #3977, so I suggest to fix them.

  1. The paper suggested the measure as 'summation over C classes (0 to C-1) and division by C-1'.
    However, the code has used the mean operator, which leads to wrong calculation.

  2. To compute h_k value, the code has used edge homophily function, but I think the definition of h_k is different from it.
    We have to consider the degrees of each node, instead of the number of edges.

I think the answers of the test cases need to be corrected too.

@rusty1s
Copy link
Member

rusty1s commented Feb 25, 2022

Thank you. I think you are correct regarding C-1. Sorry for overlooking that. It further looks like you have implemented graph-specific computation of num_classes. Do you think this is needed?

I'm not sure why we can't make use of edge_homophily function TBH. Your code looks exactly the same (except for duplicating via torch.cat() which shouldn't make any difference). Can you clarify?

@jinjh0123
Copy link
Contributor Author

jinjh0123 commented Mar 3, 2022

  1. First, I would clarify the difference between the edge homophily and the class-wise homophily metric h_k in the formula (2) in the paper. The edge_homophily function counts each of intra-class edges only once, but they need to be counted twice in h_k since it is defined in terms of the sum of degrees.
  • For example, consider a graph (in the test case) with edge_index = [[0, 1, 2, 3], [1, 2, 0, 4]] and y = [0, 0, 0, 0, 1]. For the class 0, following the definition from the paper, we should get:

h_0 = (2 + 2 + 2 + 0) / (2 + 2 + 2 + 1) = 6 / 7.

  • But if we use edge_homophily function, we get:

(1 + 1 + 1 + 0) / (1 + 1 + 1 + 1) = 3 / 4.

  • Thus, I've separated the code from edge_homophily to compute the class-wise homophily correctly.
  1. I have introduced num_classes for the batch mode. We need to divide each sum by C - 1 where C is the number of classes and it can be different for each graph in a batch. However, I've just realized that it is duplicated with max_num_classes for non-batch mode (where there is only one graph in the batch).

@rusty1s
Copy link
Member

rusty1s commented Mar 3, 2022

Thanks for your reply!

For example, consider a graph (in the test case) with edge_index = [[0, 1, 2, 3], [1, 2, 0, 4]] and y = [0, 0, 0, 0, 1]. For the class 0, following the definition from the paper, we should get:
h_0 = (2 + 2 + 2 + 0) / (2 + 2 + 2 + 1) = 6 / 7.
But if we use edge_homophily function, we get:
(1 + 1 + 1 + 0) / (1 + 1 + 1 + 1) = 3 / 4.
Thus, I've separated the code from edge_homophily to compute the class-wise homophily correctly.

It is not exactly clear to me how you derive at 6/7 for h_0. To me, this looks like the edge homophily function restricted to a certain class. In particular, there exists no node with degree higher than 1, which leads to:

h_0 = (1 + 1 + 1 + 0 + 0) / (1 + 1 + 1 + 1 + 0) = 3 / 4

The paper also mentions that they "focus on the edge homophily in this work".

@jinjh0123
Copy link
Contributor Author

Oh, I computed h_k assuming the graph is undirected. Sorry for the confusion.

@rusty1s
Copy link
Member

rusty1s commented Mar 3, 2022

Ah, I see, that makes sense. Thanks for your reply. What does that mean for this PR? Is the current implementation still wrong in your opinion?

@jinjh0123
Copy link
Contributor Author

jinjh0123 commented Mar 3, 2022

No, we can maintain to use edge homophily function to compute h.

In conclusion, I think we need to

  1. change 'mean' to 'sum over C classes and div by C-1', and the test cases, accordingly, and
  2. allow different num_classes in the batch mode, since even they are in the same batch, some graphs can have no nodes for certain classes.

@rusty1s
Copy link
Member

rusty1s commented Mar 4, 2022

Sounds good. Thank you! Let me know if you want to update the PR accordingly.

Copy link
Member

@rusty1s rusty1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@rusty1s rusty1s changed the title Fix feat homophily measure Fix homophily measure Mar 10, 2022
@rusty1s rusty1s changed the title Fix homophily measure Fix edge_insensitive homophily measure Mar 10, 2022
@rusty1s rusty1s merged commit e493a68 into pyg-team:master Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants