Fix `edge_insensitive` homophily measure #4152

jinjh0123 · 2022-02-25T08:22:38Z

Regarding #3977

I found some gaps between the definition of the feat homophily measure in the paper and the implementation of it in #3977, so I suggest to fix them.

The paper suggested the measure as 'summation over C classes (0 to C-1) and division by C-1'.
However, the code has used the mean operator, which leads to wrong calculation.
To compute h_k value, the code has used edge homophily function, but I think the definition of h_k is different from it.
We have to consider the degrees of each node, instead of the number of edges.

I think the answers of the test cases need to be corrected too.

rusty1s · 2022-02-25T11:40:48Z

Thank you. I think you are correct regarding C-1. Sorry for overlooking that. It further looks like you have implemented graph-specific computation of num_classes. Do you think this is needed?

I'm not sure why we can't make use of edge_homophily function TBH. Your code looks exactly the same (except for duplicating via torch.cat() which shouldn't make any difference). Can you clarify?

jinjh0123 · 2022-03-03T06:54:02Z

First, I would clarify the difference between the edge homophily and the class-wise homophily metric h_k in the formula (2) in the paper. The edge_homophily function counts each of intra-class edges only once, but they need to be counted twice in h_k since it is defined in terms of the sum of degrees.

For example, consider a graph (in the test case) with edge_index = [[0, 1, 2, 3], [1, 2, 0, 4]] and y = [0, 0, 0, 0, 1]. For the class 0, following the definition from the paper, we should get:

h_0 = (2 + 2 + 2 + 0) / (2 + 2 + 2 + 1) = 6 / 7.

But if we use edge_homophily function, we get:

(1 + 1 + 1 + 0) / (1 + 1 + 1 + 1) = 3 / 4.

Thus, I've separated the code from edge_homophily to compute the class-wise homophily correctly.

I have introduced num_classes for the batch mode. We need to divide each sum by C - 1 where C is the number of classes and it can be different for each graph in a batch. However, I've just realized that it is duplicated with max_num_classes for non-batch mode (where there is only one graph in the batch).

rusty1s · 2022-03-03T07:09:25Z

Thanks for your reply!

For example, consider a graph (in the test case) with edge_index = [[0, 1, 2, 3], [1, 2, 0, 4]] and y = [0, 0, 0, 0, 1]. For the class 0, following the definition from the paper, we should get:
h_0 = (2 + 2 + 2 + 0) / (2 + 2 + 2 + 1) = 6 / 7.
But if we use edge_homophily function, we get:
(1 + 1 + 1 + 0) / (1 + 1 + 1 + 1) = 3 / 4.
Thus, I've separated the code from edge_homophily to compute the class-wise homophily correctly.

It is not exactly clear to me how you derive at 6/7 for h_0. To me, this looks like the edge homophily function restricted to a certain class. In particular, there exists no node with degree higher than 1, which leads to:

h_0 = (1 + 1 + 1 + 0 + 0) / (1 + 1 + 1 + 1 + 0) = 3 / 4

The paper also mentions that they "focus on the edge homophily in this work".

jinjh0123 · 2022-03-03T08:15:52Z

Oh, I computed h_k assuming the graph is undirected. Sorry for the confusion.

rusty1s · 2022-03-03T08:17:56Z

Ah, I see, that makes sense. Thanks for your reply. What does that mean for this PR? Is the current implementation still wrong in your opinion?

jinjh0123 · 2022-03-03T09:10:15Z

No, we can maintain to use edge homophily function to compute h.

In conclusion, I think we need to

change 'mean' to 'sum over C classes and div by C-1', and the test cases, accordingly, and
allow different num_classes in the batch mode, since even they are in the same batch, some graphs can have no nodes for certain classes.

rusty1s · 2022-03-04T20:42:26Z

Sounds good. Thank you! Let me know if you want to update the PR accordingly.

rusty1s

Fixed.

jinjh0123 added 3 commits February 23, 2022 23:50

Fix torch_geometric/utils/homophily.py: mean -> sum & div_by_c-1

437a2b7

Fix torch_geometric/utils/homophily: edge_homophily -> h_k

9b25806

Fix torch_geometric/utils/homophily.py: batch mode

7c66ab2

Merge branch 'master' into master

bfe994b

rusty1s requested review from wsad1 and rusty1s February 25, 2022 11:49

rusty1s assigned jinjh0123 Feb 25, 2022

rusty1s added bug help wanted 0 - Priority P0 labels Feb 25, 2022

fix test

eaa5afd

rusty1s approved these changes Mar 10, 2022

View reviewed changes

Merge branch 'master' into master

0a4aaa0

rusty1s changed the title ~~Fix feat homophily measure~~ Fix homophily measure Mar 10, 2022

rusty1s changed the title ~~Fix homophily measure~~ Fix edge_insensitive homophily measure Mar 10, 2022

rusty1s merged commit e493a68 into pyg-team:master Mar 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `edge_insensitive` homophily measure #4152

Fix `edge_insensitive` homophily measure #4152

jinjh0123 commented Feb 25, 2022

rusty1s commented Feb 25, 2022 •

edited

Loading

jinjh0123 commented Mar 3, 2022 •

edited

Loading

rusty1s commented Mar 3, 2022 •

edited

Loading

jinjh0123 commented Mar 3, 2022

rusty1s commented Mar 3, 2022

jinjh0123 commented Mar 3, 2022 •

edited

Loading

rusty1s commented Mar 4, 2022

rusty1s left a comment

Fix edge_insensitive homophily measure #4152

Fix edge_insensitive homophily measure #4152

Conversation

jinjh0123 commented Feb 25, 2022

rusty1s commented Feb 25, 2022 • edited Loading

jinjh0123 commented Mar 3, 2022 • edited Loading

rusty1s commented Mar 3, 2022 • edited Loading

jinjh0123 commented Mar 3, 2022

rusty1s commented Mar 3, 2022

jinjh0123 commented Mar 3, 2022 • edited Loading

rusty1s commented Mar 4, 2022

rusty1s left a comment

Choose a reason for hiding this comment

Fix `edge_insensitive` homophily measure #4152

Fix `edge_insensitive` homophily measure #4152

rusty1s commented Feb 25, 2022 •

edited

Loading

jinjh0123 commented Mar 3, 2022 •

edited

Loading

rusty1s commented Mar 3, 2022 •

edited

Loading

jinjh0123 commented Mar 3, 2022 •

edited

Loading