Questions about use_valedges_as_input and train_on_subgraph on collab #2
Thanks for your question.
Thanks for your detailed replies!
The output is:
About 4% of test edges are filtered out in this script, or equivalently reindexed as 'self-loops' of node -1 in the PLNLP script. With this example script (directly filtering edges), I got a similar performance gain (64% -> 68.5%), which suggests that reindexing and filtering are equivalent.
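The two treatments compared above (filtering edges with unseen endpoints vs. reindexing unseen nodes to -1) can be sketched as follows. The node IDs and edges here are hypothetical toy data, not from ogbl-collab:

```python
import numpy as np

# Toy data (hypothetical): 10 nodes, of which only 0..5 survive the subgraph cut.
num_nodes = 10
kept_nodes = np.array([0, 1, 2, 3, 4, 5])
test_edges = np.array([[0, 1], [2, 7], [8, 9]])   # (src, dst); nodes 7, 8, 9 are unseen

# Option A: filter -- drop any test edge touching an unseen node.
mask = np.isin(test_edges, kept_nodes).all(axis=1)
filtered = test_edges[mask]                        # only [0, 1] survives

# Option B: reindex -- map unseen nodes to -1, so an edge between two unseen
# nodes becomes a "self-loop" of node -1.
remap = np.full(num_nodes, -1)
remap[kept_nodes] = np.arange(len(kept_nodes))
reindexed = remap[test_edges]                      # [[0, 1], [2, -1], [-1, -1]]
self_loops = (reindexed == -1).all(axis=1)         # True only for the [8, 9] pair
```

An edge with exactly one unseen endpoint is dropped by Option A but becomes a (-1, v) pair rather than a self-loop under Option B, which is where the two schemes can diverge in edge counts.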
Thanks for your reminder.
The output is:
Negative test edges have a much larger proportion of "self-loop" pairs than positive test edges. Reindexing with -1 therefore affects not only the positive test edges but, to an even larger extent, the negative test edges.
Our main purpose in using 'train_on_subgraph' is to reduce the number of parameters, since the embeddings of nodes that appear only in the test set are never updated during training. Please use our code without "train_on_subgraph" for now; we will update the code to avoid the situation described above.
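The asymmetry noted above has a simple probabilistic reading: if a fraction p of nodes is unseen after the subgraph cut, a uniformly sampled negative pair has both endpoints unseen with probability roughly p**2, whereas positive test edges (real post-cut collaborations) rarely involve unseen nodes at all. A quick simulation with hypothetical numbers (p is illustrative, not the actual ogbl-collab value):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, p_unseen = 100_000, 0.2          # hypothetical unseen-node fraction
unseen = rng.random(num_nodes) < p_unseen   # mark ~20% of nodes as unseen

# Uniformly sampled negative pairs, as in standard negative sampling.
neg = rng.integers(0, num_nodes, size=(50_000, 2))
both_unseen = unseen[neg].all(axis=1).mean()
# both_unseen is close to p_unseen**2 = 0.04, i.e. ~4% become (-1, -1) "self-loops".
```

Under reindexing, every such pair collapses to the identical (-1, -1) edge and gets the identical score, which is why the effect on ranking metrics is concentrated in the negatives.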
Thanks for your kind explanations! The negative test edges indeed have far more self-loops; the effect needs further exploration. I will try your code with different settings.
Thanks for your excellent work on link prediction with GNNs. I have two questions about the tricks used on the ogbl-collab dataset.
For trick 'use_valedges_as_input':
I note that this trick in the original OGB example script involves an additional step: during testing, scores on training and validation edges are obtained using only the raw training edges:
https://github.com/snap-stanford/ogb/blob/c8f0d2aca80a4f885bfd6ad5258ecf1c2d0ac2d9/examples/linkproppred/collab/gnn.py#L140
Then augmented training edges including validation edges are used to obtain test scores:
https://github.com/snap-stanford/ogb/blob/c8f0d2aca80a4f885bfd6ad5258ecf1c2d0ac2d9/examples/linkproppred/collab/gnn.py#L166
But in the PLNLP implementation, the raw training edges have been replaced by the augmented version including validation edges, which means that training, validation, and test scores are all based on the augmented training edges. The very high reported validation scores (100%@50) seem overfitted; they should be close to the test scores (~70%@50).
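The OGB evaluation protocol described above (raw graph for train/valid scores, augmented graph for test scores) can be sketched as below. The model and predictor are trivial stand-ins, not the actual OGB or PLNLP modules; only the two-pass structure mirrors the linked script:

```python
import torch
import torch.nn as nn

class DotPredictor(nn.Module):
    # Stand-in link predictor: dot product of endpoint embeddings.
    def forward(self, h_src, h_dst):
        return (h_src * h_dst).sum(dim=-1)

class MeanGNN(nn.Module):
    # Stand-in "GNN": one round of mean aggregation over a dense adjacency.
    def forward(self, x, adj):                     # adj: dense [N, N] float matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return (adj @ x) / deg

def evaluate(model, predictor, x, adj_train, adj_train_plus_val,
             train_edges, valid_edges, test_edges):   # edges: [2, E] index tensors
    with torch.no_grad():
        # Pass 1: raw training graph -> train/valid scores.
        h = model(x, adj_train)
        train_scores = predictor(h[train_edges[0]], h[train_edges[1]])
        valid_scores = predictor(h[valid_edges[0]], h[valid_edges[1]])
        # Pass 2: training graph augmented with validation edges -> test scores only.
        h = model(x, adj_train_plus_val)
        test_scores = predictor(h[test_edges[0]], h[test_edges[1]])
    return train_scores, valid_scores, test_scores
```

The key point is that validation edges never feed message passing when the validation edges themselves are being scored; collapsing the two passes into one (as described for PLNLP) lets the model score validation edges that are literally present in its input graph.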
For trick 'train_on_subgraph':
This trick limits the time range of training and validation edges to achieve better performance on test edges. However, it seems that test edges are also filtered (>=2010) in PLNLP. That is a bit confusing to me, since the test set is then 'modified'.
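The year-based cut in question can be sketched as follows. The split dict here is hypothetical toy data mimicking ogbl-collab's format (an `edge` tensor of shape [E, 2] with a per-edge `year` tensor); the point of the question is whether such a filter is applied to the test split as well:

```python
import torch

def filter_by_year(split, min_year=2010):
    # Keep only edges whose year is >= min_year.
    mask = split['year'] >= min_year
    return {'edge': split['edge'][mask], 'year': split['year'][mask]}

# Hypothetical toy split, not real ogbl-collab data.
split = {'edge': torch.tensor([[0, 1], [1, 2], [2, 3]]),
         'year': torch.tensor([2007, 2010, 2015])}
kept = filter_by_year(split)
# kept['edge'] retains only the two edges from 2010 onward.
```

Applying such a filter to train/valid splits changes only the model's inputs, but applying it to the test split changes the benchmark itself, which is the concern raised above.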