
Not use public split on citation networks #1

Closed
hengruizhang98 opened this issue Mar 30, 2021 · 5 comments

Comments

@hengruizhang98

Hi, thanks for your nice work. I noticed that in the original paper you state that you use the public split on the citation networks, but in this repo it seems that you use a random split. Could you explain this?

@zekarias-tilahun
Owner

zekarias-tilahun commented Mar 30, 2021

Hi, thank you for your interest.

On line 28 of data.py you can see that we invoke utils.create_masks(data=dataset.data) to create the train/val/test masks/splits. If you navigate to the create_masks function inside the utils.py module, on line 200 you can find that we first check whether the data already contains a validation mask (if not hasattr(data, "val_mask")). Since the citation-network datasets have a val_mask attribute, we do not create a new one.
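The logic described above can be sketched roughly as follows. This is a simplified stand-in for the actual create_masks in utils.py, not a copy of it; the SimpleNamespace data object, the 60/40 ratio, and the list-of-bools mask representation are assumptions for illustration only:

```python
import random
from types import SimpleNamespace

def create_masks(data, num_nodes, train_ratio=0.6, seed=0):
    """Sketch of the described behaviour: only build new random
    masks when the dataset does not already ship with a val_mask."""
    if not hasattr(data, "val_mask"):
        idx = list(range(num_nodes))
        random.Random(seed).shuffle(idx)
        cut = int(train_ratio * num_nodes)
        train_set = set(idx[:cut])
        # Boolean masks over all nodes, as in PyTorch Geometric datasets.
        data.train_mask = [i in train_set for i in range(num_nodes)]
        data.val_mask = [i not in train_set for i in range(num_nodes)]
    return data

# A citation-network dataset already carries val_mask, so it is left alone.
cora_like = SimpleNamespace(val_mask=[True, False, False])
create_masks(cora_like, num_nodes=3)
print(cora_like.val_mask)  # unchanged: [True, False, False]
```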

@hengruizhang98
Author

hengruizhang98 commented Mar 30, 2021

Thanks for your response. As far as I know, in the self-supervised setting (using the 'Cora' dataset as an example), all 2708 nodes are used in the pretraining step. In the linear evaluation step, only the 140 training nodes are used to train the linear classifier, and the 1000 testing nodes are used only for evaluation. However, it seems that you split the testing nodes into train/test sets with a 0.6/0.4 ratio (600 for train and 400 for test).
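Concretely, the two protocols differ as follows. The node counts are the standard public Planetoid split for Cora; the 60/40 figures are the ones described in this thread:

```python
# Public Planetoid split for Cora.
num_nodes, num_train, num_test = 2708, 140, 1000

# Standard linear-evaluation protocol:
# pretrain on all 2708 nodes, fit the classifier on 140, evaluate on 1000.
standard = {"fit": num_train, "eval": num_test}

# Protocol described in this thread:
# the 1000 test nodes are themselves split 60/40 for the classifier.
repo = {"fit": int(0.6 * num_test), "eval": num_test - int(0.6 * num_test)}

print(standard)  # {'fit': 140, 'eval': 1000}
print(repo)      # {'fit': 600, 'eval': 400}
```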

@zekarias-tilahun
Owner

Oh! I misunderstood your question. In that case, you're right. We use a random (60/40) split of the test set for the LogisticRegression classifier.
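A minimal sketch of such a 60/40 random split over the test-node indices (a pure-Python illustration of the idea, assuming nothing about the repo's actual implementation; the function name and fixed seed are made up):

```python
import random

def split_indices(indices, train_ratio=0.6, seed=0):
    """Randomly partition a list of node indices into two disjoint parts."""
    rng = random.Random(seed)
    shuffled = rng.sample(indices, len(indices))
    cut = int(train_ratio * len(indices))
    return shuffled[:cut], shuffled[cut:]

test_nodes = list(range(1000))  # e.g. the 1000 Cora test nodes
clf_train, clf_test = split_indices(test_nodes)
print(len(clf_train), len(clf_test))  # 600 400
```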

@hengruizhang98
Author

Yes. So I guess you need to update your code and manuscript to compare fairly with other models.

@zekarias-tilahun
Owner

BTW, a study discusses how using different splits can lead to significantly different outcomes. Thus, we mention the split to indicate which of the publicly available splits we used for the three citation datasets. However, I agree that we need to state this clearly in the manuscript, and I'll update it! Thank you for bringing this to light.
