Dataset split question #1

Open
rbycntj opened this issue May 25, 2024 · 1 comment
Comments

rbycntj commented May 25, 2024

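Hello! I have a few questions:
(1) Why do train_idx, valid_idx, and test_idx in the dataset share some of the same companies?
(2) How exactly are the xxx_hete_graph heterogeneous graphs constructed, and why does train_hete_graph contain company nodes that are not in train_idx?
Thank you!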

@shaopengw
Owner

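Hello. The whole dataset was built by sampling the first-order (one-hop) neighbors of seed nodes. A random split would break the graph into too many fragmented subgraphs, so we instead split the dataset by the time at which each seed node went bankrupt (training set: 2014-2018: 632; validation set: 2019: 127; test set: 2020-2021: 130). We then extracted the first-order neighbors of those seed nodes to obtain the final training, validation, and test sets. Because different seed nodes can share the same neighbors, the final training, validation, and test sets contain a small number of overlapping nodes.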
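For readers who want to see why the overlap arises, here is a minimal sketch of the procedure described above, not the repository's actual code. The function names (`split_seeds_by_year`, `expand_one_hop`), the toy node IDs, and the treatment of edges as undirected are all assumptions made for illustration; only the year ranges come from the comment.

```python
from collections import defaultdict


def split_seeds_by_year(seed_years):
    """Assign each seed node to a split by its bankruptcy year.

    Year ranges follow the comment above; the function itself is hypothetical.
    """
    splits = {"train": set(), "valid": set(), "test": set()}
    for node, year in seed_years.items():
        if 2014 <= year <= 2018:
            splits["train"].add(node)
        elif year == 2019:
            splits["valid"].add(node)
        elif 2020 <= year <= 2021:
            splits["test"].add(node)
    return splits


def expand_one_hop(seeds, edges):
    """Return the seeds plus all of their first-order neighbors.

    Edges are treated as undirected here, which is an assumption.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    nodes = set(seeds)
    for seed in seeds:
        nodes |= adjacency[seed]
    return nodes


# Toy data (made up): three seed companies and their neighbors.
seed_years = {"A": 2015, "B": 2019, "C": 2020}
edges = [("A", "n1"), ("A", "n2"), ("B", "n2"), ("C", "n2"), ("C", "n3")]

splits = split_seeds_by_year(seed_years)
node_sets = {name: expand_one_hop(seeds, edges) for name, seeds in splits.items()}

# "n2" is a neighbor of seeds in different splits, so it appears in each set.
overlap = node_sets["train"] & node_sets["valid"]
```

In this toy graph the seed sets are disjoint by construction, yet the expanded node sets share `n2`. The expanded training set also contains nodes that are not training seeds, since neighbors are pulled in regardless of which split they would otherwise belong to.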
