-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better Train, test, val split for graphs #158
Comments
Right now for graph level tasks the For node and edge tasks we store the masks in the single graph metadata. Notice that for node/edge level tasks we are not really splitting a graph in two, we just have masks that we apply on the model outputs. The same dataset can define multiple splits, we just have to define entries in the datasets or graph metadata, e.g. "edge_train_mask", "node_train_mask". So it seems to me we basically have already what you ask for, maybe we just have to write down a naming convention scheme for the masks, and also specify in the docs the mask types (I think we have boolean masks for most graph dataset, the alternative being a list of indices). To make graph datasets consistent with non-graph datasets, for graph-level tasks (where the datasets contains multiple graphs) we should allow the selection of the split at construction time, as we do with |
Yes, we already have a lot of things done. We just need to document it and solidify it. On, another note is
|
For MNIST the splits are actually separate files, so no shared computation, but for other datasets this is not the case and that's not ideal. Storing processed datasets would be the path forward. The thing is we already went through some breaking changes recently so I really want to avoid others in the near future, so let's see in which case the |
Problems
Proposed solution
I propose the following API:
splits = dataset.metadata.split
value
of the split.value
of the split a dict or named tuple.This should ideally handle most of the cases. We only support non-dynamic graphs for now, I have not worked with dynamic graphs. Most likely we can support splits with little to no changes.
This recursive structure of splits can be confusing for the end-user. We can support APIs in MLUtils.jl like:
Keeping this a discussion for now.
cc: @CarloLucibello
The text was updated successfully, but these errors were encountered: