-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Roadmap] Support multiple node/edge type sampling using NeighborLoader
and friends
#4765
Comments
Thanks @Padarn.
Can you clarify? I think just lifting the current interface to support lists of types as input is totally fine. In addition, I am not sure what "Support multiple node types in One challenge (which may reduce runtime) is to define what we pass to the parent constructor in super().__init__([("paper, 0), ("paper, 1), ...], ...) and convert this back to dictionaries of node indices. This may be expensive to compute since we are operating on Python primitives here, so I would appreciate if we could do some benchmarking first. Let me know your thoughts! |
Thanks, fixed.
Sorry I mistakenly remembered that we stored the index and labels without edge type. I've updated this to make it clear. |
Very good point. Will add an item for this. |
I ran some benchmarks using the new
Here are some results. Old Version
New Version
So its definitely not free. |
I guess the obvious way to do this would be to create a flat graph, like
becomes
The offsets could be added as a property to the |
Thanks for the benchmarks. They don't look that worrying :) Ideally, we could use the current scheme when only one node type is used, and use the slower variant whenever this is not the case. I think your proposed solution of a flat graph won't necessarily work as you will still need to re-construct the the node type for each index in a later step. |
True. I was thinking it might be quicker as it could probably be done in tensor operations, but I see that might not actually be any better.
Agreed! |
NeighborLoader
and friendsNeighborLoader
and friends
🚀 The feature, motivation and pitch
The goal of this roadmap is to understand how we might support sampling across multiple node or edge types in
NeighborLoader
andLinkNeighborLoader
. As an example, currently, the constructor ofNeighborLoader
takes a singleNodeType
with an optional index. For exampleHowever, a user (for example #4707) may reasonably want to instead sample from multiple edge types, so we would like to propose a way to support this directly. Thus we would like to support this for both
NeighborLoader
andLinkNeighborLoader
.Interface
A proposed interface for this could be to extend the supported input types to include a tuple of tuples for the sampling index. This would probably mean also supporting different negative sampling ratios. An example of the new interface would be:
The output of this for
NeighborLoader
could remain the same, but the sampling would be extended to cover the different input types.For the case of
LinkNeighborLoader
, the interface change could be similarFor the
LinkNeighborLoader
we store dictionariesdata[type].edge_label
anddata[type].edge_label_index
to allow users to keep track of which edges the sampled edges correspond toThis is easy to extend to multiple types. But we may want to implement a simple way to find out which edge types were sampled in a given minibatch.
Implementation
To implement this we need to decide whether sampling happens randomly across all types in a batch, or whether the sampling is done for each type independently. To keep it simple for now, we would keep a single
batch_size
and assume the user wants to sample randomly from all of the types/indexes provided, so for examplewould return a random sample of
N
nodes fromunion([index, index2])
.Risks
This adds complication in usage, as to effectively predict on a mini-batch you would need to compute the heads for each node/edge type. This needs to be explored through some examples and help from the community.
Roadmap
NeighborLoader
.NeighborLoader
. Support multiple node type sampling inNeighborLoader
#5013NeighborLoader
.LinkNeighborLoader
.LinkNeighborLoader
.LinkNeighborLoader
.negative_sampling_ratio
inLinkNeighborLoader
for different edge types.NeighborLoader
and friends #4765 (comment)Any feedback and help from the community is highly appreciated!
cc: @rusty1s
The text was updated successfully, but these errors were encountered: