Enable edge based temporal sampling in torch_geometric.distributed
#8428
This PR enables edge-based temporal distributed training for both node and link sampling.
A note on how the edge temporal data is defined:
In distributed training, each partition needs its own separate vector storing the time information of the edges it contains. (I mention this to point out that it works differently than node-based temporal sampling, where a single vector can be shared across all partitions because we operate on global node IDs.)
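A minimal sketch of that re-indexing step (the function and variable names here are illustrative, not the actual implementation): a global per-edge time vector is split into local vectors, one per partition, following each partition's local edge ordering.

```python
# Sketch (assumed names): build a local edge-time vector per partition.
# Node-based temporal sampling can share one global node_time vector,
# but edge times must be stored locally, because each partition's
# edge_index is renumbered before CSR/CSC conversion.

def local_edge_times(global_edge_time, partition_edge_ids):
    # partition_edge_ids[p] lists the *global* edge IDs owned by
    # partition p; the local time vector follows that local ordering.
    return [[global_edge_time[e] for e in edge_ids]
            for edge_ids in partition_edge_ids]

global_edge_time = [5, 1, 9, 3, 7, 2]        # one timestamp per global edge
partition_edge_ids = [[0, 2, 5], [1, 3, 4]]  # edges assigned to 2 partitions

print(local_edge_times(global_edge_time, partition_edge_ids))
# -> [[5, 9, 2], [1, 3, 7]]
```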
Why:
Each partition has its own unique edge_index in COO format, which is later converted to a CSR/CSC matrix in the neighbor sampler. During sampling we therefore have no information about global edge IDs and could not look up the correct time for a specific edge, so this information must be kept local.
Changes made:
- `seed_time` needs to be specified (a requirement for edge-level temporal sampling)
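The role of `seed_time` can be sketched as follows (assumed names and a deliberately simplified filter, not the actual sampler code): edge-level temporal sampling compares each candidate edge's local timestamp against the seed's `seed_time`, so without a `seed_time` there is no reference point to filter against.

```python
# Sketch (assumed names): edge-level temporal neighbor sampling keeps
# only in-edges whose local timestamp does not exceed the seed's
# seed_time; this is why seed_time is required.

def sample_neighbors(colptr, rows, edge_time, seed, seed_time):
    # In-edges of `seed` live in the CSC slice [colptr[seed], colptr[seed+1]).
    lo, hi = colptr[seed], colptr[seed + 1]
    return [rows[e] for e in range(lo, hi) if edge_time[e] <= seed_time]

colptr = [0, 1, 2, 4]
rows = [2, 0, 1, 0]
edge_time = [20, 10, 30, 40]   # local (per-partition) edge times
print(sample_neighbors(colptr, rows, edge_time, seed=2, seed_time=35))
# -> [1]   (the edge with time 40 is filtered out)
```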