## Update `GraphStore` and `FeatureStore` [1/6] (#8083)

### Conversation
This code was moved to another PR: #8083.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #8083      +/-   ##
==========================================
- Coverage   87.41%   87.12%   -0.30%
==========================================
  Files         473      473
  Lines       28648    28724      +76
==========================================
- Hits        25043    25025      -18
- Misses       3605     3699      +94
```
... and 1 file with indirect coverage changes
**[1/3] Distributed Loaders PRs**

This PR includes the base class of `DistributedLoader`, which handles the RPC connection and requests coming from `DistributedNeighborSampler` processes. It also includes the basic `DistNeighborSampler` functions used by the loader.

1. #8079
2. #8080
3. #8085

Other PRs related to this module:
- DistSampler: #7974
- GraphStore\FeatureStore: #8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
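The request flow the loader base class is described as handling can be sketched roughly as follows. This is a hedged, pure-Python illustration only: the real `DistributedLoader` uses `torch.distributed.rpc` across processes, while here RPC is simulated with a direct call and an in-process queue, and all class, function, and field names (`DistLoaderBase`, `SampleRequest`, `toy_sampler`, etc.) are hypothetical, not the actual PyG API.

```python
# Illustrative sketch of an RPC-style loader base class.
# All names here are hypothetical; RPC is simulated in-process.
from dataclasses import dataclass
from queue import Queue


@dataclass
class SampleRequest:
    seed_nodes: list   # node IDs to sample neighbors for
    num_neighbors: list  # fanout per layer, e.g. [10, 5]


class DistLoaderBase:
    """Stand-in for a loader that forwards sampling requests to remote
    sampler processes and collects their replies."""

    def __init__(self, sampler):
        self.sampler = sampler      # would be an RPC handle in practice
        self.inbox = Queue()

    def submit(self, request):
        # In the real setup this would be an asynchronous RPC call;
        # here we invoke the sampler directly and enqueue the result.
        self.inbox.put(self.sampler(request))

    def collect(self):
        return self.inbox.get()


def toy_sampler(req):
    # Pretend every seed node n has neighbors n*10 + k.
    return {n: [n * 10 + k for k in range(req.num_neighbors[0])]
            for n in req.seed_nodes}


loader = DistLoaderBase(toy_sampler)
loader.submit(SampleRequest(seed_nodes=[1, 2], num_neighbors=[2]))
out = loader.collect()
# out == {1: [10, 11], 2: [20, 21]}
```

The point of the queue-based shape is that submission and collection are decoupled, which is what lets the real loader overlap sampling requests with feature fetching.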
This code is part of the overall distributed training support for PyG. `DistNeighborSampler` leverages the `NeighborSampler` class from `pytorch_geometric` and the `neighbor_sample` function from `pyg-lib`. However, because distributed training requires synchronizing results between machines after each layer, the part of the code responsible for sampling was implemented in Python.

Added support for the following sampling methods:
- node, edge, negative, disjoint, temporal

**TODOs:**
- [x] finish hetero part
- [x] subgraph sampling

**This PR should be merged together with other distributed PRs:**
- pyg-lib: [#246](pyg-team/pyg-lib#246), [#252](pyg-team/pyg-lib#252)
- GraphStore\FeatureStore: #8083
- DistLoaders:
  1. #8079
  2. #8080
  3. #8085

---------

Co-authored-by: JakubPietrakIntel <jakub.pietrak@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
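The per-layer synchronization mentioned above is the key structural constraint: each machine can only expand the frontier for nodes in its own partition, so results must be merged across machines after every hop before the next hop can start. A minimal sketch, with partitions simulated as plain dicts and all names purely illustrative (this is not the `DistNeighborSampler` code):

```python
# Simplified layer-wise sampling across simulated partitions.
# Each "machine" owns an adjacency dict; the frontier is merged
# (an all-gather in the real distributed setup) after every layer.

def sample_layerwise(partitions, seeds, fanouts):
    """partitions: list of adjacency dicts, one per 'machine'.
    seeds: initial node IDs.  fanouts: neighbors to take per layer."""
    frontier = set(seeds)
    sampled = set(seeds)
    for fanout in fanouts:
        # Each partition samples neighbors only for nodes it owns ...
        local_results = []
        for adj in partitions:
            hop = set()
            for node in frontier:
                hop.update(adj.get(node, [])[:fanout])
            local_results.append(hop)
        # ... then results are synchronized before the next layer,
        # so every machine sees the complete new frontier.
        frontier = set().union(*local_results) - sampled
        sampled |= frontier
    return sampled


part0 = {0: [1, 2], 1: [3]}      # nodes owned by "machine 0"
part1 = {2: [4, 5], 3: [0]}      # nodes owned by "machine 1"
nodes = sample_layerwise([part0, part1], seeds=[0], fanouts=[2, 2])
# nodes == {0, 1, 2, 3, 4, 5}
```

This layer-by-layer barrier is exactly why the sampling loop lives in Python here rather than entirely inside the fused `pyg-lib` kernel: the merge point between layers has to cross machine boundaries.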
Force-pushed the `GraphStore` and `FeatureStore` branch from 95d394d to 00b7fd5.
Force-pushed from 8a68357 to 3fcab74.
Force-pushed from 3fcab74 to 70a5594.
for more information, see https://pre-commit.ci
**Changes made:**
- added support for temporal sampling
- use `torch.Tensor`s instead of numpy arrays
- moved `_sample_one_hop()` from `NeighborSampler` to `DistNeighborSampler`
- do not go with the disjoint flow in the `_sample()` function - this is not needed because `batch` is calculated afterwards
- added tests for node sampling and disjoint sampling (works without `DistNeighborLoader`)
- added tests for node temporal sampling (works without `DistNeighborLoader`)
- some minor changes, such as renaming variables

This PR is based on #8083, so both must be combined to pass the tests.

Other distributed PRs: #8083 #8080 #8085

---------

Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
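The temporal-sampling behavior added in this change set can be illustrated with a one-hop helper in the spirit of `_sample_one_hop()`. This is a hedged pure-Python sketch only: the real implementation operates on `torch.Tensor`s via `pyg-lib`, and the function name, signature, and the `timestamps` dict used here are all hypothetical.

```python
# Illustrative one-hop sampler with a temporal constraint:
# only edges whose timestamp does not exceed the seed's time
# are eligible for sampling. All names are hypothetical.
import random


def sample_one_hop(adj, timestamps, frontier, seed_time, fanout, rng=None):
    """Sample up to `fanout` neighbors per frontier node, keeping only
    edges with timestamp <= seed_time (temporal sampling)."""
    rng = rng or random.Random(0)
    out = {}
    for node in frontier:
        candidates = [v for v in adj.get(node, ())
                      if timestamps.get((node, v), 0) <= seed_time]
        out[node] = rng.sample(candidates, min(fanout, len(candidates)))
    return out


adj = {0: [1, 2, 3]}
ts = {(0, 1): 5, (0, 2): 20, (0, 3): 7}
hop = sample_one_hop(adj, ts, frontier=[0], seed_time=10, fanout=2)
# Only edges with timestamp <= 10 are eligible: (0, 1) and (0, 3),
# so hop[0] contains 1 and 3 in some order.
```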
**[2/3] Distributed Loaders PRs**

This PR includes `DistributedNeighborLoader`, used for processing node sampler output in a distributed training setup.

1. #8079
2. #8080
3. #8085

Other PRs related to this module:
- DistSampler: #7974
- GraphStore\FeatureStore: #8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
**[3/3] Distributed Loaders PRs**

This PR includes `DistributedLinkNeighborLoader`, used for processing edge sampler output in a distributed training setup.

1. #8079
2. #8080
3. #8085

Other PRs related to this module:
- DistSampler: #7974
- GraphStore\FeatureStore: #8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
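The difference between the node and link loaders is the seed format: a link-neighbor loader starts from seed *edges* and must first derive seed nodes from both endpoints while remembering how to map results back to the edges. A rough sketch of that conversion, with all names hypothetical (this is not the `DistributedLinkNeighborLoader` API):

```python
# Illustrative conversion from seed edges to unique seed nodes,
# keeping a mapping from each edge back to seed-node indices.
# All names are hypothetical.

def edge_seeds_to_node_seeds(edge_index):
    """Flatten (src, dst) seed edges into a unique seed-node list,
    returning the list plus per-edge index pairs into it."""
    seen = {}       # node ID -> position in `seeds`
    seeds = []      # unique seed nodes, in first-seen order
    mapping = []    # one (src_idx, dst_idx) pair per input edge
    for src, dst in edge_index:
        pair = []
        for n in (src, dst):
            if n not in seen:
                seen[n] = len(seeds)
                seeds.append(n)
            pair.append(seen[n])
        mapping.append(tuple(pair))
    return seeds, mapping


seeds, mapping = edge_seeds_to_node_seeds([(0, 3), (3, 5)])
# seeds == [0, 3, 5]; mapping == [(0, 1), (1, 2)]
```

After this step, neighbor sampling proceeds exactly as in the node case, which is why both loaders can share the same sampler machinery.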
This code is part of the overall distributed training support for PyG.
Please be aware that this PR should be merged before the Loaders package! - @JakubPietrakIntel
Loaders:
1. DistLoader: #8079
2. DistributedNeighborLoader [3/6]: #8080
3. DistributedLinkNeighborLoader [4/6]: #8085

Other PRs related to this module:
DistSampler: #7974