
Update GraphStore and FeatureStore [1/6] #8083

Merged: 23 commits into master on Nov 6, 2023

Conversation

kgajdamo
Contributor

@kgajdamo kgajdamo commented Sep 27, 2023

This code is part of the distributed training support for PyG.

Please be aware that this PR should be merged before the loaders PRs! - @JakubPietrakIntel
Loaders:

  1. Add base class DistLoader #8079
  2. Add DistributedNeighborLoader [3/6] #8080
  3. Add DistributedLinkNeighborLoader [4/6] #8085

Other PRs related to this module:
DistSampler: #7974

@github-actions github-actions bot added the data label Sep 27, 2023
@kgajdamo kgajdamo changed the title Update Graph Store Update GraphStore and FeatureStore Sep 27, 2023
JakubPietrakIntel pushed a commit that referenced this pull request Sep 27, 2023
this code was moved to another PR #8083
JakubPietrakIntel added a commit that referenced this pull request Sep 27, 2023
@JakubPietrakIntel JakubPietrakIntel mentioned this pull request Sep 27, 2023
2 tasks
@codecov

codecov bot commented Sep 27, 2023

Codecov Report

Merging #8083 (70a5594) into master (3eafd5d) will decrease coverage by 0.30%.
The diff coverage is 60.09%.

@@            Coverage Diff             @@
##           master    #8083      +/-   ##
==========================================
- Coverage   87.41%   87.12%   -0.30%     
==========================================
  Files         473      473              
  Lines       28648    28724      +76     
==========================================
- Hits        25043    25025      -18     
- Misses       3605     3699      +94     
Files Coverage Δ
...rch_geometric/distributed/dist_neighbor_sampler.py 0.00% <0.00%> (ø)
torch_geometric/distributed/local_graph_store.py 95.87% <93.75%> (+4.66%) ⬆️
torch_geometric/data/graph_store.py 92.75% <76.31%> (-5.61%) ⬇️
torch_geometric/distributed/utils.py 51.94% <23.52%> (-7.43%) ⬇️
torch_geometric/distributed/partition.py 83.22% <56.66%> (-15.95%) ⬇️
torch_geometric/distributed/local_feature_store.py 57.85% <45.45%> (-22.41%) ⬇️

... and 1 file with indirect coverage changes


rusty1s added a commit that referenced this pull request Oct 2, 2023
**[1/3] Distributed Loaders PRs** 
This PR includes the base class of `DistributedLoader`, which handles the RPC
connection and requests from `DistributedNeighborSampler` processes.

It includes basic `DistNeighborSampler` functions used by the loader.

1.  #8079
2.  #8080
3.  #8085

Other PRs related to this module:
DistSampler: #7974
GraphStore/FeatureStore:
#8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: rusty1s <matthias.fey@tu-dortmund.de>
JakubPietrakIntel added a commit that referenced this pull request Oct 2, 2023
rusty1s added a commit that referenced this pull request Oct 9, 2023
This code is part of the distributed training support for PyG.

`DistNeighborSampler` leverages the `NeighborSampler` class from
`pytorch_geometric` and the `neighbor_sample` function from `pyg-lib`.
However, because distributed training requires synchronising the results
between machines after each layer, the part of the code responsible for
sampling was implemented in Python.

Added support for the following sampling methods:
- node, edge, negative, disjoint, temporal
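The layer-by-layer Python loop described above can be sketched roughly as follows on a single machine. `sample_one_hop`, `layerwise_sample`, and the CSC inputs (`colptr`, `row`) are illustrative stand-ins for this sketch, not the actual `DistNeighborSampler` API:

```python
import torch

def sample_one_hop(colptr, row, seed, num_neighbors, generator=None):
    """Sample up to `num_neighbors` in-neighbors for each seed node from a
    graph in CSC form (colptr/row). A hypothetical helper, not the pyg-lib kernel."""
    out = []
    for node in seed.tolist():
        start, end = int(colptr[node]), int(colptr[node + 1])
        neighbors = row[start:end]
        if neighbors.numel() > num_neighbors:
            perm = torch.randperm(neighbors.numel(), generator=generator)
            neighbors = neighbors[perm[:num_neighbors]]
        out.append(neighbors)
    return torch.cat(out) if out else torch.empty(0, dtype=row.dtype)

def layerwise_sample(colptr, row, seed, num_neighbors_per_layer):
    """Python-level layer-by-layer loop. In the distributed setting, this is
    the point where results from all machines would be synchronised after
    each layer, before expanding the next frontier."""
    sampled = [seed]
    frontier = seed
    for fanout in num_neighbors_per_layer:
        frontier = sample_one_hop(colptr, row, frontier, fanout)
        # <- in DistNeighborSampler, RPC results from remote partitions
        #    would be merged here before starting the next layer
        frontier = torch.unique(frontier)
        sampled.append(frontier)
    return torch.unique(torch.cat(sampled))
```

This is why the sampling loop lives in Python rather than inside a single fused `neighbor_sample` call: each layer boundary is a synchronisation point across machines.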

**TODOs:**

- [x] finish hetero part
- [x] subgraph sampling

**This PR should be merged together with other distributed PRs:**
pyg-lib: [#246](pyg-team/pyg-lib#246),
[#252](pyg-team/pyg-lib#252)
GraphStore/FeatureStore:
#8083
DistLoaders:
1.  #8079
2.  #8080
3.  #8085

---------

Co-authored-by: JakubPietrakIntel <jakub.pietrak@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZhengHongming888 <hongming.zheng@intel.com>
Co-authored-by: Jakub Pietrak <97102979+JakubPietrakIntel@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
@rusty1s rusty1s changed the title Update GraphStore and FeatureStore Update GraphStore and FeatureStore Oct 9, 2023
@JakubPietrakIntel
Contributor

@rusty1s @ZhengHongming888
I added ac49add, which enforces sorting by destination node in the LGS edge_index (in other words, sorting by column, i.e. edge_index[1]) to match the CSC format used for sampling. This condition ensures that input data won't be permuted again at neighbor sampler init.
We do that to make sure that correct edge_ids are returned in the sampler's output and that in filter_fn in NodeLoader (i.e. L161) the expression is evaluated correctly with perm = None.
We considered alternative solutions and decided this one offers the best performance, at the cost of adding a constraint on the incoming data. We can't store a global perm in LGS, because METIS does not perform a simple split, so the ordering would be lost. The other option would be to use local per-partition values for edge_id, which would increase complexity and add yet another re-indexing step. Furthermore, it would require RPC requests for perm values located in remote partitions, because the sampler may return edge_ids located in a remote partition.
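The sorting constraint described above can be sketched as follows. `sort_edge_index_by_dst` and `is_sorted_by_dst` are hypothetical helpers written for illustration, not the actual LocalGraphStore API:

```python
import torch

# Hypothetical sketch of the constraint enforced on LGS input data:
# edges must be sorted by destination node (edge_index[1]), i.e. in CSC
# column order, so the neighbor sampler does not permute them again and
# the stored edge ids stay valid with perm = None.

def sort_edge_index_by_dst(edge_index, edge_id):
    """Sort edges by column (destination node) and keep edge ids aligned."""
    perm = edge_index[1].argsort()
    return edge_index[:, perm], edge_id[perm]

def is_sorted_by_dst(edge_index):
    """Check the CSC-order precondition on edge_index."""
    col = edge_index[1]
    return bool((col[1:] >= col[:-1]).all()) if col.numel() > 1 else True
```

Because the permutation is applied once, up front, to both `edge_index` and `edge_id`, no global `perm` tensor has to be kept around after partitioning.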

@rusty1s rusty1s changed the title Update GraphStore and FeatureStore Update GraphStore and FeatureStore [1/6] Oct 30, 2023
@rusty1s rusty1s merged commit a2b77e6 into master Nov 6, 2023
14 checks passed
@rusty1s rusty1s deleted the intel/dist-gfs branch November 6, 2023 15:43
rusty1s added a commit that referenced this pull request Nov 10, 2023
**Changes made:**
- added support for temporal sampling
- use torch.Tensors instead of numpy arrays
- move _sample_one_hop() from NeighborSampler to DistNeighborSampler
- do not use the disjoint flow in the _sample() function; it is not needed because batch is calculated afterwards
- added tests for node sampling and disjoint sampling (works without DistNeighborLoader)
- added tests for node temporal sampling (works without DistNeighborLoader)
- minor changes such as renaming variables

This PR is based on #8083, so both must be combined to pass the tests.

Other distributed PRs:
#8083 
#8080 
#8085

---------

Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
rusty1s added a commit that referenced this pull request Nov 10, 2023
**[2/3] Distributed Loaders PRs**
This PR includes `DistributedNeighborLoader`, used for processing node
sampler output in a distributed training setup.


1.  #8079
2.  #8080
3.  #8085

Other PRs related to this module:
DistSampler: #7974
GraphStore/FeatureStore:
#8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
rusty1s added a commit that referenced this pull request Nov 10, 2023
**[3/3] Distributed Loaders PRs**
This PR includes `DistributedLinkNeighborLoader`, used for processing
edge sampler output in a distributed training setup.


1.  #8079
2.  #8080
3.  #8085

Other PRs related to this module:
DistSampler: #7974
GraphStore/FeatureStore:
#8083

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>