
[Example] Add WholeGraph to accelerate PyG dataloaders with GPUs #9714

Open · wants to merge 8 commits into master

Conversation

@chang-l commented Oct 17, 2024

This PR demonstrates how to integrate NVIDIA WholeGraph into PyG's graph store and feature store base classes, providing a modular, PyG-like way to extend PyG's dataloaders for better GPU utilization. WholeGraph handles the optimization of data access on NVIDIA hardware and manages graph and feature storage, with optional sharding across distributed disk, host RAM, or device memory.

Compared to existing examples, there are three key differences:

  • The WholeGraph library does not provide a dataloader itself; instead, it hosts the underlying distributed graph and feature storage together with efficient primitive operations (e.g., GPU-accelerated embedding retrieval and graph sampling).

  • It is efficient, minimizing CPU interruptions, and plugs into PyG's feature store and graph store abstractions, so it remains compatible with existing native PyG dataloaders. See the feature_store.py and graph_store.py implementations.

  • There is no distinction between single-GPU, multi-GPU, and multi-node multi-GPU training with this new feature store and graph store. Users do not need to partition the graph or hand-craft third-party launch scripts; everything falls under the standard PyTorch DDP workflow. The examples (papers100m_dist_wholegraph_nc.py and benchmark_data.py) show how to get there from any existing PyG DDP example, as sketched below.
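
A rough sketch of how these stores plug into an unmodified PyG dataloader is shown below. The store class names and constructors are shorthand for what feature_store.py/graph_store.py provide, not necessarily the exact API; the (FeatureStore, GraphStore) tuple input to NeighborLoader is standard PyG.

```python
# Illustrative sketch: WholeGraphFeatureStore/WholeGraphGraphStore names and
# constructors below are assumptions; the (FeatureStore, GraphStore) tuple
# input to NeighborLoader is standard PyG API.
import torch
import torch.distributed as dist
from torch_geometric.datasets import FakeDataset
from torch_geometric.loader import NeighborLoader

from feature_store import WholeGraphFeatureStore  # provided by this PR
from graph_store import WholeGraphGraphStore      # provided by this PR

dist.init_process_group(backend='nccl')  # usual torchrun/DDP setup

data = FakeDataset(avg_num_nodes=10_000)[0]  # stand-in for, e.g., ogbn-papers100M

# Hypothetical construction from an in-memory PyG data object; WholeGraph
# shards features and graph structure across device, host, or distributed memory.
feature_store = WholeGraphFeatureStore(data)
graph_store = WholeGraphGraphStore(data)

# An unmodified PyG dataloader consumes the (feature_store, graph_store) tuple
# in place of a Data/HeteroData object.
loader = NeighborLoader(
    (feature_store, graph_store),
    num_neighbors=[15, 10],
    input_nodes=torch.arange(1024),  # placeholder seed nodes
    batch_size=1024,
    shuffle=True,
)
```

Multi-GPU and multi-node runs then use the standard torchrun launch, e.g. `torchrun --nproc_per_node=4 benchmark_data.py`, with no separate graph-partitioning step.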

Running the benchmark script (benchmark_data.py), we observed 2x, 5x, and 9x speedups on a single NVIDIA T4, A100, and H100 GPU, respectively, compared to the native PyG NeighborLoader. With 4 GPUs, the speedups increase to 6.4x, 15x, and 35x, respectively (numbers may vary depending on the CPU used for the baseline run).

Meanwhile, given the compatibility and performance benefits demonstrated in this PR, I'd like to propose (1) integrating WholeGraph, as an option, to back data.FeatureStore/HeteroData.FeatureStore first, and (2) supporting the WholeMemory tensor type as a new case in the index_select function (alongside the existing `if isinstance(value, Tensor):` branch), making it accessible via UVA to more users.
cc. @puririshi98 @TristonC @alexbarghi-nv @linhu-nv @rusty1s

@chang-l requested a review from wsad1 as a code owner on October 17, 2024 23:04
@puririshi98 (Contributor) left a comment

Overall looks good to me. @rusty1s I wonder if you think it would be a better fit to have these helper files directly integrated into torch_geometric.distributed.wholegraph or something like that.

Also, @chang-l, please remove the stale examples from examples/multi_gpu/

these:
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/data_parallel.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_batching.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_sampling.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_sampling_multinode.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_sampling_multinode.sbatch
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/mag240m_graphsage.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/papers100m_gcn.py
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/papers100m_gcn_multinode.py

For the taobao and pcqm4m examples in the folder, I think it would be best to add a comment at the top mentioning that mp.spawn is deprecated and pointing to your new examples.
Please also update the README of that folder accordingly.

Lastly, please add a similar deprecation comment and pointer to the new examples for these two:
https://github.com/pyg-team/pytorch_geometric/blob/master/docs/source/tutorial/multi_gpu_vanilla.rst
https://github.com/pyg-team/pytorch_geometric/blob/master/docs/source/tutorial/multi_node_multi_gpu_vanilla.rst
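
For the Python examples, the header note could be something along these lines (just a suggested wording, pointing at the examples added in this PR):

```python
# NOTE: This example launches workers via mp.spawn, which is deprecated in
# favor of torchrun. For an up-to-date multi-GPU workflow, see the
# WholeGraph-based examples added in PR #9714
# (e.g., papers100m_dist_wholegraph_nc.py).
```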

@alexbarghi-nv (Contributor) commented Oct 22, 2024

@puririshi98 can we hold off on this for now? We are having a meeting in a couple hours to discuss this PR and how we want to go about it.

@alexbarghi-nv (Contributor)

@puririshi98 @chang-l We can go ahead and instruct users to use torchrun and the example WG Graph/Feature stores. At some point, we will replace the ones in the examples directory with official ones that are part of cugraph. Our long-term strategy, I think, based on our discussion, is to have this take over feature storage in cuGraph. The cuGraph loaders will remain for users that need them for extreme scale applications. Then, for sampling, we will eventually replace the WholeGraph samplers with cuGraph ones once our C++ code can support custom partitioning schemes.

@chang-l (Author) commented Oct 22, 2024

Okay. From our side, we can keep this PR as is (as one of the distributed examples) for now and gradually merge it into cuGraph along the way, while keeping the examples up to date. Sounds good? @alexbarghi-nv @puririshi98 @TristonC @BradReesWork

@alexbarghi-nv (Contributor)

@chang-l sounds good to me 👍

@chang-l (Author) commented Oct 22, 2024

@puririshi98 Thank you, Rishi, for the suggestions. I will file another PR to update and reorganize the existing multi-GPU/multi-node examples.

@chang-l (Author)

@alexbarghi-nv, do you mind adding a README file under this directory later?
