pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration

@rusty1s rusty1s released this 30 Nov 07:55
2eab973

We are proud to release pyg-lib==0.1.0, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG 🎉🎉🎉

Extensive documentation is provided here. Once pyg-lib is installed, it will get automatically picked up by PyG, e.g., during neighborhood sampling or during heterogeneous GNN execution, and will accelerate its computation.

Installation

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where

  • ${TORCH} should be replaced by either 1.11.0, 1.12.0 or 1.13.0
  • ${CUDA} should be replaced by either cpu, cu102, cu113, cu115, cu116 or cu117
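As a minimal sketch of how the two placeholders combine, the snippet below builds the wheel index URL and the corresponding pip command. The helper names `wheel_index_url` and `pip_install_command` are hypothetical and not part of pyg-lib. The check only validates each value on its own; it does not encode which PyTorch/CUDA/platform combinations actually ship wheels.

```python
from typing import List

# Values taken from the list above.
SUPPORTED_TORCH = ("1.11.0", "1.12.0", "1.13.0")
SUPPORTED_CUDA = ("cpu", "cu102", "cu113", "cu115", "cu116", "cu117")


def wheel_index_url(torch_version: str, cuda: str) -> str:
    """Fill in ${TORCH} and ${CUDA} in the wheel index URL (hypothetical helper)."""
    if torch_version not in SUPPORTED_TORCH:
        raise ValueError(f"unsupported PyTorch version: {torch_version!r}")
    if cuda not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA tag: {cuda!r}")
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"


def pip_install_command(torch_version: str, cuda: str) -> List[str]:
    """Return the pip invocation as an argument list (hypothetical helper)."""
    return ["pip", "install", "pyg-lib", "-f", wheel_index_url(torch_version, cuda)]


print(" ".join(pip_install_command("1.13.0", "cu117")))
```

With PyTorch installed, `torch.__version__` and `torch.version.cuda` report the local build and can help pick the right values.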

The following combinations are supported:

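| PyTorch 1.13 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
| Linux        | ✅  |       |       |       | ✅    | ✅    |
| Windows      |     |       |       |       |       |       |
| macOS        | ✅  |       |       |       |       |       |

| PyTorch 1.12 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
| Linux        | ✅  | ✅    | ✅    |       | ✅    |       |
| Windows      |     |       |       |       |       |       |
| macOS        | ✅  |       |       |       |       |       |

| PyTorch 1.11 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
| Linux        | ✅  | ✅    | ✅    | ✅    |       |       |
| Windows      |     |       |       |       |       |       |
| macOS        | ✅  |       |       |       |       |       |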

Highlights

pyg_lib.sampler: Optimized homogeneous and heterogeneous neighborhood sampling

pyg-lib provides fast, optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and substantially improves upon the neighborhood sampling routines previously used in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, and leverages a faster hashmap implementation. Overall, it achieves speed-ups of about 10x-15x:

[Figure: sampling benchmark screenshot]

pyg_lib.sampler.neighbor_sample(
    rowptr: Tensor,
    col: Tensor,
    seed: Tensor,
    num_neighbors: List[int],
    time: Optional[Tensor] = None,
    seed_time: Optional[Tensor] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

and

pyg_lib.sampler.hetero_neighbor_sample(
    rowptr_dict: Dict[EdgeType, Tensor],
    col_dict: Dict[EdgeType, Tensor],
    seed_dict: Dict[NodeType, Tensor],
    num_neighbors_dict: Dict[EdgeType, List[int]],
    time_dict: Optional[Dict[NodeType, Tensor]] = None,
    seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
    csc: bool = False,
    replace: bool = False,
    directed: bool = True,
    disjoint: bool = False,
    temporal_strategy: str = 'uniform',
    return_edge_id: bool = True,
)

pyg_lib.sampler.neighbor_sample and pyg_lib.sampler.hetero_neighbor_sample recursively sample neighbors from all node indices in seed in the graph given by (rowptr, col). Both also support temporal sampling via the time argument: nodes that do not fulfill the temporal constraints indicated by seed_time are never sampled.
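To illustrate what "recursively sample neighbors" means for a graph stored as (rowptr, col), here is a pure-Python sketch of multi-hop sampling without replacement. It is not pyg-lib's implementation or return format: `neighbor_sample_sketch` is a hypothetical name, it works on plain lists instead of tensors, it ignores the temporal arguments, and it assumes the neighbors of node `v` are stored in `col[rowptr[v]:rowptr[v + 1]]`.

```python
import random
from typing import List, Optional, Tuple


def neighbor_sample_sketch(
    rowptr: List[int],
    col: List[int],
    seed: List[int],
    num_neighbors: List[int],
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Illustrative multi-hop sampling; returns (sampled nodes, sampled edges)."""
    rng = rng or random.Random(0)
    nodes = list(seed)      # sampled nodes, seed nodes first
    seen = set(seed)
    frontier = list(seed)   # nodes to expand in the current hop
    edges = []
    for k in num_neighbors:  # one entry per hop
        next_frontier = []
        for v in frontier:
            nbrs = col[rowptr[v]:rowptr[v + 1]]
            # Take every neighbor if there are at most k, else sample k of them.
            picked = list(nbrs) if k >= len(nbrs) else rng.sample(nbrs, k)
            for u in picked:
                edges.append((v, u))
                if u not in seen:  # only newly discovered nodes are expanded next
                    seen.add(u)
                    nodes.append(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return nodes, edges


# Toy graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {}
rowptr = [0, 2, 3, 4, 4]
col = [1, 2, 3, 3]
nodes, edges = neighbor_sample_sketch(rowptr, col, seed=[0], num_neighbors=[2, 1])
print(nodes, edges)
```

Each entry of `num_neighbors` bounds how many neighbors are drawn per node in that hop, so the list length sets the sampling depth.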

pyg_lib.ops: Heterogeneous GNN acceleration

pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:

[Figure: heterogeneous GNN acceleration screenshot]

segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor

pyg_lib.ops.segment_matmul performs dense-dense matrix multiplication according to segments along the first dimension of inputs as given by ptr.

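import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])  # segment boundaries: rows 0-4 and rows 5-7
other = torch.randn(2, 16, 32)  # one weight matrix per segment

out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
# Compare with a tolerance: `==` on tensors is element-wise and cannot be asserted directly.
assert torch.allclose(out[0:5], inputs[0:5] @ other[0], atol=1e-4)
assert torch.allclose(out[5:8], inputs[5:8] @ other[1], atol=1e-4)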
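The segment semantics can also be spelled out without any tensor library. The sketch below is a plain-Python reference for illustration only (the name `segment_matmul_reference` is hypothetical and this is not how pyg-lib computes the result): rows `ptr[i]` to `ptr[i + 1] - 1` of `inputs` are multiplied by the i-th matrix in `other`.

```python
from typing import List

Matrix = List[List[float]]


def segment_matmul_reference(inputs: Matrix, ptr: List[int], other: List[Matrix]) -> Matrix:
    """Multiply each row segment of `inputs` by its own weight matrix."""
    out = []
    for seg in range(len(ptr) - 1):
        weight = other[seg]                 # shape [K, N] for this segment
        columns = list(zip(*weight))        # the N columns of the weight matrix
        for row in inputs[ptr[seg]:ptr[seg + 1]]:
            # Dot product of the input row with every weight column.
            out.append([sum(x * w for x, w in zip(row, column)) for column in columns])
    return out


inputs = [[1, 2], [3, 4], [5, 6]]
ptr = [0, 2, 3]                  # segment 0: rows 0-1, segment 1: row 2
other = [
    [[1, 0], [0, 1]],            # identity for segment 0
    [[0, 1], [1, 0]],            # swaps the two features for segment 1
]
print(segment_matmul_reference(inputs, ptr, other))  # [[1, 2], [3, 4], [6, 5]]
```

In a heterogeneous GNN, each segment would typically hold the features of one node or edge type, with `other` storing one weight matrix per type.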

Full Changelog

Added
  • Added PyTorch 1.13 support (#145)
  • Added native PyTorch support for grouped_matmul (#137)
  • Added fused_scatter_reduce operation for multiple reductions (#141, #142)
  • Added triton dependency (#133, #134)
  • Enable pytest testing (#132)
  • Added C++-based autograd and TorchScript support for segment_matmul (#120, #122)
  • Allow overriding time for seed nodes via seed_time in neighbor_sample (#118)
  • Added [segment|grouped]_matmul CPU implementation (#111)
  • Added temporal_strategy option to neighbor_sample (#114)
  • Added benchmarking tool (Google Benchmark) along with pyg::sampler::Mapper benchmark example (#101)
  • Added CSC mode to pyg::sampler::neighbor_sample and pyg::sampler::hetero_neighbor_sample (#95, #96)
  • Speed up pyg::sampler::neighbor_sample via IndexTracker implementation (#84)
  • Added pyg::sampler::hetero_neighbor_sample implementation (#90, #92, #94, #97, #98, #99, #102, #110)
  • Added pyg::utils::to_vector implementation (#88)
  • Added support for PyTorch 1.12 (#57, #58)
  • Added grouped_matmul and segment_matmul CUDA implementations via cutlass (#51, #56, #61, #64, #69, #73, #123)
  • Added pyg::sampler::neighbor_sample implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
  • Added pyg::sampler::Mapper utility for mapping global to local node indices (#45, #83)
  • Added benchmark script (#45, #79, #82, #91, #93, #106)
  • Added download script for benchmark data (#44)
  • Added biased sampling utils (#38)
  • Added CHANGELOG.md (#39)
  • Added pyg.subgraph() (#31)
  • Added nightly builds (#28, #36)
  • Added rand CPU engine (#26, #29, #32, #33)
  • Added pyg.random_walk() (#21, #24, #25)
  • Added documentation via readthedocs (#19, #20)
  • Added code coverage report (#15, #16, #17, #18)
  • Added CMakeExtension support (#14)
  • Added test suite via gtest (#13)
  • Added clang-format linting via pre-commit (#12)
  • Added CMake support (#5)
  • Added pyg.cuda_version() (#4)
Changed
  • Allow different types for graph and timestamp data (#143)
  • Fixed dispatcher in hetero_neighbor_sample (#125)
  • Require sorted neighborhoods according to time in temporal sampling (#108)
  • Only sample neighbors with a strictly earlier timestamp than the seed node (#104)
  • Prevent absolute paths in wheel (#75)
  • Improved installation instructions (#68)
  • Replaced std::unordered_map with a faster phmap::flat_hash_map (#65)
  • Fixed versions of checkout and setup-python in CI (#52)
  • Make use of the pyg_sphinx_theme documentation template (#47)
  • Auto-compute number of threads and blocks in CUDA kernels (#41)
  • Optional return types in pyg.subgraph() (#40)
  • Absolute headers (#30)
  • Use at::equal rather than at::all in tests (#37)
  • Build *.so extension on Mac instead of *.dylib (#107)

Full commit list: 6b44207...0.1.0