pyg-lib 0.1.0: Optimized neighborhood sampling and heterogeneous GNN acceleration
We are proud to release `pyg-lib==0.1.0`, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG 🎉🎉🎉

Extensive documentation is provided here. Once `pyg-lib` is installed, it is automatically picked up by PyG, e.g., during neighborhood sampling or during heterogeneous GNN execution, and accelerates its computation.
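As a minimal sketch of what this looks like from the PyG side (assuming `torch_geometric` is installed; the toy graph below is purely illustrative), no code changes are required once `pyg-lib` is present:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Toy homogeneous graph with 4 nodes and a few directed edges:
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 3]])
data = Data(x=torch.randn(4, 8), edge_index=edge_index)

# If pyg-lib is installed, this loader transparently picks up its
# optimized neighbor sampling routines; otherwise PyG falls back to
# its previous sampling backend:
loader = NeighborLoader(data, num_neighbors=[2, 2], batch_size=2)
batch = next(iter(loader))
```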
Installation
You can install `pyg-lib` as described in our `README.md`:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

where

- `${TORCH}` should be replaced by either `1.11.0`, `1.12.0` or `1.13.0`
- `${CUDA}` should be replaced by either `cpu`, `cu102`, `cu113`, `cu115`, `cu116` or `cu117`
The following combinations are supported:
| PyTorch 1.13 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | | | | ✅ | ✅ |
| Windows | | | | | | |
| macOS | ✅ | | | | | |

| PyTorch 1.12 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | | ✅ | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |

| PyTorch 1.11 | cpu | cu102 | cu113 | cu115 | cu116 | cu117 |
|---|---|---|---|---|---|---|
| Linux | ✅ | ✅ | ✅ | ✅ | | |
| Windows | | | | | | |
| macOS | ✅ | | | | | |
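For example, on a Linux machine running PyTorch 1.13 with CUDA 11.7 (a supported combination above), the command becomes:

pip install pyg-lib -f https://data.pyg.org/whl/torch-1.13.0+cu117.html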
Highlights
`pyg_lib.sampler`: Optimized homogeneous and heterogeneous neighborhood sampling
`pyg-lib` provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighborhood sampling techniques previously used in PyG. For example, it pre-allocates random numbers, uses vector-based mapping for nodes in smaller node types, leverages a faster hashmap implementation, etc. Overall, it achieves speed-ups of about 10x-15x:
pyg_lib.sampler.neighbor_sample(
rowptr: Tensor,
col: Tensor,
seed: Tensor,
num_neighbors: List[int],
time: Optional[Tensor] = None,
seed_time: Optional[Tensor] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
and
pyg_lib.sampler.hetero_neighbor_sample(
rowptr_dict: Dict[EdgeType, Tensor],
col_dict: Dict[EdgeType, Tensor],
seed_dict: Dict[NodeType, Tensor],
num_neighbors_dict: Dict[EdgeType, List[int]],
time_dict: Optional[Dict[NodeType, Tensor]] = None,
seed_time_dict: Optional[Dict[NodeType, Tensor]] = None,
csc: bool = False,
replace: bool = False,
directed: bool = True,
disjoint: bool = False,
temporal_strategy: str = 'uniform',
return_edge_id: bool = True,
)
`pyg_lib.sampler.neighbor_sample` and `pyg_lib.sampler.hetero_neighbor_sample` recursively sample neighbors from all node indices in `seed` in the graph given by `(rowptr, col)`. Both routines also support temporal sampling via the `time` argument, such that no nodes are sampled that violate the temporal constraints indicated by `seed_time`.
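As a minimal, illustrative sketch (the toy graph and seed choice are assumptions, not part of the release notes), the homogeneous routine can be called as follows:

```python
import torch
import pyg_lib

# Toy directed graph in CSR format:
# node 0 -> {1, 2}, node 1 -> {0, 3}, node 2 -> {1}, node 3 -> {2}
rowptr = torch.tensor([0, 2, 4, 5, 6])
col = torch.tensor([1, 2, 0, 3, 1, 2])
seed = torch.tensor([0])  # sample the 2-hop neighborhood around node 0

# Sample up to 2 neighbors per node over 2 hops. The returned tuple
# describes the sampled subgraph (sampled rows/cols, the global IDs of
# the sampled nodes, and edge IDs since return_edge_id=True by default);
# the exact unpacking is omitted here as it is not spelled out above.
out = pyg_lib.sampler.neighbor_sample(rowptr, col, seed, [2, 2])
```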
`pyg_lib.ops`: Heterogeneous GNN acceleration
`pyg-lib` provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via an NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types:
segment_matmul(inputs: Tensor, ptr: Tensor, other: Tensor) -> Tensor

`pyg_lib.ops.segment_matmul` performs dense-dense matrix multiplication according to segments along the first dimension of `inputs` as given by `ptr`.
import torch
import pyg_lib

inputs = torch.randn(8, 16)
ptr = torch.tensor([0, 5, 8])
other = torch.randn(2, 16, 32)

out = pyg_lib.ops.segment_matmul(inputs, ptr, other)
assert out.size() == (8, 32)
assert torch.allclose(out[0:5], inputs[0:5] @ other[0], atol=1e-6)
assert torch.allclose(out[5:8], inputs[5:8] @ other[1], atol=1e-6)
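Conceptually, and continuing from the snippet above, this is equivalent (up to floating-point tolerance) to looping over the segments defined by `ptr` and applying the matching weight matrix from `other`; a sketch of that reference computation:

```python
# Reference computation: multiply each segment of `inputs` with its own
# weight matrix from `other`, then concatenate the per-segment results.
outs = []
for i in range(ptr.numel() - 1):
    start, end = int(ptr[i]), int(ptr[i + 1])
    outs.append(inputs[start:end] @ other[i])
ref = torch.cat(outs, dim=0)

assert torch.allclose(out, ref, atol=1e-6)
```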
Full Changelog
Added
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for `grouped_matmul` (#137)
- Added `fused_scatter_reduce` operation for multiple reductions (#141, #142)
- Added `triton` dependency (#133, #134)
- Enable `pytest` testing (#132)
- Added C++-based autograd and TorchScript support for `segment_matmul` (#120, #122)
- Allow overriding `time` for seed nodes via `seed_time` in `neighbor_sample` (#118)
- Added `[segment|grouped]_matmul` CPU implementation (#111)
- Added `temporal_strategy` option to `neighbor_sample` (#114)
- Added benchmarking tool (Google Benchmark) along with `pyg::sampler::Mapper` benchmark example (#101)
- Added CSC mode to `pyg::sampler::neighbor_sample` and `pyg::sampler::hetero_neighbor_sample` (#95, #96)
- Speed up `pyg::sampler::neighbor_sample` via `IndexTracker` implementation (#84)
- Added `pyg::sampler::hetero_neighbor_sample` implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added `pyg::utils::to_vector` implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added `grouped_matmul` and `segment_matmul` CUDA implementations via `cutlass` (#51, #56, #61, #64, #69, #73, #123)
- Added `pyg::sampler::neighbor_sample` implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added `pyg::sampler::Mapper` utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added biased sampling utils (#38)
- Added `CHANGELOG.md` (#39)
- Added `pyg.subgraph()` (#31)
- Added nightly builds (#28, #36)
- Added `rand` CPU engine (#26, #29, #32, #33)
- Added `pyg.random_walk()` (#21, #24, #25)
- Added documentation via `readthedocs` (#19, #20)
- Added code coverage report (#15, #16, #17, #18)
- Added `CMakeExtension` support (#14)
- Added test suite via `gtest` (#13)
- Added `clang-format` linting via `pre-commit` (#12)
- Added `CMake` support (#5)
- Added `pyg.cuda_version()` (#4)
Changed
- Allow different types for graph and timestamp data (#143)
- Fixed dispatcher in `hetero_neighbor_sample` (#125)
- Require sorted neighborhoods according to time in temporal sampling (#108)
- Only sample neighbors with a strictly earlier timestamp than the seed node (#104)
- Prevent absolute paths in wheel (#75)
- Improved installation instructions (#68)
- Replaced `std::unordered_map` with a faster `phmap::flat_hash_map` (#65)
- Fixed versions of `checkout` and `setup-python` in CI (#52)
- Make use of the `pyg_sphinx_theme` documentation template (#47)
- Auto-compute number of threads and blocks in CUDA kernels (#41)
- Optional return types in `pyg.subgraph()` (#40)
- Absolute headers (#30)
- Use `at::equal` rather than `at::all` in tests (#37)
- Build `*.so` extension on Mac instead of `*.dylib` (#107)
Full commit list: 6b44207...0.1.0