Skip to content

Deep-Learning-Profiling-Tools/fasten

Repository files navigation

Fasten: A Library of Fast Segment Operators

Introduction

Fasten is a library aimed at speeding up Heterogeneous Graph Neural Network (HGNN) workloads. The current version of Fasten focuses on improving segmented matrix multiplication, a critical operator in HGNNs. Fasten implements a simple interface, making it easy to integrate with existing graph library PyG with minimal changes. Fasten achieved an average speedup of 13.65x and 4.72x in operator-wise benchmarks compared to CUTLASS and cuBLAS, respectively.

Fasten vs CUTLASS

figure9_fasten_vs_cutlass(pyg)

Fasten vs cuBLAS

figure9_fasten_vs_cublas(torch)

Installation

Build Instructions

Install pytorch nightly and triton nightly. We use relatively new triton features so old triton releases may crash.

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly

You may need to build triton from source before proton is distributed with triton's pip wheel.

git clone https://github.com/Deep-Learning-Profiling-Tools/fasten.git && cd fasten
pip install .

Examples

Fasten's segment matrix multiplication operator has been integrated with various HGNN architecture such as RGCN, HGT, RGAT in PyG. Examples on how to run the examples can be found below:

GNN Examples

  • RGCN
cd examples/rgcn
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgcn.py --device cuda --dataset AIFB
# With fasten
python rgat.py --device cuda --mode fasten --dataset AIFB
  • HGT
cd examples/rgcn
# Without fasten
# Available datasets are: DBLP, Freebase, AIFB, MUTAG, BGS, AM
python rgcn.py --device cuda --example DBLP
# With fasten
python rgat.py --device cuda --mode fasten --example DBLP
  • RGAT
cd examples/rgat
# Without fasten
# Available datasets are: AIFB, MUTAG, BGS, AM
python rgat.py --device cuda --dataset MUTAG
# With fasten
python rgat.py --device cuda --mode fasten --dataset MUTAG

Benchmarking

cd test
pytest -vs test_op.py::test_perf

Compatibility

Supported Platforms

  • Linux

Supported Hardware

  • NVIDIA GPUs (Compute Capability 7.0+)

Software requirements

  • Pytorch >=2.2.0
  • Triton >=3.0.0
  • PyG >=2.6.0

Publication

  • Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, and Jiajia Li. 2024. FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogeneous Graph Neural Networks. In Proceedings of the 38th ACM International Conference on Supercomputing (ICS’24), June 4–7, 2024, Kyoto, Japan.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages