In this repository we provide the source code of our accelerated Sparse Matrix-Matrix multiplication (SpGEMM) implementation, which we desrcribe in "Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores" [1].
Cmake and CUDA are required. Modification to the CMakeLists.txt files may be necessary, e.g. to change GPU architecture.
Instructions:
- Download the source code into a folder e.g. tSparse-src.
- Create a folder for building the executable in the same directory as tSparse-src, e.g tSparse-release.
- Inside tSparse-release and call cmake:
cmake ../tSparse-src
- Finally, call make:
make -j{number of CPU cores}
In order to test SPGEMM run "spmm" executable. spmm accepts 1 or 2 arguments. In case of 1 argument it performs matrix squaring (A*A). In case of 2 arguments it performs the matrix multiplication (A*B).
./spmm A.mtx
./spmm A.mtx B.mtx
- The first compilation after running cmake may give an error similar to : "error: undefined reference to '__cudaRegisterLinkedBinary_12_spmm_cpp1_ii_handle'". This error is related to the CUDA library that is used by CUDA dynamic parallelism. Running make a second time solves this issue.
- The latest multiplication and counting kernels do not support Volta. The reason is that we use direct access to fragments (instead of through shared memory) for performance reasons.
Orestis Zachariadis (orestis.zachariadis@uco.es)
[1] O. Zachariadis, N. Satpute, J. Gómez-Luna, and J. Olivares, “Accelerating sparse matrix–matrix multiplication with GPU Tensor Cores,” Computers & Electrical Engineering, vol. 88, p. 106848, Dec. 2020, doi: 10.1016/j.compeleceng.2020.106848.
Also in arXiv.