This is the extensions plugin for PSBLAS, version 1.3.1.
Version 1.3.1 supports CUDA versions up to 10, or from 11.3 onwards.
As of CUDA version 11, support for the HYB format has been dropped.
This package contains:
1. Extended matrix formats: ELLPACK, Hacked ELLPACK, DIAgonals, Hacked
   DIAgonals. Note: DIA and HDIA have limited support.
2. A GPU plugin: GPU-enabled versions of the above, with the CUDA code
   from https://github.com/davidebarbieri/spgpu, plus interfaces to
   the CSR and HYB formats available in the NVIDIA cuSPARSE library.
   Note: DIAG and HDIAG have limited support.
3. RSB: an interface to http://sourceforge.net/projects/librsb
The architectural ideas of the GPU plugin are explained in
V. Cardellini, S. Filippone and D. Rouson
Design Patterns for Scientific Computations on
Sparse Matrices
Scientific Computing, 22(2014), pp. 1--19.
The architecture of the Fortran 2003 sparse BLAS is described in
S. Filippone, A. Buttari:
Object-Oriented Techniques for Sparse Matrix Computations in Fortran
2003,
ACM Trans. on Math. Software, vol. 38, No. 4, 2012.
PREREQUISITES
To build this code you need to have PSBLAS 3.3.0 or later, together
with its prerequisites.
To make use of the NVIDIA GPU you'll need:
1. An installation of the CUDA toolkit (version 4.1 or later);
2. The SPGPU code from http://spgpu.googlecode.com
INSTALLING
  ./configure --prefix=/path/to/install \
              --with-psblas=/path/to/PSBLAS/install \
              --with-cuda=/CUDA/install \
              --with-spgpu=/SPGPU/install \
              --with-librsb=/LIBRSB/install
  make
  make install
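
Before running configure, it can help to confirm that every prefix passed
to the --with-* options actually exists. The following is a minimal sketch
only: check_prefixes is a hypothetical helper, not something shipped with
this package, and the paths in the usage comment are the placeholders from
the configure command above.

```shell
#!/bin/sh
# Hypothetical helper, not part of this package.
# check_prefixes DIR... : report every argument that is not an existing
# directory. Returns 0 when all directories exist, 1 otherwise.
check_prefixes() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        fi
    done
    return $status
}

# Usage (placeholder paths; replace with your own install locations):
#   check_prefixes /path/to/PSBLAS/install /CUDA/install \
#                  /SPGPU/install /LIBRSB/install && ./configure ...
```

Checking all paths in one pass, rather than stopping at the first missing
one, lists every problem before a configure run is attempted.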
Note: we have only tested with the GNU Fortran compiler.
Note: CUDA nvcc typically lags behind the latest versions of GCC/GNU
Fortran; currently nvcc supports GCC 4.8, so this is the preferred choice.
Mixing SPGPU CUDA code compiled with an older GCC version and the rest
of the code compiled with e.g. 4.9 has worked fine so far: YMMV.
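
The compiler-version caveat can be checked before building. This is a
sketch under stated assumptions: the helper names (major_minor, version_le,
gcc_ok_for_nvcc) are hypothetical, the 4.8 limit is taken from the note
above and must be adjusted to your CUDA toolkit, and "sort -V" (version
sort, as in GNU coreutils) is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helpers, not part of this package.

# major_minor VERSION : keep only the first two dotted fields (4.8.5 -> 4.8).
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1-2
}

# version_le A B : succeed when dotted version A is <= B.
# Assumes "sort -V" (version sort) is available.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# gcc_ok_for_nvcc GCC_VERSION MAX_SUPPORTED : succeed when the major.minor
# part of GCC_VERSION does not exceed MAX_SUPPORTED.
gcc_ok_for_nvcc() {
    version_le "$(major_minor "$1")" "$2"
}

# Usage (4.8 is the limit quoted above; adjust for your nvcc):
#   if gcc_ok_for_nvcc "$(gcc -dumpversion)" 4.8; then
#       echo "GCC version is within the assumed nvcc limit"
#   else
#       echo "GCC is newer than the assumed nvcc limit" >&2
#   fi
```

Comparing only the major.minor fields means a patch release such as 4.8.5
is treated as 4.8 and is not rejected.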
TODO
Improve MPI support.
WHAT IS NOT HERE
Good preconditioners for the GPU. Performance of triangular system
solves on the GPU is very bad: we enable them in CSRG and HYBG, but we
do not even bother to implement them in ELG and HLG.
So if you use the GPU, you are limited to either no preconditioning or
diagonal scaling. We are working on an independent plugin for mld2p4
(www.mld2p4.it) that will deliver better alternatives based on
approximate inverses.
Report bugs to:
https://github.com/sfilippone/psblas3-ext/issues
CONTRIBUTORS
Salvatore Filippone
Alessandro Fanfarillo