-
Notifications
You must be signed in to change notification settings - Fork 9
/
README-v2.txt
100 lines (68 loc) · 4.11 KB
/
README-v2.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
Upgrading to MAGMA 2.0
----------------------
MAGMA 2.0 introduces queues to each magmablas function, and removes the
magmablas{Set, Get}KernelStream( ) functions. This eliminates a thread safety
issue. Previously, in this situation:
thread 0 thread 1
-------- --------
magmablasSetKernelStream( q0 );
magmablasSetKernelStream( q1 );
magma_zgemm( ... ); magma_ztrmm( ... );
the magma_zgemm in thread 0 would erroneously execute on q1, instead of on q0 as
intended. This could even segfault, if q0 and q1 are on different CUDA devices.
One fix is to use thread-local storage, but this depends on the threading
environment, whether that is pthreads, OpenMP, Windows threads, or anything
else.
Our fix is to pass queues to each routine. This is similar to CUBLAS v2 handles.
thread 0 thread 1
-------- --------
magma_zgemm( ..., q0 ); magma_ztrmm( ..., q1 );
This eliminates the global state and makes the code easier to follow, as each
call explicitly states what queue it is running on.
For backwards compatibility, magma.h continues to provide mostly the original
functionality. (The one significant change still being that MAGMA queues are
opaque structures now, not CUDA streams.) A new header, magma_v2.h, provides the
new functionality with queues passed to each routine.
Steps to take to update code that uses MAGMA:
1) Change the header:
#include <magma.h> ==> #include <magma_v2.h>
2) Creating a MAGMA queue now takes a CUDA device index:
magma_queue_create( &queue ); ==> magma_queue_create( dev_id, &queue );
This is for improved compatibility with OpenCL, which takes a device when
creating a queue.
The queue is now an opaque structure. (Previously it was identical to a CUDA
stream.) You should not access its internal values except through the
provided functions. You can query the queue for its device index:
dev_id = magma_queue_get_device( queue );
For non-portable, CUDA-only code, you can query the CUDA stream, CUBLAS
handle, and CUSPARSE handle:
stream = magma_queue_get_cuda_stream( queue );
handle = magma_queue_get_cublas_handle( queue );
sphandle = magma_queue_get_cusparse_handle( queue );
Changing the CUDA stream associated with the CUBLAS or CUSPARSE handles
will result in undefined behavior.
If need be, you can create a queue from an existing CUDA stream, CUBLAS
handle, and CUSPARSE handle:
magma_queue_create_from_cuda( dev_id, cuda_stream, cublas_handle, cusparse_handle, &queue );
3) NULL queues are not allowed. CUDA has a NULL stream which has extra implicit
synchronization. This is not supported in OpenCL, so for portability, MAGMA
no longer allows NULL queues. You may need to add synchronization:
magma_queue_sync( queue );
in appropriate places if your code relied on implicit NULL stream
synchronization.
4) BLAS functions add a queue argument:
magma_zgemm ( ... ); ==> magma_zgemm ( ..., queue );
magmablas_zlaset( ... ); ==> magmablas_zlaset( ..., queue );
magma_zlarfb ( ... ); ==> magma_zlarfb ( ..., queue );
The magmablas{Set, Get}KernelStream( queue ) functions are now removed.
5) Synchronous set, get, copy functions also take a queue:
magma_zsetmatrix( ... ); ==> magma_zsetmatrix( ..., queue );
The semantics of magma_z{set, get, copy}matrix changed slightly. They take a
queue, and synchronize that queue after the transfer. Previously, the v1
interface also implicitly did a device synchronization before the transfer.
This is not supported for OpenCL, so we changed for better compatibility.
Asynchronous set, get, copy functions (magma_zsetmatrix_async) already took
a queue, so remain unchanged.
6) Most high-level functions interface remain unchanged:
magma_zgetrf ( m, n, A, lda, ipiv, &info ); ==> no change
magma_zgetrf_gpu( m, n, dA, ldda, ipiv, &info ); ==> no change