add cuStreamSync for async cusolver functs#215
ericlars merged 8 commits into uxlfoundation:develop from
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
One solution would be to only sync the stream used by the interop
Would this achieve asynchronous submissions?
If we go this route it may be roll the wait call into
Interesting. Can you provide a reproducer?
Actually, I think that just from the observation that it fixes the test failures, it must effectively be blocking future submissions (at least those that touch the same memory as the cusolver function) until the native stream used in the cusolver functions has finished its work. (It may be that we observe this blocking behaviour because the context could be being created with the
The reproducer is the set of getrf tests, which call the getrf function that uses depends_on here: https://github.com/oneapi-src/oneMKL/blob/61312ed98b8208999f99474778d46919c30ef15b/src/lapack/backends/cusolver/cusolver_lapack.cpp#L1350 If depends_on were syncing the stream, then the corresponding tests wouldn't fail.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
I've now added blocking waits (using
I've found out that these cusolver functions are apparently asynchronous, even though the NVIDIA documentation implies that they are synchronous; therefore I think that Therefore I will update the changes to synchronize the native stream within the host_task and then use
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
I've done this now.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@AidanBeltonS could you check this is all OK? Thanks
LGTM.
@ericlars what do you think of this solution? This means the
ericlars
left a comment
Apologies for the delayed response, I've been on an extended vacation. This looks like a really elegant solution, thanks for working on it. I have a better appreciation for the difficulties of asynchronicity and cuda now.
attaching log: log_llvm_cusolver_.txt
Signed-off-by: JackAKirk jack.kirk@codeplay.com
Description
This is a bug fix for failures that first appeared after the multi-streams implementation of the cuda backend in intel/llvm (failures identified here: #209 (comment)).
The failed tests are due to the lack of a stream synchronisation after some cusolver interop functions, such as cusolverDnSgetrf, are called from lapack::cusolver::getrf. Before the multi-streams implementation, all queues were effectively in-order when using the cuda backend of intel/llvm, so syncing streams returned from a queue that did not have the in_order queue property was not necessary.
The fix is to call:
    cudaStream_t currentStreamId;
    CUSOLVER_ERROR_FUNC(cusolverDnGetStream, err, handle, &currentStreamId);
    cuStreamSynchronize(currentStreamId);
after the cusolver functions. Since some cusolver functions are apparently asynchronous (and we can't know for sure from the docs which, if any, are not asynchronous), we have to synchronize the stream after it is used in cusolver calls.
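To illustrate why the synchronisation matters, here is a minimal host-only sketch in plain C++. It is an analogy, not the PR's code and not the CUDA or cusolver API: MockStream and factorize_and_sync are hypothetical names, with a worker thread standing in for a CUDA stream and synchronize() standing in for cuStreamSynchronize.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a CUDA stream: enqueued work runs later on a worker thread.
class MockStream {
public:
    MockStream() : worker_([this] { run(); }) {}
    ~MockStream() {
        { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join();
    }
    // Returns immediately, like an asynchronous library call.
    void enqueue(std::function<void()> task) {
        { std::lock_guard<std::mutex> lock(m_); tasks_.push(std::move(task)); ++pending_; }
        cv_.notify_all();
    }
    // Blocks until all enqueued work has finished (the role cuStreamSynchronize plays).
    void synchronize() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            { std::lock_guard<std::mutex> lock(m_); --pending_; }
            cv_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    int pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Hypothetical analogue of a wrapper such as getrf: launch async work, then sync.
int factorize_and_sync(MockStream& stream) {
    int result = 0;
    stream.enqueue([&result] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulated device work
        result = 42;
    });
    stream.synchronize();  // without this, result could be read before it is written
    return result;
}
```

Calling factorize_and_sync on a MockStream returns only after the enqueued work has completed. Removing the synchronize() call would let the function return (and, in the SYCL setting, let the host task's event be treated as complete) while the work is still in flight, which is the kind of failure the getrf tests exposed.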