Commit 4f0a3df

[SYCL][CUDA] Fix unexpected async memcpy (#1798)
When memory buffers are created with either of the following flags

* PI_MEM_FLAGS_HOST_PTR_USE
* PI_MEM_FLAGS_HOST_PTR_COPY

we copy the data to the device. During memory buffer creation we do not have a CUDA stream and therefore call cuMemcpyHtoD, which operates on the CUDA default stream.

This fix synchronizes with the default stream to ensure that data copying is finished before any other PI operation uses it on a non-default stream.

Signed-off-by: Bjoern Knafla <bjoern@codeplay.com>
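The ordering problem described in the commit message can be illustrated with a host-only toy model. This is a sketch, not plugin code: `FakeStream`, `FakeBuffer`, `createBuffer`, and `sumBuffer` are hypothetical names invented for the illustration, and no CUDA call is made. The model deliberately exaggerates the behaviour by deferring all queued work until the stream is synchronized, whereas a real copy may be partly or fully complete by the time it is observed. It only shows why a reader that bypasses the default stream needs the synchronization step to see the copied data.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Toy model of a stream: operations are queued and only take effect once
// the stream is synchronized. Hypothetical; not a CUDA or PI type.
class FakeStream {
public:
  void enqueue(std::function<void()> op) { pending_.push_back(std::move(op)); }

  // Plays the role of cuStreamSynchronize: run everything queued so far.
  void synchronize() {
    while (!pending_.empty()) {
      std::function<void()> op = std::move(pending_.front());
      pending_.pop_front();
      op();
    }
  }

private:
  std::deque<std::function<void()>> pending_;
};

// Toy device buffer: zero-initialized storage shared with queued operations.
struct FakeBuffer {
  std::shared_ptr<std::vector<int>> storage;
};

// Mirrors the shape of the patched code path: queue the initial copy on the
// default stream, then optionally synchronize before handing the buffer out.
FakeBuffer createBuffer(FakeStream &defaultStream, const std::vector<int> &host,
                        bool synchronizeAfterCopy) {
  FakeBuffer buf{std::make_shared<std::vector<int>>(host.size(), 0)};
  std::shared_ptr<std::vector<int>> dst = buf.storage;
  // The queued copy owns its own copy of the host data and a reference to
  // the buffer storage, so it stays valid even if it runs later.
  defaultStream.enqueue([dst, host] { *dst = host; });
  if (synchronizeAfterCopy) {
    defaultStream.synchronize();
  }
  return buf;
}

// A reader that does not go through the default stream, standing in for
// work submitted on a non-default stream.
long long sumBuffer(const FakeBuffer &buf) {
  return std::accumulate(buf.storage->begin(), buf.storage->end(), 0LL);
}
```

In this model, a buffer created with `synchronizeAfterCopy` set to `false` still holds zeros when `sumBuffer` reads it, while one created with `true` holds the host data. That is the guarantee the added `cuStreamSynchronize` call is meant to provide.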
1 parent 9aa5029 commit 4f0a3df

1 file changed: 8 additions, 0 deletions

sycl/plugins/cuda/pi_cuda.cpp

@@ -1521,7 +1521,15 @@ pi_result cuda_piMemBufferCreate(pi_context context, pi_mem_flags flags,
       if (piMemObj != nullptr) {
         retMemObj = piMemObj.release();
         if (performInitialCopy) {
+          // Operates on the default stream of the current CUDA context.
           retErr = PI_CHECK_ERROR(cuMemcpyHtoD(ptr, host_ptr, size));
+          // Synchronize with default stream implicitly used by cuMemcpyHtoD
+          // to make buffer data available on device before any other PI call
+          // uses it.
+          if (retErr == PI_SUCCESS) {
+            CUstream defaultStream = 0;
+            retErr = PI_CHECK_ERROR(cuStreamSynchronize(defaultStream));
+          }
         }
       } else {
         retErr = PI_OUT_OF_HOST_MEMORY;
