Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Fix race condition in device_scalar::set_value #569

Merged
merged 7 commits into from
Sep 23, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
- PR #545 Fix build to support using `clang` as the host compiler
- PR #534 Fix `pool_memory_resource` failure when init and max pool sizes are equal
- PR #546 Remove CUDA driver linking and correct NVTX macro.
- PR #569 Correct `device_scalar::set_value` to pass host value by reference to avoid copying from invalid value


# RMM 0.15.0 (26 Aug 2020)
Expand Down
44 changes: 40 additions & 4 deletions include/rmm/device_scalar.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,25 @@ class device_scalar {
* (e.g. using `cudaStreamWaitEvent()` or `cudaStreamSynchronize()`) before and after calling
* this function, otherwise there may be a race condition.
*
* Does not synchronize `stream`.
* This function does not synchronize stream `stream` before returning. Therefore, the object
jrhemstad marked this conversation as resolved.
Show resolved Hide resolved
* referenced by `host_value` should not be destroyed or modified until `stream` has been
* synchronized. Otherwise, behavior is undefined.
*
* @note: This function incurs a host to device memcpy and should be used sparingly.

* Example:
* \code{cpp}
* rmm::device_scalar<int32_t> s;
*
* int v{42};
*
* // Copies 42 to element 0 on `stream`. Does _not_ synchronize
* vec.set_value(v, stream);
* ...
* cudaStreamSynchronize(stream);
* // Synchronization is required before `v` can be modified
* v = 13;
* \endcode
*
* @throws `rmm::cuda_error` if copying `host_value` to device memory fails.
* @throws `rmm::cuda_error` if synchronizing `stream` fails.
Expand All @@ -123,7 +141,7 @@ class device_scalar {
* @param stream CUDA stream on which to perform the copy
*/
template <typename Dummy = void>
auto set_value(T host_value, cudaStream_t stream = 0)
auto set_value(T const &host_value, cudaStream_t stream = 0)
-> std::enable_if_t<std::is_fundamental<T>::value, Dummy>
jrhemstad marked this conversation as resolved.
Show resolved Hide resolved
{
if (host_value == T{0}) {
Expand All @@ -141,7 +159,25 @@ class device_scalar {
* (e.g. using `cudaStreamWaitEvent()` or `cudaStreamSynchronize()`) before and after calling
* this function, otherwise there may be a race condition.
*
* Does not synchronize `stream`.
* This function does not synchronize stream `stream` before returning. Therefore, the object
jrhemstad marked this conversation as resolved.
Show resolved Hide resolved
* referenced by `host_value` should not be destroyed or modified until `stream` has been
* synchronized. Otherwise, behavior is undefined.
*
* @note: This function incurs a host to device memcpy and should be used sparingly.

* Example:
* \code{cpp}
* rmm::device_scalar<int32_t> s;
*
* int v{42};
*
* // Copies 42 to element 0 on `stream`. Does _not_ synchronize
* vec.set_value(v, stream);
* ...
* cudaStreamSynchronize(stream);
* // Synchronization is required before `v` can be modified
* v = 13;
* \endcode
*
* @throws `rmm::cuda_error` if copying `host_value` to device memory fails
* @throws `rmm::cuda_error` if synchronizing `stream` fails
Expand All @@ -150,7 +186,7 @@ class device_scalar {
* @param stream CUDA stream on which to perform the copy
*/
template <typename Dummy = void>
auto set_value(T host_value, cudaStream_t stream = 0)
auto set_value(T const &host_value, cudaStream_t stream = 0)
-> std::enable_if_t<not std::is_fundamental<T>::value, Dummy>
jrhemstad marked this conversation as resolved.
Show resolved Hide resolved
{
_memcpy(buffer.data(), &host_value, stream);
Expand Down