Add {Copy,Move}ToDeviceCache<T> class templates and moveToDeviceAsync function template #43969

Merged: 9 commits, Dec 18, 2024
22 changes: 22 additions & 0 deletions DataFormats/Portable/interface/PortableHostObject.h
@@ -20,23 +20,45 @@ class PortableHostObject {
using Buffer = cms::alpakatools::host_buffer<Product>;
using ConstBuffer = cms::alpakatools::const_host_buffer<Product>;

static_assert(std::is_trivially_destructible_v<Product>);

PortableHostObject() = delete;

PortableHostObject(edm::Uninitialized) noexcept {}

// Note that in contrast to the variadic template overload, this
// constructor does not initialize the contained object
PortableHostObject(alpaka_common::DevHost const& host)
// allocate pageable host memory
: buffer_{cms::alpakatools::make_host_buffer<Product>()}, product_{buffer_->data()} {
assert(reinterpret_cast<uintptr_t>(product_) % alignof(Product) == 0);
}

template <typename... Args>
PortableHostObject(alpaka_common::DevHost const& host, Args&&... args)
// allocate pageable host memory
: buffer_{cms::alpakatools::make_host_buffer<Product>()},
product_{new(buffer_->data()) Product(std::forward<Args>(args)...)} {
assert(reinterpret_cast<uintptr_t>(product_) % alignof(Product) == 0);
}

// Note that in contrast to the variadic template overload, this
// constructor does not initialize the contained object
template <typename TQueue, typename = std::enable_if_t<alpaka::isQueue<TQueue>>>
PortableHostObject(TQueue const& queue)
// allocate pinned host memory associated to the given work queue, accessible by the queue's device
: buffer_{cms::alpakatools::make_host_buffer<Product>(queue)}, product_{buffer_->data()} {
assert(reinterpret_cast<uintptr_t>(product_) % alignof(Product) == 0);
}

template <typename TQueue, typename... Args, typename = std::enable_if_t<alpaka::isQueue<TQueue>>>
PortableHostObject(TQueue const& queue, Args&&... args)
// allocate pinned host memory associated to the given work queue, accessible by the queue's device
: buffer_{cms::alpakatools::make_host_buffer<Product>(queue)},
product_{new(buffer_->data()) Product(std::forward<Args>(args)...)} {
assert(reinterpret_cast<uintptr_t>(product_) % alignof(Product) == 0);
}

// non-copyable
PortableHostObject(PortableHostObject const&) = delete;
PortableHostObject& operator=(PortableHostObject const&) = delete;
38 changes: 34 additions & 4 deletions DataFormats/Portable/test/test_catch2_portableObjectOnHost.cc
@@ -14,10 +14,40 @@ namespace {

// This test is currently mostly about the code compiling
TEST_CASE("Use of PortableObject<T> on host code", s_tag) {
PortableObject<Test, alpaka::DevCpu> obj(cms::alpakatools::host());
obj->a = 42;
static_assert(std::is_same_v<PortableObject<Test, alpaka::DevCpu>, PortableHostObject<Test>>);

SECTION("Tests") { REQUIRE(obj->a == 42); }
SECTION("Initialize by setting members") {
SECTION("With device") {
PortableObject<Test, alpaka::DevCpu> obj(cms::alpakatools::host());
obj->a = 42;

static_assert(std::is_same_v<PortableObject<Test, alpaka::DevCpu>, PortableHostObject<Test>>);
REQUIRE(obj->a == 42);
}

SECTION("With queue") {
alpaka::QueueCpuBlocking queue(cms::alpakatools::host());

PortableObject<Test, alpaka::DevCpu> obj(queue);
obj->a = 42;

REQUIRE(obj->a == 42);
}
}

SECTION("Initialize via constructor") {
SECTION("With device") {
PortableObject<Test, alpaka::DevCpu> obj(cms::alpakatools::host(), Test{42, 3.14f});

REQUIRE(obj->a == 42);
REQUIRE(obj->b == 3.14f);
}

SECTION("With queue") {
alpaka::QueueCpuBlocking queue(cms::alpakatools::host());
PortableObject<Test, alpaka::DevCpu> obj(queue, Test{42, 3.14f});

REQUIRE(obj->a == 42);
REQUIRE(obj->b == 3.14f);
}
}
}
20 changes: 20 additions & 0 deletions HeterogeneousCore/AlpakaCore/README.md
@@ -185,6 +185,26 @@ In the [`fillDescriptions()`](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWG

Also note that the `fillDescriptions()` function must have the same content for all backends, i.e. any backend-specific behavior with e.g. `#ifdef` or `if constexpr` is forbidden.

### Copy e.g. configuration data to all devices in EDProducer

While the EventSetup can be used to handle copying data to all devices
of an Alpaka backend, for data used only by one EDProducer a simpler
way is to use one of the following class templates:
* `cms::alpakatools::MoveToDeviceCache<TDevice, THostObject>` (recommended)
  * `#include "HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h"`
  * Moves the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. On host backends the `THostObject` argument is moved, not copied.
  * The `THostObject` must not be copyable.
    * This avoids easy mistakes with objects that follow the copy semantics of `std::shared_ptr` (which includes Alpaka buffers): with such objects, another copy could be used to access the source memory buffer during the asynchronous data copy to the device.
  * The `THostObject` passed to the constructor must not be used afterwards, unless it is initialized again, e.g. by assigning another `THostObject` to it.
  * The corresponding device-side object can be obtained with the `get()` member function, using either an alpaka Device or Queue object. It can be used immediately after the constructor returns.
* `cms::alpakatools::CopyToDeviceCache<TDevice, THostObject>` (use only if you **must** use a copyable `THostObject`)
  * `#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"`
  * Copies the `THostObject` to all devices using `cms::alpakatools::CopyToDevice<THostObject>` synchronously. Host backends also make a copy.
  * The `THostObject` passed to the constructor can be used for other purposes immediately after the constructor returns.
  * The corresponding device-side object can be obtained with the `get()` member function, using either an alpaka Device or Queue object. It can be used immediately after the constructor returns.

For examples see [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerCopyToDeviceCache.cc) and [`HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc`](../../HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaGlobalProducerMoveToDeviceCache.cc).
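
The move-only requirement described above can be illustrated without Alpaka. The following is a minimal sketch, not the CMSSW implementation: `ToyHostTable` and `ToyMoveToDeviceCache` are hypothetical stand-ins, devices are represented by plain indices instead of alpaka Device or Queue objects, and a `std::vector` copy stands in for `CopyToDevice<THostObject>::copyAsync()` followed by a wait.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical move-only host product; a stand-in, not a CMSSW type.
struct ToyHostTable {
  explicit ToyHostTable(std::vector<float> v) : values(std::move(v)) {}
  ToyHostTable(ToyHostTable&&) = default;
  ToyHostTable& operator=(ToyHostTable&&) = default;
  ToyHostTable(ToyHostTable const&) = delete;
  ToyHostTable& operator=(ToyHostTable const&) = delete;

  std::vector<float> values;
};

// Toy cache holding one "device-side" copy per device index.
// Assumes THostObject has a `values` member of type std::vector<float>.
template <typename THostObject>
class ToyMoveToDeviceCache {
public:
  // Same kind of check as in the real class: copyable types are rejected at compile time.
  static_assert(not(std::is_copy_constructible_v<THostObject> or std::is_copy_assignable_v<THostObject>),
                "The data object to be moved to device must not be copyable.");

  // Taking an rvalue reference forces the caller to write std::move() or pass a temporary.
  ToyMoveToDeviceCache(std::size_t nDevices, THostObject&& src) {
    THostObject const owned = std::move(src);  // the cache takes over the host data
    data_.reserve(nDevices);
    for (std::size_t i = 0; i < nDevices; ++i) {
      data_.emplace_back(owned.values);  // stands in for the copy to device i
    }
    // All "copies" are complete when the constructor returns.
  }

  std::vector<float> const& get(std::size_t device) const { return data_.at(device); }

private:
  std::vector<std::vector<float>> data_;
};
```

A caller would write, for example, `ToyMoveToDeviceCache<ToyHostTable> cache(2, std::move(table));` and must then treat `table` as moved-from. Passing `table` without `std::move()` does not compile, which is the point of the design.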

## Guarantees

* All Event data products in the device memory space are guaranteed to be accessible only for operations enqueued in the `Queue` given by `device::Event::queue()` when accessed through the `device::Event`.
104 changes: 104 additions & 0 deletions HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h
@@ -0,0 +1,104 @@
#ifndef HeterogeneousCore_AlpakaCore_interface_CopyToDeviceCache_h
#define HeterogeneousCore_AlpakaCore_interface_CopyToDeviceCache_h

#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaCore/interface/QueueCache.h"
#include "HeterogeneousCore/AlpakaInterface/interface/CopyToDevice.h"
#include "HeterogeneousCore/AlpakaInterface/interface/devices.h"

namespace cms::alpakatools {
namespace detail {
// By default copy the host object with CopyToDevice<T>
//
// Doing with template specialization (rather than
// std::conditional_t and if constexpr) because the
// CopyToDevice<THostObject>::copyAsync() is ill-defined e.g. for
// PortableCollection on host device
template <typename TDevice, typename THostObject>
class CopyToDeviceCacheImpl {
public:
using Device = TDevice;
using Queue = alpaka::Queue<Device, alpaka::NonBlocking>;
using HostObject = THostObject;
using Copy = CopyToDevice<HostObject>;
using DeviceObject = decltype(Copy::copyAsync(std::declval<Queue&>(), std::declval<HostObject const&>()));

CopyToDeviceCacheImpl(HostObject const& srcObject) {
using Platform = alpaka::Platform<Device>;
auto const& devices = cms::alpakatools::devices<Platform>();
std::vector<std::shared_ptr<Queue>> queues;
queues.reserve(devices.size());
data_.reserve(devices.size());
for (auto const& dev : devices) {
auto queue = getQueueCache<Queue>().get(dev);
data_.emplace_back(Copy::copyAsync(*queue, srcObject));
queues.emplace_back(std::move(queue));
}
for (auto& queuePtr : queues) {
alpaka::wait(*queuePtr);
}
}

DeviceObject const& get(size_t i) const { return data_[i]; }

private:
std::vector<DeviceObject> data_;
};

// For host device, copy the host object directly instead
template <typename THostObject>
class CopyToDeviceCacheImpl<alpaka_common::DevHost, THostObject> {
public:
using HostObject = THostObject;
using DeviceObject = HostObject;

CopyToDeviceCacheImpl(HostObject const& srcObject) : data_(srcObject) {}

DeviceObject const& get(size_t i) const { return data_; }

private:
HostObject data_;
};
} // namespace detail

/**
* This class template implements a cache for data that is copied
* from the host (of type THostObject) to all the devices
* corresponding to the TDevice device type.
*
* The host-side object to be copied is given as an argument to the
* class constructor. The constructor uses the
* CopyToDevice<THostObject> class template to perform the copy, and
* waits for the data copies to finish, i.e. the constructor is
* synchronous wrt. the data copies.
*
* The device-side object corresponding to the THostObject (actual
* type is the return type of CopyToDevice<THostObject>::copyAsync())
* can be obtained with the get() member function, which takes either
* a queue or a device argument.
*/
template <typename TDevice, typename THostObject>
requires alpaka::isDevice<TDevice>
class CopyToDeviceCache {
using Device = TDevice;
using HostObject = THostObject;
using Impl = detail::CopyToDeviceCacheImpl<Device, HostObject>;
using DeviceObject = typename Impl::DeviceObject;

public:
CopyToDeviceCache(THostObject const& srcData) : data_(srcData) {}

DeviceObject const& get(Device const& dev) const { return data_.get(alpaka::getNativeHandle(dev)); }

template <typename TQueue>
DeviceObject const& get(TQueue const& queue) const {
return get(alpaka::getDev(queue));
}

private:
Impl data_;
};
} // namespace cms::alpakatools

#endif
101 changes: 101 additions & 0 deletions HeterogeneousCore/AlpakaCore/interface/MoveToDeviceCache.h
@@ -0,0 +1,101 @@
#ifndef HeterogeneousCore_AlpakaCore_interface_MoveToDeviceCache_h
#define HeterogeneousCore_AlpakaCore_interface_MoveToDeviceCache_h

#include <cstddef>
#include <type_traits>
#include <utility>

#include <alpaka/alpaka.hpp>

#include "HeterogeneousCore/AlpakaCore/interface/QueueCache.h"
#include "HeterogeneousCore/AlpakaCore/interface/CopyToDeviceCache.h"
#include "HeterogeneousCore/AlpakaInterface/interface/devices.h"

namespace cms::alpakatools {
namespace detail {
// By default copy the host object with CopyToDevice<T>
//
// Doing with template specialization (rather than
// std::conditional_t and if constexpr) because the
// CopyToDevice<THostObject>::copyAsync() is ill-defined e.g. for
// PortableCollection on host device
template <typename TDevice, typename THostObject>
class MoveToDeviceCacheImpl {
public:
using HostObject = THostObject;
using Impl = CopyToDeviceCacheImpl<TDevice, THostObject>;
using DeviceObject = typename Impl::DeviceObject;

MoveToDeviceCacheImpl(HostObject&& srcObject) : impl_(srcObject) {}

DeviceObject const& get(size_t i) const { return impl_.get(i); }

private:
Impl impl_;
};

// For host device, move the host object instead
template <typename THostObject>
class MoveToDeviceCacheImpl<alpaka_common::DevHost, THostObject> {
public:
using HostObject = THostObject;
using DeviceObject = HostObject;

MoveToDeviceCacheImpl(HostObject&& srcObject) : data_(std::move(srcObject)) {}

DeviceObject const& get(size_t i) const { return data_; }

private:
HostObject data_;
};
} // namespace detail

/**
* This class template implements a cache for data that is moved
* from the host (of type THostObject) to all the devices
* corresponding to the TDevice device type.
*
* The host-side object to be moved is given as an argument to the
* class constructor. The constructor uses the
* CopyToDevice<THostObject> class template to copy the data to the
* devices, and waits for the data copies to finish, i.e. the
* constructor is synchronous wrt. the data copies. The "move" is
* achieved by requiring the constructor argument to be an rvalue
* reference.
*
* Note that the host object type is required to be non-copyable.
* This is to avoid easy mistakes with objects that follow copy
* semantics of std::shared_ptr (that includes Alpaka buffers), that
* would allow the source memory buffer to be used via another copy
* during the asynchronous data copy to the device.
*
* The device-side object corresponding to the THostObject (actual
* type is the return type of CopyToDevice<THostObject>::copyAsync())
* can be obtained with the get() member function, which takes either
* a queue or a device argument.
*/
template <typename TDevice, typename THostObject>
requires alpaka::isDevice<TDevice>
class MoveToDeviceCache {
public:
using Device = TDevice;
using HostObject = THostObject;
using Impl = detail::MoveToDeviceCacheImpl<Device, HostObject>;
using DeviceObject = typename Impl::DeviceObject;

static_assert(not(std::is_copy_constructible_v<HostObject> or std::is_copy_assignable_v<HostObject>),
"The data object to be moved to device must not be copyable.");

MoveToDeviceCache(HostObject&& srcData) : data_(std::move(srcData)) {}

DeviceObject const& get(Device const& dev) const { return data_.get(alpaka::getNativeHandle(dev)); }

template <typename TQueue>
DeviceObject const& get(TQueue const& queue) const {
return get(alpaka::getDev(queue));
}

private:
Impl data_;
};
} // namespace cms::alpakatools

#endif
38 changes: 37 additions & 1 deletion HeterogeneousCore/AlpakaInterface/README.md
@@ -137,7 +137,9 @@ See the previous section for considerations about the use of device-mapped
memory.


## A note about copies and synchronisation
## Notes about copies and synchronisation

### Host-to-device copy

When copying data from a host buffer to a device buffer, _e.g._ with
```c++
// ... (lines collapsed in the diff view)
```
@@ -163,6 +165,40 @@ std::memset(a_host_buffer.data(), 0x00, size);
is likely to overwrite part of the buffer while the copy is still ongoing,
resulting in `a_device_buffer` with incomplete and corrupted contents.
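
The hazard can be reproduced deterministically with a toy model. This is only a sketch under stated assumptions: `ToyQueue` and `toyMemcpyAsync()` are hypothetical stand-ins, not Alpaka types or functions. The toy queue runs every enqueued task only when `wait()` is called, which mimics a copy that has been enqueued but has not yet executed. With real hardware the corruption is partial and non-deterministic. Here the outcome is fixed so that it can be inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy in-order queue (hypothetical): tasks run only when wait() is called.
class ToyQueue {
public:
  void enqueue(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void wait() {
    for (auto& task : tasks_) {
      task();
    }
    tasks_.clear();
  }

private:
  std::vector<std::function<void()>> tasks_;
};

// Stand-in for an asynchronous host-to-device copy: returns immediately,
// the actual copy happens later, when the queue executes the task.
inline void toyMemcpyAsync(ToyQueue& queue, std::vector<int>& dst, std::vector<int> const& src) {
  queue.enqueue([&dst, &src] { dst = src; });
}

// Wrong: the host buffer is overwritten while the copy is still pending.
inline std::vector<int> clearBeforeWait() {
  ToyQueue queue;
  std::vector<int> host(4, 42);
  std::vector<int> device;
  toyMemcpyAsync(queue, device, host);
  std::fill(host.begin(), host.end(), 0);  // too early: the copy has not run yet
  queue.wait();
  return device;  // contains zeros instead of 42
}

// Right: wait for the copy to finish before touching the host buffer.
inline std::vector<int> clearAfterWait() {
  ToyQueue queue;
  std::vector<int> host(4, 42);
  std::vector<int> device;
  toyMemcpyAsync(queue, device, host);
  queue.wait();
  std::fill(host.begin(), host.end(), 0);  // safe: the copy is complete
  return device;  // contains 42
}
```

In this model `clearBeforeWait()` returns a buffer of zeros, while `clearAfterWait()` returns the intended values.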

### Host-to-device move

For host data types that are movable but not copyable, one can
largely avoid the caveats above (no operations on the host buffer
while the copy is ongoing) by using move semantics with the following utility
```c++
#include "HeterogeneousCore/AlpakaInterface/interface/moveToDeviceAsync.h"
// ...
auto device_object = cms::alpakatools::moveToDeviceAsync(queue, std::move(host_object));
```

Here the host-side `host_object` is _moved_ into the
`moveToDeviceAsync()` function, which returns the corresponding
device-side `device_object`. Any subsequent use of `host_object`
is then clearly a "use after move", which is easier to catch in
code review or with static analysis tools than the consequences of
an asynchronous `alpaka::memcpy()`.

The `cms::alpakatools::CopyToDevice<T>` class template must have a
specialization for the host data type (otherwise the compilation will fail).

As mentioned above, the host data type must be movable but not
copyable (the compilation will fail with copyable types). For example,
the `PortableHostCollection` and `PortableHostObject` class templates
can be used, but Alpaka buffers cannot be used directly.

The host data object should manage its memory in a
[queue-ordered](#allocating-queue-ordered-host-buffers-in-device-mapped-memory)
way. If it does not, the object must synchronize the device and the
host in its destructor (although such synchronization is undesirable).
Otherwise, the behavior is undefined.
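
The pattern can be sketched in a self-contained toy model. The names `DeferredQueue`, `MoveOnlyHostData`, `ToyDeviceData` and `toyMoveToDeviceAsync()` are hypothetical and are not part of Alpaka or CMSSW. One deliberate difference from the real utility: the toy keeps the host data alive by letting the queued task own it, whereas the text above relies on queue-ordered memory (or a synchronizing destructor) for that purpose.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Toy in-order queue (hypothetical): tasks run only when wait() is called.
class DeferredQueue {
public:
  void enqueue(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void wait() {
    for (auto& task : tasks_) {
      task();
    }
    tasks_.clear();
  }

private:
  std::vector<std::function<void()>> tasks_;
};

// Hypothetical movable, non-copyable host data type.
struct MoveOnlyHostData {
  explicit MoveOnlyHostData(std::vector<int> v) : values(std::move(v)) {}
  MoveOnlyHostData(MoveOnlyHostData&&) = default;
  MoveOnlyHostData& operator=(MoveOnlyHostData&&) = default;
  MoveOnlyHostData(MoveOnlyHostData const&) = delete;
  MoveOnlyHostData& operator=(MoveOnlyHostData const&) = delete;

  std::vector<int> values;
};

// Hypothetical "device-side" object; filled once the queued copy has run.
struct ToyDeviceData {
  std::shared_ptr<std::vector<int>> values = std::make_shared<std::vector<int>>();
};

template <typename THostObject>
ToyDeviceData toyMoveToDeviceAsync(DeferredQueue& queue, THostObject&& src) {
  // Reject lvalues: the caller has to give up the host object explicitly.
  static_assert(not std::is_lvalue_reference_v<THostObject>, "Pass the host object with std::move().");
  // Reject copyable types, mirroring the requirement described in the text.
  static_assert(not(std::is_copy_constructible_v<THostObject> or std::is_copy_assignable_v<THostObject>),
                "The data object to be moved to device must not be copyable.");

  // The queued task owns the host data, so it stays valid until the copy has run.
  auto host = std::make_shared<THostObject>(std::move(src));
  ToyDeviceData device;
  queue.enqueue([host, dst = device.values] { *dst = host->values; });
  return device;
}
```

After `auto device = toyMoveToDeviceAsync(queue, std::move(host));` the caller no longer has a usable `host`, so there is nothing left on the host side that could be modified while the copy is pending. The device-side data becomes valid once the queue has executed the task.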

### Device-to-host copy

When copying data from a device buffer to a host buffer, _e.g._ with
```c++
alpaka::memcpy(queue, a_host_buffer, a_device_buffer);