Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cudadev] Macro based SoA (followup of #PR211) #287

Open
wants to merge 50 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 39 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
bc4392b
[cudadev] Downgraded the requirement for device/pinned host memory to…
ericcano Oct 12, 2021
de1f3ed
[cudadev] Added support for buffer-store-view SoA model.
ericcano Oct 12, 2021
d8db9e9
[cudadev] Converted SiPixelROCsStatusAndMapping to SoA.
ericcano Oct 12, 2021
7e01562
[cudadev] Moved SiPixelClustersCUDA to SoA store.
ericcano Oct 12, 2021
d72582a
[cudadev] Moved SiPixelDigisCUDA to SoA store and views.
ericcano Oct 13, 2021
e8a7735
[cudadev] Moved TrackingRecHit2D to SoA store and views.
ericcano Oct 13, 2021
2918050
[cudadev] Updated SoA view so they support scalars in addition to col…
ericcano Oct 19, 2021
c815fe2
[cudadev] Added unit test to validate SoA stores and view.
ericcano Oct 19, 2021
3718459
[cudadev] removal of unused make_device_unique_uninitialized
ericcano Nov 16, 2021
be3fb71
[cudadev] Simplified resetting store and buffer.
ericcano Nov 23, 2021
aaadfb4
[cudadev] Reflected const nature of class in variables and function n…
ericcano Nov 23, 2021
7cf639e
[cudadev] Changed SiPixelClustersCUDA product access method from stor…
ericcano Nov 24, 2021
851d711
[cudadev] Renamed TrackingRecHit2DHostSOAView to TrackingRecHit2DHost…
ericcano Nov 24, 2021
6122c36
[cudadev] Removed redundant const.
ericcano Nov 24, 2021
ca7840e
[cudadev] Simplified reassignment.
ericcano Nov 24, 2021
3a7b692
[cudadev] Imported templated SoA structured from CMSSW
ericcano Nov 25, 2021
c38be20
[cudadev] Ported cudadev to the new SoA layout.
ericcano Dec 2, 2021
a4d1b46
[cudadev] Moving accesses from layouts to views.
ericcano Dec 2, 2021
16ae05f
[cudadev] SoA: Made store accessors private to disallow access.
ericcano Dec 3, 2021
8107f13
[cudadev] Removed accessors from stores/layouts and completed the ren…
ericcano Dec 3, 2021
e3fdaea
[cudadev] Moved SoA view generating macro to uppercase.
ericcano Dec 6, 2021
4805e5e
[cudadev] Reverting SiPixelROCsStatusAndMapping to a struct of fixed …
ericcano Dec 7, 2021
cbac317
[cudadev] Removed dedundant const specifier.
ericcano Dec 8, 2021
8bcecbe
[cudadev] Moved SoA templates to cms::soa namespace.
ericcano Dec 8, 2021
4dc5308
[cudadev] Fixed missing host device declaration for constructor.
ericcano Dec 9, 2021
b50d5a9
[cudadev] Marked SoA layout clone function as const.
ericcano Dec 9, 2021
988e8db
[cudadev] Added const correctness/limitations in SoAMetadata subclasses.
ericcano Dec 9, 2021
cf212d2
[cudadev] Added SoA general explanation.
ericcano Dec 11, 2021
9911404
[cudadev] Replaced hardcoded value with symbolic equivalent.
ericcano Dec 13, 2021
a8bf1ff
[cudadev] Fixed redundant const specifier.
ericcano Dec 13, 2021
7b24d99
[cudadev] Limited operator=() of elements to non-scalars.
ericcano Dec 13, 2021
3986550
[cudadev] Removed aliasing of SoAs in local memory.
ericcano Dec 13, 2021
c004580
[cudadev] Added planned features to SoA.md
ericcano Dec 14, 2021
a2fad0a
[cudadev] Added support for switchable restrict and cache style selec…
ericcano Dec 14, 2021
87d084a
[cudadev] Moved accesses from value to const ref so that we get the b…
ericcano Dec 15, 2021
5a2d472
[cudadev] Added automatic generation of trivially deducted view from …
ericcano Jan 13, 2022
f89caec
[cudadev] Removed cache access style support code in SoA.
ericcano Jan 13, 2022
9ad2ba7
[cudadev] Added size support in views and range checking.
ericcano Jan 13, 2022
a2643bf
[cudadev] Renamed SoA test to match currrent nomenclature.
ericcano Jan 14, 2022
24926af
[cudadev] Ran clang-format to format code.
ericcano Jan 17, 2022
f1e7f05
[cudadev] Re-added support for Eigen columns
ericcano Jan 27, 2022
bb0d028
[cudadev] Updated SoA status in MD file.
ericcano Jan 27, 2022
c7a5f59
[cudadev] Fixed hardcoded restrict qualifier for Eigen SoA columns
ericcano Jan 28, 2022
4f08179
[cudadev] Removed superseeded variable in SoA views.
ericcano Jan 28, 2022
218f925
[cudadev] Cleaned up testing #if
ericcano Jan 31, 2022
2be25c5
[cudadev] Made Eigen headers inclusion optional for SoA.
ericcano Jan 31, 2022
3abe4d5
[cudadev] Improved compilation errors when Eigen is missing for SoA.
ericcano Jan 31, 2022
8b50bc3
[cudadev] Replaced .dump() member function with operator<<() for SoA …
ericcano Feb 2, 2022
352eee9
[cudadev] Caught up with SoA developments in alpaka.
ericcano Apr 5, 2022
54d0e3a
[cudadev] Backported the kernel based SoA test from alpaka to cudadev
ericcano Apr 5, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 4 additions & 26 deletions src/cudadev/CUDACore/device_unique_ptr.h
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,8 @@ namespace cms {

template <typename T>
typename device::impl::make_device_unique_selector<T>::non_array make_device_unique(cudaStream_t stream) {
static_assert(std::is_trivially_constructible<T>::value,
"Allocating with non-trivial constructor on the device memory is not supported");
static_assert(std::is_trivially_copyable<T>::value,
"Allocating with non-trivial copy on the device memory is not supported");
void *mem = allocate_device(sizeof(T), stream);
return typename device::impl::make_device_unique_selector<T>::non_array{reinterpret_cast<T *>(mem),
device::impl::DeviceDeleter{stream}};
Expand All @@ -60,37 +60,15 @@ namespace cms {
typename device::impl::make_device_unique_selector<T>::unbounded_array make_device_unique(size_t n,
cudaStream_t stream) {
using element_type = typename std::remove_extent<T>::type;
static_assert(std::is_trivially_constructible<element_type>::value,
"Allocating with non-trivial constructor on the device memory is not supported");
static_assert(std::is_trivially_copyable<element_type>::value,
"Allocating with non-trivial copy on the device memory is not supported");
void *mem = allocate_device(n * sizeof(element_type), stream);
return typename device::impl::make_device_unique_selector<T>::unbounded_array{
reinterpret_cast<element_type *>(mem), device::impl::DeviceDeleter{stream}};
}

template <typename T, typename... Args>
typename device::impl::make_device_unique_selector<T>::bounded_array make_device_unique(Args &&...) = delete;

// No check for the trivial constructor, make it clear in the interface
template <typename T>
typename device::impl::make_device_unique_selector<T>::non_array make_device_unique_uninitialized(
cudaStream_t stream) {
void *mem = allocate_device(sizeof(T), stream);
return typename device::impl::make_device_unique_selector<T>::non_array{reinterpret_cast<T *>(mem),
device::impl::DeviceDeleter{stream}};
}

template <typename T>
typename device::impl::make_device_unique_selector<T>::unbounded_array make_device_unique_uninitialized(
size_t n, cudaStream_t stream) {
using element_type = typename std::remove_extent<T>::type;
void *mem = allocate_device(n * sizeof(element_type), stream);
return typename device::impl::make_device_unique_selector<T>::unbounded_array{
reinterpret_cast<element_type *>(mem), device::impl::DeviceDeleter{stream}};
}

template <typename T, typename... Args>
typename device::impl::make_device_unique_selector<T>::bounded_array make_device_unique_uninitialized(Args &&...) =
delete;
} // namespace cuda
} // namespace cms

Expand Down
8 changes: 4 additions & 4 deletions src/cudadev/CUDACore/host_unique_ptr.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,17 +39,17 @@ namespace cms {
// Allocate pinned host memory
template <typename T>
typename host::impl::make_host_unique_selector<T>::non_array make_host_unique(cudaStream_t stream) {
static_assert(std::is_trivially_constructible<T>::value,
"Allocating with non-trivial constructor on the pinned host memory is not supported");
static_assert(std::is_trivially_copyable<T>::value,
"Allocating with non-trivial copy on the pinned host memory is not supported");
void *mem = allocate_host(sizeof(T), stream);
return typename host::impl::make_host_unique_selector<T>::non_array{reinterpret_cast<T *>(mem)};
}

template <typename T>
typename host::impl::make_host_unique_selector<T>::unbounded_array make_host_unique(size_t n, cudaStream_t stream) {
using element_type = typename std::remove_extent<T>::type;
static_assert(std::is_trivially_constructible<element_type>::value,
"Allocating with non-trivial constructor on the pinned host memory is not supported");
static_assert(std::is_trivially_copyable<element_type>::value,
"Allocating with non-trivial copy on the pinned host memory is not supported");
void *mem = allocate_host(n * sizeof(element_type), stream);
return typename host::impl::make_host_unique_selector<T>::unbounded_array{reinterpret_cast<element_type *>(mem)};
}
Expand Down
19 changes: 6 additions & 13 deletions src/cudadev/CUDADataFormats/SiPixelClustersCUDA.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,10 @@
#include "CUDACore/host_unique_ptr.h"
#include "CUDADataFormats/SiPixelClustersCUDA.h"

SiPixelClustersCUDA::SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream)
: moduleStart_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules + 1, stream)),
clusInModule_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules, stream)),
moduleId_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules, stream)),
clusModuleStart_d(cms::cuda::make_device_unique<uint32_t[]>(maxModules + 1, stream)) {
auto view = cms::cuda::make_host_unique<DeviceConstView>(stream);
view->moduleStart_ = moduleStart_d.get();
view->clusInModule_ = clusInModule_d.get();
view->moduleId_ = moduleId_d.get();
view->clusModuleStart_ = clusModuleStart_d.get();
SiPixelClustersCUDA::SiPixelClustersCUDA(): data_d(), deviceLayout_(data_d.get(), 0), deviceView_(deviceLayout_) {}

view_d = cms::cuda::make_device_unique<DeviceConstView>(stream);
cms::cuda::copyAsync(view_d, view, stream);
}
SiPixelClustersCUDA::SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream)
: data_d(cms::cuda::make_device_unique<std::byte[]>(DeviceLayout::computeDataSize(maxModules), stream)),
deviceLayout_(data_d.get(), maxModules),
deviceView_(deviceLayout_)
{}
88 changes: 56 additions & 32 deletions src/cudadev/CUDADataFormats/SiPixelClustersCUDA.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,54 @@
#include "CUDACore/device_unique_ptr.h"
#include "CUDACore/host_unique_ptr.h"
#include "CUDACore/cudaCompat.h"
#include "DataFormats/SoALayout.h"
#include "DataFormats/SoAView.h"

#include <cuda_runtime.h>

class SiPixelClustersCUDA {
public:
SiPixelClustersCUDA() = default;
GENERATE_SOA_LAYOUT(DeviceLayoutTemplate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be simplified using GENERATE_SOA_LAYOUT_VIEW_AND_CONST_VIEW ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this code predates the introduction of this utility macro.

SOA_COLUMN(uint32_t, moduleStart), // index of the first pixel of each module
SOA_COLUMN(uint32_t, clusInModule), // number of clusters found in each module
SOA_COLUMN(uint32_t, moduleId), // module id of each module

// originally from rechits
SOA_COLUMN(uint32_t, clusModuleStart) // index of the first cluster of each module
)

// We use all defaults for the template parameters.
using DeviceLayout = DeviceLayoutTemplate<>;

GENERATE_SOA_VIEW(DeviceViewTemplate,
SOA_VIEW_LAYOUT_LIST(SOA_VIEW_LAYOUT(DeviceLayout, deviceLayout)),
SOA_VIEW_VALUE_LIST(
SOA_VIEW_VALUE(deviceLayout, moduleStart), // index of the first pixel of each module
SOA_VIEW_VALUE(deviceLayout, clusInModule), // number of clusters found in each module
SOA_VIEW_VALUE(deviceLayout, moduleId), // module id of each module

// originally from rechits
SOA_VIEW_VALUE(deviceLayout, clusModuleStart) // index of the first cluster of each module
)
)

using DeviceView = DeviceViewTemplate<>;

GENERATE_SOA_CONST_VIEW(DeviceConstViewTemplate,
SOA_VIEW_LAYOUT_LIST(SOA_VIEW_LAYOUT(DeviceView, deviceView)),
SOA_VIEW_VALUE_LIST(
SOA_VIEW_VALUE(deviceView, moduleStart), // index of the first pixel of each module
SOA_VIEW_VALUE(deviceView, clusInModule), // number of clusters found in each module
SOA_VIEW_VALUE(deviceView, moduleId), // module id of each module

// originally from rechits
SOA_VIEW_VALUE(deviceView, clusModuleStart) // index of the first cluster of each module
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this "derived" from DeviceView and not from DeviceLayout ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "idea" was that the class has its mutable view internally for initialization, and then down grades it to a const view for usage by the user in member function view(). Both path are possible, but we need to use the type we will pass to the constructor. The "hard work" is in the layout constructor, and in both cases, we're copying pointers (and the compiler will optimize this copying things out).

That makes the name of the SOA_VIEW_LAYOUT_LIST macro too restrictive, maybe. What about SOA_VIEW_DATA_SOURCE_LIST (and accordingly for SOA_VIEW_LAYOUT)?


using DeviceConstView = DeviceConstViewTemplate<>;

explicit SiPixelClustersCUDA();
explicit SiPixelClustersCUDA(size_t maxModules, cudaStream_t stream);
~SiPixelClustersCUDA() = default;

Expand All @@ -22,41 +64,23 @@ class SiPixelClustersCUDA {

uint32_t nClusters() const { return nClusters_h; }

uint32_t *moduleStart() { return moduleStart_d.get(); }
uint32_t *clusInModule() { return clusInModule_d.get(); }
uint32_t *moduleId() { return moduleId_d.get(); }
uint32_t *clusModuleStart() { return clusModuleStart_d.get(); }
uint32_t *moduleStart() { return deviceView_.moduleStart(); }
uint32_t *clusInModule() { return deviceView_.clusInModule(); }
uint32_t *moduleId() { return deviceView_.moduleId(); }
uint32_t *clusModuleStart() { return deviceView_.clusModuleStart(); }

uint32_t const *moduleStart() const { return moduleStart_d.get(); }
uint32_t const *clusInModule() const { return clusInModule_d.get(); }
uint32_t const *moduleId() const { return moduleId_d.get(); }
uint32_t const *clusModuleStart() const { return clusModuleStart_d.get(); }
uint32_t const *moduleStart() const { return deviceView_.moduleStart(); }
uint32_t const *clusInModule() const { return deviceView_.clusInModule(); }
uint32_t const *moduleId() const { return deviceView_.moduleId(); }
uint32_t const *clusModuleStart() const { return deviceView_.clusModuleStart(); }

class DeviceConstView {
public:
__device__ __forceinline__ uint32_t moduleStart(int i) const { return __ldg(moduleStart_ + i); }
__device__ __forceinline__ uint32_t clusInModule(int i) const { return __ldg(clusInModule_ + i); }
__device__ __forceinline__ uint32_t moduleId(int i) const { return __ldg(moduleId_ + i); }
__device__ __forceinline__ uint32_t clusModuleStart(int i) const { return __ldg(clusModuleStart_ + i); }

uint32_t const *moduleStart_;
uint32_t const *clusInModule_;
uint32_t const *moduleId_;
uint32_t const *clusModuleStart_;
};

DeviceConstView *view() const { return view_d.get(); }
DeviceConstView view() const { return DeviceConstView(deviceView_); }

private:
cms::cuda::device::unique_ptr<uint32_t[]> moduleStart_d; // index of the first pixel of each module
cms::cuda::device::unique_ptr<uint32_t[]> clusInModule_d; // number of clusters found in each module
cms::cuda::device::unique_ptr<uint32_t[]> moduleId_d; // module id of each module

// originally from rechits
cms::cuda::device::unique_ptr<uint32_t[]> clusModuleStart_d; // index of the first cluster of each module

cms::cuda::device::unique_ptr<DeviceConstView> view_d; // "me" pointer

cms::cuda::device::unique_ptr<std::byte[]> data_d; // Single SoA storage
DeviceLayout deviceLayout_;
DeviceView deviceView_;

uint32_t nClusters_h = 0;
};

Expand Down
79 changes: 46 additions & 33 deletions src/cudadev/CUDADataFormats/SiPixelDigisCUDA.cc
Original file line number Diff line number Diff line change
Expand Up @@ -5,44 +5,57 @@
#include "CUDACore/host_unique_ptr.h"

SiPixelDigisCUDA::SiPixelDigisCUDA(size_t maxFedWords, cudaStream_t stream)
: xx_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
yy_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
adc_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
moduleInd_d(cms::cuda::make_device_unique<uint16_t[]>(maxFedWords, stream)),
clus_d(cms::cuda::make_device_unique<int32_t[]>(maxFedWords, stream)),
view_d(cms::cuda::make_device_unique<DeviceConstView>(stream)),
pdigi_d(cms::cuda::make_device_unique<uint32_t[]>(maxFedWords, stream)),
rawIdArr_d(cms::cuda::make_device_unique<uint32_t[]>(maxFedWords, stream)) {
auto view = cms::cuda::make_host_unique<DeviceConstView>(stream);
view->xx_ = xx_d.get();
view->yy_ = yy_d.get();
view->adc_ = adc_d.get();
view->moduleInd_ = moduleInd_d.get();
view->clus_ = clus_d.get();

cms::cuda::copyAsync(view_d, view, stream);
}
: data_d(cms::cuda::make_device_unique<std::byte[]>(
DeviceOnlyLayout::computeDataSize(maxFedWords) +
HostDeviceLayout::computeDataSize(maxFedWords),
stream)),
deviceOnlyLayout_d(data_d.get(), maxFedWords),
hostDeviceLayout_d(deviceOnlyLayout_d.soaMetadata().nextByte(), maxFedWords),
deviceFullView_(deviceOnlyLayout_d, hostDeviceLayout_d),
devicePixelConstView_(deviceFullView_)
{}

cms::cuda::host::unique_ptr<uint16_t[]> SiPixelDigisCUDA::adcToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint16_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, adc_d, nDigis(), stream);
return ret;
}
SiPixelDigisCUDA::SiPixelDigisCUDA()
: data_d(),deviceOnlyLayout_d(), hostDeviceLayout_d(), deviceFullView_(), devicePixelConstView_()
{}

cms::cuda::host::unique_ptr<int32_t[]> SiPixelDigisCUDA::clusToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<int32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, clus_d, nDigis(), stream);
return ret;
SiPixelDigisCUDA::HostStore::HostStore()
: data_h(), hostLayout_(nullptr, 0), hostView_(hostLayout_)
{}

SiPixelDigisCUDA::HostStore::HostStore(size_t maxFedWords, cudaStream_t stream)
: data_h(cms::cuda::make_host_unique<std::byte[]>(SiPixelDigisCUDA::HostDeviceLayout::computeDataSize(maxFedWords), stream)),
hostLayout_(data_h.get(), maxFedWords),
hostView_(hostLayout_)
{}

void SiPixelDigisCUDA::HostStore::reset() {
hostLayout_ = HostDeviceLayout();
hostView_ = HostDeviceView(hostLayout_);
data_h.reset();
}

cms::cuda::host::unique_ptr<uint32_t[]> SiPixelDigisCUDA::pdigiToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, pdigi_d, nDigis(), stream);
cms::cuda::host::unique_ptr<uint16_t[]> SiPixelDigisCUDA::adcToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint16_t[]>(nDigis(), stream);
// TODO: this is downgraded from cms::cuda::copyAsync as we copy data from within a block but not the full block.
cudaCheck(cudaMemcpyAsync(ret.get(), deviceFullView_.adc(), nDigis() * sizeof(decltype(ret[0])), cudaMemcpyDeviceToHost, stream));
return ret;
}

cms::cuda::host::unique_ptr<uint32_t[]> SiPixelDigisCUDA::rawIdArrToHostAsync(cudaStream_t stream) const {
auto ret = cms::cuda::make_host_unique<uint32_t[]>(nDigis(), stream);
cms::cuda::copyAsync(ret, rawIdArr_d, nDigis(), stream);
SiPixelDigisCUDA::HostStore SiPixelDigisCUDA::dataToHostAsync(cudaStream_t stream) const {
// Allocate the needed space only and build the compact data in place in host memory (from the larger device memory).
// Due to the compaction with the 2D copy, we need to know the precise geometry, and hence operate on the store (as opposed
// to the view, which is unaware of the column pitches.
HostStore ret(nDigis(), stream);
auto rhlsm = ret.hostLayout_.soaMetadata();
auto hdlsm_d = hostDeviceLayout_d.soaMetadata();
cudaCheck(cudaMemcpyAsync(rhlsm.addressOf_adc(), hdlsm_d.addressOf_adc(), nDigis_h * sizeof(*rhlsm.addressOf_adc()),
cudaMemcpyDeviceToHost, stream));
// Copy the other columns, realigning the data in shorter arrays. clus is the first but all 3 columns (clus, pdigis, rawIdArr) have
// the same geometry.
cudaCheck(cudaMemcpy2DAsync(rhlsm.addressOf_clus(), rhlsm.clusPitch(),
hdlsm_d.addressOf_clus(), hdlsm_d.clusPitch(),
3 /* rows */,
nDigis() * sizeof(decltype (*ret.hostView_.clus())), cudaMemcpyDeviceToHost, stream));
return ret;
}
}
Loading