Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cudadev] Macro based SoA (followup of #PR211) #287

Open
wants to merge 50 commits into
base: master
Choose a base branch
from

Conversation

ericcano
Copy link
Contributor

This PR follows up and replaces #211. Its goal is to implement #281.

The current status of SoA (with future plans) is described in SoA.md

ericcano and others added 30 commits December 8, 2021 10:42
… trivially copyable.

The requirement for trivially constructible prevented creating structures on the host side and then memory copying it to the device side.
The SoA store divides a user pre-allocated buffer in adjacent columns of values. The columns are bytes aligned to a setable alignment
and their length is determined at run time.

The SoA views allow grouping columns from multiple stores in a logical entity. For example when some columns are used on device-only
and otheres transmitted to host. The two groups are on two different stores, but joined together in a logic view.

Views can also provide access to a subset of a store.
…umns.

The Eigen columns, present in the stores are explicitly not supported (assertion).
…e to const view.

Create a const view type, and renamed the access method return type and name.
…SOAStore.

The rename reflect the fact that the buffers are owned by the class.
The templation allows defining compile time alignment choice, alignement enforcement, and alignment hinting for the compiler.
The SoA macros now generate templates, adapted code and added using clauses to select the proper variant.
Introduced separate, specifically const constViews as const views can be worked around by copying, and const correctness is insufficient.
  Applied the previous to the SiPixelROCsStatusAndMappingWrapper product.
Moved alignment value from runtime to compile time parameter.
Code simplifications.
Added views where necessary to replace store accesses.
Renamed variables to "layout" in anticipation of the renaming of the stores concept to layout.
…sized arrays.

The struct is passed around and mapped into a view only for use.
Added symbolic names for cache line sizes/default alignment.
Added more notes in the SoA.md plans section.
@ericcano ericcano marked this pull request as ready for review January 14, 2022 16:44
@ericcano
Copy link
Contributor Author

@fwyzard, @makortel, this version does not contain all the intended features, but could be considered for merging into master.

The missing features I intend to cover later are:

  • Eigen support. This probably will require templating the way we describe columns in views, and I did not figure it out yet, and will probably require quite from testing, as this is macro-generated code. I also intend to make the inclusion of Eigen headers optional if Eigen columns are not used.
  • Convert the dump member function into an operator<<.

I tried merging with the current master and this compiles and runs fine.

@fwyzard
Copy link
Contributor

fwyzard commented Jan 14, 2022

A silly thing: could you run it all through clang-format ?

Comment on lines 40 to 50
GENERATE_SOA_CONST_VIEW(DeviceConstViewTemplate,
SOA_VIEW_LAYOUT_LIST(SOA_VIEW_LAYOUT(DeviceView, deviceView)),
SOA_VIEW_VALUE_LIST(
SOA_VIEW_VALUE(deviceView, moduleStart), // index of the first pixel of each module
SOA_VIEW_VALUE(deviceView, clusInModule), // number of clusters found in each module
SOA_VIEW_VALUE(deviceView, moduleId), // module id of each module

// originally from rechits
SOA_VIEW_VALUE(deviceView, clusModuleStart) // index of the first cluster of each module
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this "derived" from DeviceView and not from DeviceLayout ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "idea" was that the class has its mutable view internally for initialization, and then down grades it to a const view for usage by the user in member function view(). Both path are possible, but we need to use the type we will pass to the constructor. The "hard work" is in the layout constructor, and in both cases, we're copying pointers (and the compiler will optimize this copying things out).

That makes the name of the SOA_VIEW_LAYOUT_LIST macro too restrictive, maybe. What about SOA_VIEW_DATA_SOURCE_LIST (and accordingly for SOA_VIEW_LAYOUT)?

@ericcano
Copy link
Contributor Author

@VinInn, you might be interested too, especially in this commit: 87d084a, which adds automatic support to __restrict__ hinting for the compiler.

Re-enabled Eigen parts of the test.
Eigen/Core should be included before the SoA headers to enable support.
…layout

This also changes the function from a static to a non static one.

A sample output is:
SoA1LayoutTemplate(10000 elements, byte alignement= 128, @0x7f7a8d525080):
  sizeof(SoA1LayoutTemplate): 168
 Column x at offset 0 has size 80000 and padding 0
 Column y at offset 80000 has size 80000 and padding 0
 Column z at offset 160000 has size 80000 and padding 0
 Column sum at offset 240000 has size 80000 and padding 0
 Column prod at offset 320000 has size 80000 and padding 0
 Eigen value a at offset 400000 has dimension (3 x 1) and per column size 80000 and padding 0
 Eigen value b at offset 640000 has dimension (3 x 1) and per column size 80000 and padding 0
 Eigen value r at offset 880000 has dimension (3 x 1) and per column size 80000 and padding 0
 Column color at offset 1120000 has size 20000 and padding 96
 Column value at offset 1140096 has size 40000 and padding 64
 Column py at offset 1180160 has size 80000 and padding 0
 Column count at offset 1260160 has size 40000 and padding 64
 Column anotherCount at offset 1300224 has size 40000 and padding 64
 Scalar description at offset 1340288 has size 8 and padding 120
 Scalar someNumber at offset 1340416 has size 4 and padding 124
Final offset = 1340544 computeDataSize(...): 1340544
@ericcano ericcano changed the title Macro based SoA (followup of #PR211) [cudadev] Macro based SoA (followup of #PR211) Feb 9, 2022
ericcano added a commit to ericcano/pixeltrack-standalone that referenced this pull request Feb 9, 2022
The test validates multiple multiple SoA use cases, host and device side.
It validates buffer sharing for multiple layouts.
An an Alpaka native sub-buffer operation (device side memset) was used. This type of operations
can be used in the future to copy partial columns.
This version is a port of cms-patatrack#287 to Alpaka.

#include <cuda_runtime.h>

class SiPixelClustersCUDA {
public:
SiPixelClustersCUDA() = default;
GENERATE_SOA_LAYOUT(DeviceLayoutTemplate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be simplified using GENERATE_SOA_LAYOUT_VIEW_AND_CONST_VIEW ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this code predates the introduction of this utility macro.

byteAlignment / sizeof(CPP_TYPE::Scalar);) \
if constexpr (alignmentEnforcement == AlignmentEnforcement::Enforced) \
if (reinterpret_cast<intptr_t>(BOOST_PP_CAT(NAME, _)) % byteAlignment) \
throw std::out_of_range("In layout constructor: misaligned column: " #NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why out_of_range for an alignment error ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find a perfect match in std::exceptions and felt this could be the closest (maybe std::length_error ? ). Otherwise, we could just create our own exceptions, but we mostly rely on std:: ones, don't we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with a generic std::runtime_error.

curMem += (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * byteAlignment * \
CPP_TYPE::RowsAtCompileTime * CPP_TYPE::ColsAtCompileTime; \
BOOST_PP_CAT(NAME, Stride_) = (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * \
byteAlignment / sizeof(CPP_TYPE::Scalar);) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if byteAlignment is not a multiple of sizeof(CPP_TYPE::Scalar) ?
what happens if byteAlignment is less than sizeof(CPP_TYPE::Scalar) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, bad things might happen. Usual values are powers of 2 and a typical byte alignment is bigger than a built-in scalar, but this is not guaranteed. I think asserting (compile-time) that both sizeof(CPP_TYPE::Scalar) and byteAlignment are powers of 2 and aligning the columns to the biggest of the two should forbid any unwanted scenario while remaining a reasonable constraint on the parameters.

Copy link
Contributor

@fwyzard fwyzard Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alignment must always be a power of 2.
The size of a type can be anything, but c++ (almost) mandates that it is a multiple of its own alignment.

But "scalar" in this context does not necessarily mean a primitive type - it could also be an aggregate like class or an array (e.g. char name[1024]).
Then the stride becomes 0, which I don't think is the intended behaviour ? (no it doesn't, it becomes nElements_)
If sizeof(CPP_TYPE::Scalar) is greater that the alignment, the stride should just be nElements_ * sizeof(CPP_TYPE::Scalar) .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, shouldn't the stride simply be (((nElements_ * sizeof(CPP_TYPE::Scalar) - 1) / byteAlignment) + 1) * byteAlignment, without the / sizeof(CPP_TYPE::Scalar) part ?

@fwyzard
Copy link
Contributor

fwyzard commented Feb 23, 2022

Here are some ideas after reading most of the PR and the discussions of the past couple of days.

My understanding is that the basic SoA-based structure would look similar to

// define the SoALayout and SoAView types
GENERATE_SOA_LAYOUT_VIEW_AND_CONST_VIEW(
    SoALayoutTemplate,
    SoAViewTemplate,
    SoAConstViewTemplate,
    ...
);

using SoALayout = SoALayoutTemplate<>;
using SoAView = SoAViewTemplate<>;

class SoA {
public:
  SoA(size_t size) :
    storage_{std::make_unique_ptr<std::byte[]>(SoALayout::computeDataSize(size))},
    layout_{storage_.get(), size)},
    view_{layout_}
  {}

  SoAView view() { return view_; }  // return by value or by reference ?
  SoAConstView view() const { return const_view(); }
  SoAConstView const_view() const { return SoAConstView(layout_); }  // can we create the const view from the view instead ?

private:
  std::unique_ptr<std::byte[]> storage_;  // use a different storage type for GPUs, etc.
  SoALayout layout_;
  SoAView view_;
};

Then, I would have suggestions to simplify the usage.

Make the View and ConstView nested types of the Layout, and always declare them

```c++
// define the SoALayout and SoAView types
GENERATE_SOA_LAYOUT(
    SoALayoutTemplate,
    ...
);

would always generate

  • SoALayoutTemplate<...>
  • SoALayoutTemplate<...>::View
  • SoALayoutTemplate<...>::ConstView

and

using SoALayout = SoALayoutTemplate<>;

would give

  • SoALayout
  • SoALayout::View
  • SoALayout::ConstView

Make the view a data member of the layout

If I understood correctly the content of the Layout and View types, the former is a superset of the latter.
Then we could simplify the Layout type by making it own a View data member, instead of duplicating those fields ?

So the SoALayoutTemplate<...> template would have a SoALayoutTemplate<...>::View view_; data member instead of the individual pointers.
It should also have the view(), view() const and const_view() const methods to return a View and ConstView


With these two changes, the example above would become

// define the SoALayout and SoAView types
GENERATE_SOA_LAYOUT(SoALayoutTemplate,
    ...
);

using SoALayout = SoALayoutTemplate<>;

class SoA {
public:
  SoA(size_t size) :
    storage_{std::make_unique_ptr<std::byte[]>(SoALayout::computeDataSize(size))},
    layout_{storage_.get(), size)}
  {}

  // return by value or by reference ?
  SoAView view() { return layout.view(); }
  SoAConstView view() const { return layout.const_view(); }
  SoAConstView const_view() const { return layout.const_view();; }

private:
  std::unique_ptr<std::byte[]> storage_;  // use a different storage type for GPUs, etc.
  SoALayout layout_;
};

@ericcano
Copy link
Contributor Author

ericcano commented Mar 9, 2022

I attempted to achieve this, but failed in the current 2 classes model as the View is defined from the metadata of the Layout (column type are deduced from the metadata of the source layout/views). Instantiating the View early enough to be used inside the Layout

This schema can be achieved nevertheless if we create a third type of generated class (Trivial_view?) where the types of columns are directly generated, like for the Layout. It will be used like the View but generated in a different way.

@fwyzard fwyzard added the enhancement New feature or request label Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants