Merge pull request #1736 from LLNL/artv3/note-on-layout-unit-stride
add note about unit stride to layout
artv3 authored Oct 15, 2024
2 parents 0edd83c + 78d6d79 commit a1150aa
Showing 3 changed files with 99 additions and 80 deletions.
131 changes: 75 additions & 56 deletions docs/sphinx/user_guide/tutorial/view_layout.rst
@@ -22,8 +22,8 @@ from the build directory.

Key RAJA features shown in this section are:

* ``RAJA::View``
* ``RAJA::Layout`` and ``RAJA::OffsetLayout`` constructs
* ``RAJA::View``
* ``RAJA::Layout`` and ``RAJA::OffsetLayout`` constructs
* Layout permutations

The examples in this section illustrate RAJA View and Layout concepts
@@ -40,11 +40,11 @@ operation, using :math:`N \times N` matrices:
:end-before: _cstyle_matmult_end
:language: C++

As is commonly done for efficiency in C and C++, we have allocated the data
for the matrices as one-dimensional arrays. Thus, we need to manually compute
As is commonly done for efficiency in C and C++, we have allocated the data
for the matrices as one-dimensional arrays. Thus, we need to manually compute
the data pointer offsets for the row and column indices in the kernel.
Here, we use the array ``Cref`` to hold a reference solution matrix that
we use to compare with results generated by the examples below.
we use to compare with results generated by the examples below.

To simplify the multi-dimensional indexing, we can use ``RAJA::View`` objects,
which we define as:
@@ -55,20 +55,31 @@ which we define as:
:language: C++

Here we define three ``RAJA::View`` objects, 'Aview', 'Bview', and 'Cview',
that *wrap* the array data pointers, 'A', 'B', and 'C', respectively. We
pass a data pointer as the first argument to each view constructor and then
that *wrap* the array data pointers, 'A', 'B', and 'C', respectively. We
pass a data pointer as the first argument to each view constructor and then
the extent of each matrix dimension as the second and third arguments. There
are two extent arguments since we specify two dimensions in the ``RAJA::Layout`` template
parameter list. The matrices are square and each extent is 'N'. Here, the
template parameters to ``RAJA::View`` are the array data type 'double' and
parameter list. The matrices are square and each extent is 'N'. Here, the
template parameters to ``RAJA::View`` are the array data type 'double' and
a ``RAJA::Layout`` type. Specifically::

RAJA::Layout<2, int>

means that each View represents a two-dimensional default data layout, and
that we will use values of type 'int' to index into the arrays.
means that each View represents a two-dimensional default data layout, and
that we will use values of type 'int' to index into the arrays.

Using the ``RAJA::View`` objects, we can access the data entries for the rows
.. note:: A third template argument to the Layout type can be used to specify
the index with unit stride::

RAJA::Layout<2, int, 1>

In the example above, index 1 is marked as having unit stride, which makes
multi-dimensional indexing more efficient by avoiding an unnecessary
multiplication by ``1``.
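
To see why marking a unit-stride index can help, here is a minimal sketch in plain C++ that does not use RAJA. The ``TinyLayout2D`` type, its ``StrideOneDim`` parameter, and the two helper functions are hypothetical names chosen for illustration; they are not RAJA's implementation. The sketch only assumes that knowing the unit-stride index at compile time allows the corresponding multiplication to be dropped:

```cpp
#include <cassert>

// Hypothetical two-dimensional layout for illustration; NOT RAJA's code.
// StrideOneDim names the index known at compile time to have unit stride;
// -1 means no index is marked.
template <int StrideOneDim = -1>
struct TinyLayout2D {
  int stride0;  // stride of index 0
  int stride1;  // stride of index 1

  int operator()(int i, int j) const {
    // StrideOneDim is a compile-time constant, so an optimizing compiler
    // can discard the branches that do not apply.
    if (StrideOneDim == 0) {
      assert(stride0 == 1);
      return i + j * stride1;  // index 0 is added without a multiply
    }
    if (StrideOneDim == 1) {
      assert(stride1 == 1);
      return i * stride0 + j;  // index 1 is added without a multiply
    }
    return i * stride0 + j * stride1;  // general case: two multiplies
  }
};

// Row-major N x N offsets, without and with index 1 marked as unit stride.
inline int general_offset(int N, int i, int j) {
  return TinyLayout2D<>{N, 1}(i, j);
}
inline int unit_stride_offset(int N, int i, int j) {
  return TinyLayout2D<1>{N, 1}(i, j);
}
```

Both helpers return the same offset for the same arguments; the marked version simply reaches it with one fewer multiplication.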



Using the ``RAJA::View`` objects, we can access the data entries for the rows
and columns using a more natural, less error-prone syntax:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
@@ -79,9 +90,9 @@ and columns using a more natural, less error-prone syntax:
Default Layouts Use Row-major Ordering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default data layout ordering in RAJA is *row-major*, which is the
convention for multi-dimensional array indexing in C and C++. This means that
the rightmost index will be stride-1, the index to the left of the rightmost
The default data layout ordering in RAJA is *row-major*, which is the
convention for multi-dimensional array indexing in C and C++. This means that
the rightmost index will be stride-1, the index to the left of the rightmost
index will have stride equal to the extent of the rightmost dimension, and
so on.
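
The row-major rule described above can be sketched in plain C++ without RAJA. The function name ``row_major_strides`` is hypothetical and is used only for illustration; it computes the stride of each index from the extents, starting at the rightmost index:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of RAJA.
// Row-major strides: the rightmost index has stride 1 and each index to
// its left has stride equal to the product of the extents to its right.
template <std::size_t D>
std::array<int, D> row_major_strides(const std::array<int, D>& extents) {
  std::array<int, D> strides{};
  int running = 1;
  for (std::size_t d = D; d-- > 0;) {
    assert(extents[d] > 0);
    strides[d] = running;   // stride of index d
    running *= extents[d];  // product of extents seen so far
  }
  return strides;
}
```

For example, extents ``{3, 5, 2}`` give strides ``{10, 2, 1}``: the rightmost index is stride-1, the middle index has stride 2, and the leftmost index has stride 5 * 2 = 10.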

@@ -90,32 +101,32 @@ so on.
see :ref:`feat-view-label` for more details.

To illustrate the default data layout striding, we next show simple
one-, two-, and three-dimensional examples where the for-loop ordering
for the different dimensions is such that all data access is stride-1. We
one-, two-, and three-dimensional examples where the for-loop ordering
for the different dimensions is such that all data access is stride-1. We
begin by defining some dimensions and then allocating and initializing arrays:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
:start-after: _default_views_init_start
:end-before: _default_views_init_end
:language: C++

The version of the array initialization kernel using a one-dimensional
The version of the array initialization kernel using a one-dimensional
``RAJA::View`` is:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
:start-after: _default_view1D_start
:end-before: _default_view1D_end
:language: C++

The version of the array initialization using a two-dimensional
The version of the array initialization using a two-dimensional
``RAJA::View`` is:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
:start-after: _default_view2D_start
:end-before: _default_view2D_end
:language: C++

The three-dimensional version is:
The three-dimensional version is:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
:start-after: _default_view3D_start
@@ -126,16 +137,16 @@ It's worth repeating that the data array access in all three variants shown
here using ``RAJA::View`` objects is stride-1 since we order the for-loops
in the loop nests to match the row-major ordering.

RAJA Layout types support other data access patterns with different striding
orders, offsets, and permutations. To this point, we have used the default
Layout constructor. RAJA provides methods to generate Layouts for different
indexing patterns. We describe these in the next several sections. Next, we
RAJA Layout types support other data access patterns with different striding
orders, offsets, and permutations. To this point, we have used the default
Layout constructor. RAJA provides methods to generate Layouts for different
indexing patterns. We describe these in the next several sections. Next, we
show how to permute the data striding order using permuted Layouts.

Permuted Layouts Change Data Striding Order
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every ``RAJA::Layout`` object has a permutation. When a permutation is not
Every ``RAJA::Layout`` object has a permutation. When a permutation is not
specified at creation, a Layout will use the identity permutation. Here are
examples where the identity permutation is explicitly provided. First, in
two dimensions:
@@ -153,10 +164,10 @@ Then, in three dimensions:
:language: C++

These two examples access the data with stride-1 ordering, the same as in
the earlier examples, which is shown by the nested loop ordering.
the earlier examples, which is shown by the nested loop ordering.
The identity permutation in two dimensions is '{0, 1}' and is '{0, 1, 2}'
for three dimensions. The method ``RAJA::make_permuted_layout`` is used to
create a ``RAJA::Layout`` object with a permutation. The method takes two
for three dimensions. The method ``RAJA::make_permuted_layout`` is used to
create a ``RAJA::Layout`` object with a permutation. The method takes two
arguments, the extents of each dimension and the permutation.

.. note:: If a permuted Layout is created with the *identity permutation*
@@ -170,9 +181,9 @@ Next, we permute the striding order for the two-dimensional example:
:language: C++

Read from right to left, the permutation '{1, 0}' specifies that the first
(zero) index 'i' is stride-1 and the second index (one) 'j' has stride equal
to the extent of the first Layout dimension 'Nx'. This is evident in the
for-loop ordering.
(zero) index 'i' is stride-1, which is also recorded in the third
``RAJA::Layout`` template argument, and the second index (one) 'j' has stride
equal to the extent of the first Layout dimension 'Nx'. This is evident in the
for-loop ordering.

Here is the three-dimensional case, where we have reversed the striding order
using the permutation '{2, 1, 0}':
@@ -182,7 +193,16 @@ using the permutation '{2, 1, 0}':
:end-before: _perma_view3D_end
:language: C++

The data access remains stride-1 due to the for-loop reordering. For fun,
.. note:: Because unit stride is now held by index 0, we adjust the Layout
template argument accordingly::

RAJA::Layout<3, int, 0>

As before, index 0 is marked as having unit stride, which makes
multi-dimensional indexing more efficient by avoiding an unnecessary
multiplication by ``1``.

The data access remains stride-1 due to the for-loop reordering. For fun,
here is another three-dimensional permutation:

.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
@@ -197,8 +217,8 @@ Multi-dimensional Indices and Linear Indices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``RAJA::Layout`` types provide methods to convert between linear indices and
multi-dimensional indices and vice versa. Recall the Layout 'perm3a_layout'
from above that was created with the permutation '{2, 1, 0}'. To get the
multi-dimensional indices and vice versa. Recall the Layout 'perm3a_layout'
from above that was created with the permutation '{2, 1, 0}'. To get the
linear index corresponding to the index triple '(1, 2, 0)', you can do
this::

@@ -210,36 +230,36 @@ for linear index 7, you can do::
int i, j, k;
perm3a_layout.toIndices(7, i, j, k);

This sets 'i' to 1, 'j' to 2, and 'k' to 0.
This sets 'i' to 1, 'j' to 2, and 'k' to 0.

Similarly for the Layout 'perm3b_layout', which was created with the
Similarly for the Layout 'perm3b_layout', which was created with the
permutation '{1, 2, 0}'::

lin = perm3b_layout(1, 2, 0);
lin = perm3b_layout(1, 2, 0);

sets 'lin' to 13 = 1 + 0 * Nx + 2 * Nx * Nz and::

perm3b_layout.toIndices(13, i, j, k);

sets 'i' to 1, 'j' to 2, and 'k' to 0.
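
The index arithmetic above can be checked with a small plain C++ sketch that does not use RAJA. The ``PermLayout3D`` type is hypothetical and only mimics the call syntax shown in this section. The extents Nx = 3, Ny = 5, Nz = 2 are assumed values; Nx and Nz are consistent with the arithmetic above, while Ny does not affect these particular results:

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not RAJA code. perm lists the indices from longest
// stride (leftmost entry) to unit stride (rightmost entry).
struct PermLayout3D {
  std::array<int, 3> extents;
  std::array<int, 3> strides;

  PermLayout3D(const std::array<int, 3>& ext, const std::array<int, 3>& perm)
      : extents(ext), strides{} {
    int running = 1;
    for (int p = 2; p >= 0; --p) {
      const int dim = perm[p];
      assert(dim >= 0 && dim < 3);
      strides[dim] = running;    // stride of index 'dim'
      running *= extents[dim];   // next index strides over this extent
    }
  }

  // Multi-dimensional index -> linear index.
  int operator()(int i, int j, int k) const {
    return i * strides[0] + j * strides[1] + k * strides[2];
  }

  // Linear index -> multi-dimensional index.
  void toIndices(int lin, int& i, int& j, int& k) const {
    i = (lin / strides[0]) % extents[0];
    j = (lin / strides[1]) % extents[1];
    k = (lin / strides[2]) % extents[2];
  }
};
```

With these assumed extents, ``PermLayout3D({{3, 5, 2}}, {{2, 1, 0}})(1, 2, 0)`` evaluates to 7 and the same call with permutation ``{{1, 2, 0}}`` evaluates to 13, matching the values quoted above; ``toIndices`` recovers '(1, 2, 0)' in both cases.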

There are more examples in the exercise file associated with this section.
There are more examples in the exercise file associated with this section.
Feel free to experiment with them.

One important item to note is that, by default, there is no bounds checking
on indices passed to a ``RAJA::View`` data access method or ``RAJA::Layout``
index computation methods. Therefore, it is the responsibility of a user
to ensure that indices passed to ``RAJA::View`` and ``RAJA::Layoout``
methods are in bounds to avoid accessing data outside
of the View or computing invalid indices.
index computation methods. Therefore, it is the responsibility of a user
to ensure that indices passed to ``RAJA::View`` and ``RAJA::Layout``
methods are in bounds to avoid accessing data outside
of the View or computing invalid indices.

.. note:: RAJA provides a CMake variable ``RAJA_ENABLE_BOUNDS_CHECK`` to
.. note:: RAJA provides a CMake variable ``RAJA_ENABLE_BOUNDS_CHECK`` to
turn run time bounds checking on or off when the code is compiled.
Enabling bounds checking is useful for debugging and to ensure
your code is correct. However, when enabled, bounds checking adds
noticeable run time overhead. So it should not be enabled for
a production build of your code.
a production build of your code.

Offset Layouts Apply Offsets to Indices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -251,9 +271,9 @@ We first illustrate the concept of an offset with a C-style for-loop:
:end-before: _cstyle_offlayout1D_end
:language: C++

Here, the for-loop runs from 'imin' to 'imax-1' (i.e., -5 to 5). To avoid
out-of-bounds negative indexing, we subtract 'imin' (i.e., -5) from the loop
index 'i'.
Here, the for-loop runs from 'imin' to 'imax-1' (i.e., -5 to 5). To avoid
out-of-bounds negative indexing, we subtract 'imin' (i.e., -5) from the loop
index 'i'.
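
As a self-contained illustration of the shift described above, here is a minimal plain C++ sketch. The function name ``fill_with_offset`` is hypothetical, and the half-open range [imin, imax) is chosen to match the loop bounds stated in the text:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not RAJA code.
// Fill an array through the offset index range [imin, imax).
inline std::vector<int> fill_with_offset(int imin, int imax) {
  assert(imin < imax);
  std::vector<int> data(static_cast<std::size_t>(imax - imin), 0);
  for (int i = imin; i < imax; ++i) {
    // Subtracting imin shifts the loop index so storage starts at 0.
    data[static_cast<std::size_t>(i - imin)] = i;
  }
  return data;
}
```

Calling ``fill_with_offset(-5, 6)`` produces 11 entries holding the values -5 through 5, with the entry for loop index -5 stored at position 0.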

To do the same thing with RAJA, we create a ``RAJA::OffsetLayout`` object
and use it to index into the array:
@@ -264,7 +284,7 @@ and use it to index into the array:
:language: C++

``RAJA::OffsetLayout`` is a different type than ``RAJA::Layout`` because
it contains offset information. The arguments to the
it contains offset information. The arguments to the
``RAJA::make_offset_layout`` method are the index bounds.

As expected, the two dimensional case is similar. First, a C-style loop:
@@ -284,7 +304,7 @@ and then the same operation using a ``RAJA::OffsetLayout`` object:
Note that the first argument passed to ``RAJA::make_offset_layout`` contains
the lower bounds for 'i' and 'j' and the second argument contains the upper
bounds. Also, the 'j' index is stride-1 by default since we did not pass
a permutation to the ``RAJA::make_offset_layout`` method, which is the same
a permutation to the ``RAJA::make_offset_layout`` method, which is the same
as the non-offset Layout usage.

Just like ``RAJA::Layout`` has a permutation, so does ``RAJA::OffsetLayout``.
@@ -293,11 +313,10 @@ Here is an example where we permute the (i, j) index stride ordering:
.. literalinclude:: ../../../../exercises/view-layout_solution.cpp
:start-after: _raja_permofflayout2D_start
:end-before: _raja_permofflayout2D_end
:language: C++
:language: C++

The permutation '{1, 0}' is passed as the third argument to
``RAJA::make_offset_layout``. From the ordering of the for-loops, we can see
that the 'i' index is stride-1 and the 'j' index has stride equal to the
extent of the 'i' dimension so the for-loop nest strides through
The permutation '{1, 0}' is passed as the third argument to
``RAJA::make_offset_layout``. From the ordering of the for-loops, we can see
that the 'i' index is stride-1 and the 'j' index has stride equal to the
extent of the 'i' dimension so the for-loop nest strides through
the data with unit stride.
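
The combined effect of an offset and the permutation '{1, 0}' can be sketched in plain C++ as follows. The ``OffsetPermLayout2D`` type is hypothetical and is not RAJA code; in particular, treating the upper bounds as exclusive is an assumption of this sketch, not a statement about ``RAJA::make_offset_layout``:

```cpp
#include <cassert>

// Hypothetical 2D offset layout with 'i' as the unit-stride index,
// mirroring permutation {1, 0}. Index ranges are [imin, imax) and
// [jmin, jmax); this convention is an assumption of the sketch.
struct OffsetPermLayout2D {
  int imin, imax, jmin, jmax;

  int operator()(int i, int j) const {
    assert(i >= imin && i < imax);
    assert(j >= jmin && j < jmax);
    const int iextent = imax - imin;  // stride of the 'j' index
    return (i - imin) + (j - jmin) * iextent;
  }
};
```

Incrementing 'i' moves one position in memory, while incrementing 'j' moves by the extent of the 'i' dimension, so a loop nest with 'j' outermost and 'i' innermost visits the data with unit stride.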

18 changes: 9 additions & 9 deletions exercises/view-layout.cpp
@@ -105,9 +105,9 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))
// Note: we use default Layout
//
// _matmult_views_start
RAJA::View< double, RAJA::Layout<2, int> > Aview(A, N, N);
RAJA::View< double, RAJA::Layout<2, int> > Bview(B, N, N);
RAJA::View< double, RAJA::Layout<2, int> > Cview(C, N, N);
RAJA::View< double, RAJA::Layout<2, int, 1> > Aview(A, N, N);
RAJA::View< double, RAJA::Layout<2, int, 1> > Bview(B, N, N);
RAJA::View< double, RAJA::Layout<2, int, 1> > Cview(C, N, N);
// _matmult_views_end

// _cstyle_matmult_views_start
@@ -165,7 +165,7 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))
std::memset(a, 0, Ntot * sizeof(int));

// _default_view1D_start
RAJA::View< int, RAJA::Layout<1, int> > view_1D(a, Ntot);
RAJA::View< int, RAJA::Layout<1, int, 0> > view_1D(a, Ntot);

for (int i = 0; i < Ntot; ++i) {
view_1D(i) = i;
@@ -182,7 +182,7 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))
std::memset(a, 0, Ntot * sizeof(int));

// _default_view2D_start
RAJA::View< int, RAJA::Layout<2, int> > view_2D(a, Nx, Ny);
RAJA::View< int, RAJA::Layout<2, int, 1> > view_2D(a, Nx, Ny);

int iter{0};
for (int i = 0; i < Nx; ++i) {
@@ -229,9 +229,9 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))

// _default_perm_view2D_start
std::array<RAJA::idx_t, 2> defperm2 {{0, 1}};
RAJA::Layout< 2, int > defperm2_layout =
RAJA::Layout< 2, int> defperm2_layout =
RAJA::make_permuted_layout( {{Nx, Ny}}, defperm2);
RAJA::View< int, RAJA::Layout<2, int> > defperm_view_2D(a, defperm2_layout);
RAJA::View< int, RAJA::Layout<2, int, 1> > defperm_view_2D(a, defperm2_layout);

iter = 0;
for (int i = 0; i < Nx; ++i) {
@@ -272,7 +272,7 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))
std::array<RAJA::idx_t, 2> perm2 {{1, 0}};
RAJA::Layout< 2, int > perm2_layout =
RAJA::make_permuted_layout( {{Nx, Ny}}, perm2);
RAJA::View< int, RAJA::Layout<2, int> > perm_view_2D(a, perm2_layout);
RAJA::View< int, RAJA::Layout<2, int, 0> > perm_view_2D(a, perm2_layout);

iter = 0;
for (int j = 0; j < Ny; ++j) {
@@ -318,7 +318,7 @@ int main(int RAJA_UNUSED_ARG(argc), char **RAJA_UNUSED_ARG(argv[]))
std::array<RAJA::idx_t, 3> perm3b {{1, 2, 0}};
RAJA::Layout< 3, int > perm3b_layout =
RAJA::make_permuted_layout( {{Nx, Ny, Nz}}, perm3b);
RAJA::View< int, RAJA::Layout<3, int> > perm3b_view_3D(a, perm3b_layout);
RAJA::View< int, RAJA::Layout<3, int, 0> > perm3b_view_3D(a, perm3b_layout);

iter = 0;
for (int j = 0; j < Ny; ++j) {