Add MultiReducer #1665

Merged
merged 128 commits on Jul 12, 2024
Commits
128 commits
195d506
Add Seq and OMP MultiReduce and unit test
MrBurmark May 17, 2024
25fe448
Add reset tests for single and container init versions of reset
MrBurmark May 20, 2024
aaa6661
test resizing on reset
MrBurmark May 20, 2024
47a0fb8
Add basic functional tests
MrBurmark May 21, 2024
2565b68
Add consistency test to reducer tests
MrBurmark May 21, 2024
988b6d1
Align data to avoid false sharing in OMP MultiReducers
MrBurmark May 21, 2024
fb19038
Update consistency to also take policy into consideration
MrBurmark May 23, 2024
35094bd
add ordered/unordered into cuda/hip reduce policies
MrBurmark Jun 4, 2024
2453643
Use get in multi_reduce reference
MrBurmark Jun 10, 2024
18c0ef3
Use HighAccuracyReduce in MultiReduceOrderedDataOMP
MrBurmark Jun 10, 2024
01cf2bd
Take dynamic_smem by reference in make_launch_body
MrBurmark Jun 10, 2024
515126f
Fix typo in MultiReduceOrderedDataOMP
MrBurmark Jun 10, 2024
40e3e50
use multi_reduce reference get
MrBurmark Jun 10, 2024
469ccd6
Use atomic multi_reduce policies
MrBurmark Jun 10, 2024
b1dc681
Fix unit tests for reference get
MrBurmark Jun 10, 2024
37b7f0a
fixup MemUtils_HIP
MrBurmark Jun 10, 2024
4b3bb46
Add host device to BaseMultiReduce*
MrBurmark Jun 10, 2024
77ecd00
Add initial cuda/hip multi reduce impl
MrBurmark Jun 10, 2024
27f8fc6
work around non-working camp type_trait
MrBurmark Jun 11, 2024
9b668dc
Add example of forall and multi_reduce
MrBurmark Jun 11, 2024
3c41391
Add for_each_tuple to iterate over elements of tuple
MrBurmark Jun 11, 2024
3fb0a7d
remove extra includes in cuda/hip multi_reduce
MrBurmark Jun 11, 2024
cc6ae7a
Add missing include in openmp multi_reduce
MrBurmark Jun 11, 2024
45fc6ad
Use for_each_tuple in forall multi-reduce example
MrBurmark Jun 11, 2024
6ea3156
Make MultiReduce example more interesting
MrBurmark Jun 11, 2024
bec1924
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 Jun 13, 2024
fe33670
Update camp version to support std::tuple_size
MrBurmark Jun 11, 2024
eb4b041
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 Jun 17, 2024
5b2abd6
fix spacing in multi_reduce example
MrBurmark Jun 18, 2024
67efc05
Use tunings in multi_reduce policies
MrBurmark Jun 18, 2024
c0b1e1b
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jun 18, 2024
ec1a0ea
style fixes and fix to assert loop
MrBurmark Jun 19, 2024
ec5f456
Fix example to make it compile with cuda
MrBurmark Jun 19, 2024
c16208e
Work around preprocessing bug in nvcc
MrBurmark Jun 19, 2024
de2fe83
Fix unused var warning
MrBurmark Jun 19, 2024
f6e4034
Suppress host device warnings
MrBurmark Jun 19, 2024
997087e
Add replication tuning options to cuda/hip multi reduce
MrBurmark Jun 21, 2024
553ad06
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark Jun 21, 2024
49c4ff1
Abort if can't get shared memory
MrBurmark Jun 21, 2024
15290a1
hopefully fix compile issue with msvc
MrBurmark Jun 21, 2024
9fb21ba
Add global atomic only multi reduce for cuda/hip
MrBurmark Jun 21, 2024
1e4d3b1
Add math header to cuda/hip policy files
MrBurmark Jun 21, 2024
14279a3
Put global tally code in base class
MrBurmark Jun 21, 2024
617a653
Add backup path to cuda/hip multireduce
MrBurmark Jun 21, 2024
60751d5
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark Jun 21, 2024
dd8d888
Add size check in multi reduce get_all
MrBurmark Jun 24, 2024
f498a44
Test get_all in basic MultiReduce test
MrBurmark Jun 24, 2024
e31193e
Remove ifdef in OpenMPMultiReducePols list
MrBurmark Jun 24, 2024
30a90cf
Randomize number of loops in basic multi-reducer test
MrBurmark Jun 24, 2024
e99b1a2
Add comments to each test in basic multi reduce test
MrBurmark Jun 24, 2024
276a1f3
test multiple sizes in the basic multi reduce test
MrBurmark Jun 24, 2024
6d28df8
Add more tests of numbers of bins in basic multi reduce test
MrBurmark Jun 24, 2024
04b55ee
Randomize num_bins in multi reduce test
MrBurmark Jun 24, 2024
a1cc5a7
Check for 0 num_bins
MrBurmark Jun 24, 2024
4a0eb8c
Make multi reduce constructors more consistent
MrBurmark Jun 24, 2024
3a8e677
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark Jun 24, 2024
b458ed5
Generalize device constants
MrBurmark Jun 24, 2024
16daeb0
Choose replication per warp instead of per block
MrBurmark Jun 24, 2024
4975957
Move GetOffset operators into util file
MrBurmark Jun 25, 2024
8fbc3be
Use smaller global atomic width for V100
MrBurmark Jun 25, 2024
703c95e
Use left bunched offset for tally
MrBurmark Jun 25, 2024
5f04856
fix vexing parse
MrBurmark Jun 25, 2024
dce477d
Test more in multi reduce tests
MrBurmark Jun 25, 2024
f62440f
Remove largest test size in multi reduce tests
MrBurmark Jun 25, 2024
b157af7
Cut down multi reduce test runtime a bit more
MrBurmark Jun 25, 2024
1f53fdc
Update include/RAJA/policy/openmp/multi_reduce.hpp
MrBurmark Jun 25, 2024
1497c9a
Remove some subtests in ForallMultiReduceBasicTest
MrBurmark Jun 26, 2024
da27a78
Add kernel tests for multi reduce
MrBurmark Jun 26, 2024
e2d4f4f
Remove ompt and sycl multi reduce usage to avoid compile error
MrBurmark Jun 26, 2024
d66dd1b
Disable reset tests that use the reducer
MrBurmark Jun 26, 2024
01a51c7
add launch multi reducer tests
MrBurmark Jun 27, 2024
f3ccef1
fix include guard in kernel test
MrBurmark Jun 27, 2024
3e6e218
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 Jun 27, 2024
e6aefdd
Make allocate Dynamic Shared memory function
MrBurmark Jun 27, 2024
da0ddb8
Rename Smem to Shmem
MrBurmark Jun 27, 2024
7a4abb8
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jun 27, 2024
1d612c1
Take rich's suggestion on commenting device_prop()
MrBurmark Jun 27, 2024
6904235
Add more tuneability to multi reduce policies
MrBurmark Jun 27, 2024
79e6380
Add low performance low overhead multi-reduce cuda/hip policies
MrBurmark Jun 28, 2024
6efb2d9
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 Jun 28, 2024
d96c830
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jun 28, 2024
15a3302
Changes warp size static assert message
MrBurmark Jun 28, 2024
1f282d8
Apply some basic tuning for cuda/hip multi reduce policies
MrBurmark Jul 1, 2024
21ac750
Remove unnecessary parts of DeviceConstants
MrBurmark Jul 1, 2024
31f3add
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark Jul 1, 2024
ff0d034
Return self reference from combine
MrBurmark Jul 2, 2024
4e0990e
Remove dependence on block size from global tunings
MrBurmark Jul 2, 2024
5bdd4a8
rename policies to be a bit more verbose
MrBurmark Jul 2, 2024
4994c6e
Optimize indexers
MrBurmark Jul 3, 2024
8956206
if 0 out SyclMultiReducePols
MrBurmark Jul 3, 2024
8e8b66b
Add an extra guard on BaseMultiReduce constructor
MrBurmark Jul 3, 2024
b0423e1
Move RepeatView to utils
MrBurmark Jul 3, 2024
1611908
Change examples in pattern/multi_reduce.hpp
MrBurmark Jul 3, 2024
a727c59
Try fixing MultiReduceBase Container constructor
MrBurmark Jul 3, 2024
1adbe0d
Fix repeat view
MrBurmark Jul 3, 2024
23ece53
Add repeat view include to openmp multi reduce
MrBurmark Jul 3, 2024
b2c42f6
Fix comments in RepeatView
MrBurmark Jul 3, 2024
fe61ff8
Add multi-reduction cook book example
MrBurmark Jul 3, 2024
65c33a8
Remove sycl and ompt multi reduce policies
MrBurmark Jul 3, 2024
835ed2d
Add rarely used multi-reducer cookbook section
MrBurmark Jul 3, 2024
96203bc
Add feature MultiReduce documentation
MrBurmark Jul 3, 2024
95914b3
Add multi reduce policy section
MrBurmark Jul 3, 2024
97e1b24
Add sub-header for rarely used multireductions
MrBurmark Jul 3, 2024
f5ca84d
update docs
MrBurmark Jul 3, 2024
cc9d0b4
use RAJA::cuda/hip::synchronize(res) in multireduce
MrBurmark Jul 3, 2024
6ce2682
refine low_performance_low_overhead docs
MrBurmark Jul 3, 2024
ba5cc46
Add an example using the container interface
MrBurmark Jul 3, 2024
6e3a6ef
Fix cuda/hip test policy lists
MrBurmark Jul 3, 2024
f0409d8
Update spack configs so ci will use new camp
MrBurmark Jul 8, 2024
dc61db7
update radiuss-spack-configs
MrBurmark Jul 8, 2024
4f66ece
Trying to work around spack mirroring issue
rhornung67 Jul 9, 2024
4e7021a
Add a type to test fallback on global atomics
MrBurmark Jul 9, 2024
a19130b
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jul 9, 2024
9c52b63
Apply suggestions for documentation from code review
MrBurmark Jul 10, 2024
cd95bb5
Apply suggestions to comments from code review
MrBurmark Jul 10, 2024
f27b772
Apply suggestions to comments
MrBurmark Jul 10, 2024
c778aca
Document fallback when shmem unavailable
MrBurmark Jul 10, 2024
232df8a
Add func to Cuda/HipInfo to get right maxDynamicShmem
MrBurmark Jul 11, 2024
4485337
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jul 11, 2024
142dfc5
Note issue with setting up shmem and launch dims in kernel
MrBurmark Jul 11, 2024
a3f8721
Move Scoped Assignment into utils
MrBurmark Jul 11, 2024
3d49289
fix compile
MrBurmark Jul 11, 2024
9264742
fix fix compile
MrBurmark Jul 11, 2024
361b37f
fix fix fix compile
MrBurmark Jul 11, 2024
84a70f2
Apply suggestions from code review
MrBurmark Jul 11, 2024
893246a
Change radiuss-spack-configs
MrBurmark Jul 12, 2024
3970beb
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark Jul 12, 2024
758d065
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark Jul 12, 2024
1 change: 1 addition & 0 deletions docs/sphinx/user_guide/cook_book.rst
@@ -20,4 +20,5 @@ to provide users with complete usage examples beyond what can be found in
:maxdepth: 2

cook_book/reduction
cook_book/multi-reduction

160 changes: 160 additions & 0 deletions docs/sphinx/user_guide/cook_book/multi-reduction.rst
@@ -0,0 +1,160 @@
.. ##
.. ## Copyright (c) 2016-24, Lawrence Livermore National Security, LLC
.. ## and other RAJA project contributors. See the RAJA/LICENSE file
.. ## for details.
.. ##
.. ## SPDX-License-Identifier: (BSD-3-Clause)
.. ##

.. _cook-book-multi-reductions-label:

============================
Cooking with MultiReductions
============================

Please see the following section for an overview discussion of RAJA multi-reductions:

* :ref:`feat-multi-reductions-label`.


---------------------------------
MultiReductions with RAJA::forall
---------------------------------

Here is the setup for a simple multi-reduction example::

const int N = 1000;
const int num_bins = 10;

int vec[N];
int bins[N];

for (int i = 0; i < N; ++i) {

vec[i] = 1;
bins[i] = i % num_bins;

}

Here is a simple sum multi-reduction performed in a C-style for-loop::

int vsum[num_bins] {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

// Run a kernel using the multi-reduction objects
for (int i = 0; i < N; ++i) {

vsum[bins[i]] += vec[i];

}

The results of these operations will yield the following values:

* ``vsum[0] == 100``
* ``vsum[1] == 100``
* ``vsum[2] == 100``
* ``vsum[3] == 100``
* ``vsum[4] == 100``
* ``vsum[5] == 100``
* ``vsum[6] == 100``
* ``vsum[7] == 100``
* ``vsum[8] == 100``
* ``vsum[9] == 100``

RAJA uses policy types to specify how operations are implemented.

The forall *execution policy* specifies how the loop is run by the ``RAJA::forall``
method. The following discussion includes examples of several RAJA execution
policies that could be applied. For example, ``RAJA::seq_exec`` runs a C-style
for-loop sequentially on a CPU, while ``RAJA::cuda_exec_with_reduce<256>`` runs
the operation as a CUDA GPU kernel with 256 threads per block and other CUDA
kernel launch parameters, such as the number of blocks, chosen for performance
with multi-reducers.::

using exec_policy = RAJA::seq_exec;
// using exec_policy = RAJA::omp_parallel_for_exec;
// using exec_policy = RAJA::cuda_exec_with_reduce<256>;
// using exec_policy = RAJA::hip_exec_with_reduce<256>;

The multi-reduction policy specifies how the multi-reduction is performed and must be
compatible with the execution policy. For example, ``RAJA::seq_multi_reduce`` does a
sequential multi-reduction and can only be used with sequential execution policies.
The ``RAJA::cuda_multi_reduce_atomic`` policy uses atomics and can only be used with
CUDA execution policies. The same applies to other RAJA execution back-ends, such as
HIP and OpenMP. Here are example RAJA multi-reduction policies whose names indicate
which execution policies they work with::

using multi_reduce_policy = RAJA::seq_multi_reduce;
// using multi_reduce_policy = RAJA::omp_multi_reduce;
// using multi_reduce_policy = RAJA::cuda_multi_reduce_atomic;
// using multi_reduce_policy = RAJA::hip_multi_reduce_atomic;

Here a simple sum multi-reduction is performed using RAJA::
@rhornung67 (Member), Jul 11, 2024:
I don't think we want that change unless you change: "Here a" to "Here is a", which is more verbose.

Member Author:
I'm not sure I understand this comment.

Member:
I think @rhornung67 meant to rephrase either with "Here, a" or "Here is how a".
Or, "Here is a simple sum multi-reduction performed using RAJA::".

Member Author:
We decided to leave this as is.


RAJA::MultiReduceSum<multi_reduce_policy, int> vsum(num_bins, 0);

RAJA::forall<exec_policy>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {

vsum[bins[i]] += vec[i];

});

The results of these operations will yield the following values:

* ``vsum[0].get() == 100``
* ``vsum[1].get() == 100``
* ``vsum[2].get() == 100``
* ``vsum[3].get() == 100``
* ``vsum[4].get() == 100``
* ``vsum[5].get() == 100``
* ``vsum[6].get() == 100``
* ``vsum[7].get() == 100``
* ``vsum[8].get() == 100``
* ``vsum[9].get() == 100``

Another option for the execution policy when using the CUDA or HIP back-ends is
to use the base policies, which have a boolean parameter to choose between the
general-use ``cuda/hip_exec`` policy and the ``cuda/hip_exec_with_reduce``
policy.::

// static constexpr bool with_reduce = ...;
// using exec_policy = RAJA::cuda_exec_base<with_reduce, 256>;
// using exec_policy = RAJA::hip_exec_base<with_reduce, 256>;


---------------------------
Rarely Used MultiReductions
---------------------------

Multi-reductions consume resources even if they are not used in a loop kernel.
For example, if a multi-reducer is used only conditionally to set an error flag,
the setup and finalization for the multi-reduction are still done, and its
resources are still allocated and deallocated, even when the multi-reduction is
never used at runtime. To minimize these overheads, some back-ends have special
policies that reduce the amount of work the multi-reducer does when it is
compiled into a loop kernel but not used at runtime. Here are example RAJA
multi-reduction policies that have minimal overhead::

using rarely_used_multi_reduce_policy = RAJA::seq_multi_reduce;
// using rarely_used_multi_reduce_policy = RAJA::omp_multi_reduce;
// using rarely_used_multi_reduce_policy = RAJA::cuda_multi_reduce_atomic_low_performance_low_overhead;
// using rarely_used_multi_reduce_policy = RAJA::hip_multi_reduce_atomic_low_performance_low_overhead;

Here is a simple, rarely used bitwise-or multi-reduction performed using RAJA::

RAJA::MultiReduceBitOr<rarely_used_multi_reduce_policy, int> vor(num_bins, 0);

RAJA::forall<exec_policy>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {

if (vec[i] < 0) {
vor[0] |= 1;
}

});

The results of these operations will yield the following value if the condition
is never met:

* ``vor[0].get() == 0``

or yield the following value if the condition is ever met:

* ``vor[0].get() == 1``
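
After the kernel completes, the flag can be checked on the host, for example
(a brief usage sketch; with the setup above, where ``vec`` contains only ones,
the condition is never met)::

if (vor[0].get() != 0) {
// at least one entry of vec was negative
}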
227 changes: 227 additions & 0 deletions docs/sphinx/user_guide/feature/multi-reduction.rst
@@ -0,0 +1,227 @@
.. ##
.. ## Copyright (c) 2016-24, Lawrence Livermore National Security, LLC
.. ## and other RAJA project contributors. See the RAJA/LICENSE file
.. ## for details.
.. ##
.. ## SPDX-License-Identifier: (BSD-3-Clause)
.. ##

.. _feat-multi-reductions-label:

=========================
MultiReduction Operations
=========================

RAJA provides multi-reduction types that allow users to perform a runtime-determined
number of reduction operations in kernels launched using the ``RAJA::forall``,
``RAJA::kernel``, and ``RAJA::launch`` methods in a portable, thread-safe manner.
Users may use as many multi-reduction objects in a loop kernel as they need. If
only a small, fixed number of reductions is required in a loop kernel, then
standard RAJA reduction objects can be used. Available RAJA multi-reduction
types are described in this section.

.. note:: All RAJA multi-reduction types are located in the namespace ``RAJA``.

Also

.. note:: * Each RAJA multi-reduction type is templated on a **multi-reduction policy**
and a **reduction value type** for the multi-reduction variable. The
**multi-reduction policy type must be compatible with the execution
policy used by the kernel in which it is used.** For example, in
a CUDA kernel, a CUDA multi-reduction policy must be used.
* Each RAJA multi-reduction type accepts an **initial reduction value or
values** at construction (see below).
* Each RAJA multi-reduction type has a 'get' method to access reduced
values after kernel execution completes.
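
As a minimal sketch of the interface described in the note above (the policy,
value type, and bin count are illustrative only; complete kernels are shown in
the examples below)::

// A multi-reduction type is templated on a multi-reduction policy and a
// reduction value type. The number of bins and the initial value are given
// at construction and may be determined at runtime.
RAJA::MultiReduceSum< RAJA::seq_multi_reduce, int > msum(num_bins, 0);

// ... use msum in a kernel whose execution policy is compatible with the
// multi-reduction policy (here, a sequential policy) ...

// After the kernel completes, access each reduced value with 'get'.
int msum0 = msum[0].get();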

Please see the following sections for a description of reducers:

* :ref:`feat-reductions-label`.

Please see the following cook book sections for guidance on policy usage:

* :ref:`cook-book-multi-reductions-label`.


--------------------
MultiReduction Types
--------------------

RAJA supports three common multi-reduction types:

* ``MultiReduceSum< multi_reduce_policy, data_type >`` - Sum of values.

* ``MultiReduceMin< multi_reduce_policy, data_type >`` - Min value.

* ``MultiReduceMax< multi_reduce_policy, data_type >`` - Max value.

and two less common bitwise multi-reduction types:

* ``MultiReduceBitAnd< multi_reduce_policy, data_type >`` - Bitwise 'and' of values (i.e., ``a & b``).

* ``MultiReduceBitOr< multi_reduce_policy, data_type >`` - Bitwise 'or' of values (i.e., ``a | b``).

.. note:: ``RAJA::MultiReduceBitAnd`` and ``RAJA::MultiReduceBitOr`` reduction types are designed to work on integral data types because **in C++, at the language level, there is no such thing as a bitwise operator on floating-point numbers.**
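
For reference, here is a brief sketch declaring the min, max, and bitwise-and
types, which do not appear in the examples below (the policy, value types, bin
count, and initial values are illustrative only); each follows the same
construction and access pattern as the sum and bitwise-or examples in the next
section::

RAJA::MultiReduceMin< RAJA::seq_multi_reduce, double > vmin(num_bins, 1.0e10);
RAJA::MultiReduceMax< RAJA::seq_multi_reduce, double > vmax(num_bins, -1.0e10);
RAJA::MultiReduceBitAnd< RAJA::seq_multi_reduce, int > vand(num_bins, 0xffff);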

-----------------------
MultiReduction Examples
-----------------------

Next, we provide a few examples to illustrate basic usage of RAJA multi-reduction
types.

Here is a simple RAJA multi-reduction example that shows how to use a sum
multi-reduction type::

const int N = 1000;
const int B = 10;

//
// Initialize an array of length N with all ones, and another array to
// integers between 0 and B-1
//
int vec[N];
int bins[N];
for (int i = 0; i < N; ++i) {
vec[i] = 1;
bins[i] = i % B;
}

// Create a sum multi-reduction object with a size of B, and initial
// values of zero
RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(B, 0);

// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {

vsum[bins[i]] += vec[i];

});

// After kernel is run, extract the reduced values
int my_vsums[B];
for (int bin = 0; bin < B; ++bin) {
my_vsums[bin] = vsum[bin].get();
}

The results of these operations will yield the following values:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100


Here is the same example but using values stored in a container::

const int N = 1000;
const int B = 10;

//
// Initialize an array of length N with all ones, and another array to
// integers between 0 and B-1
//
int vec[N];
int bins[N];
for (int i = 0; i < N; ++i) {
vec[i] = 1;
bins[i] = i % B;
}

// Create a vector with a size of B, and initial values of zero
std::vector<int> my_vsums(B, 0);

// Create a multi-reducer initialized with size and values from my_vsums
RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(my_vsums);

// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {

vsum[bins[i]] += vec[i];

});

// After kernel is run, extract the reduced values back into my_vsums
vsum.get_all(my_vsums);

The results of these operations will yield the following values:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100





Here is an example of a bitwise-or multi-reduction::

const int N = 128;
const int B = 8;

//
// Initialize an array of length N to integers between 0 and B-1
//
int bins[N];
for (int i = 0; i < N; ++i) {
bins[i] = i % B;
}

// Create a bitwise-or multi-reduction object with a size of B, and initial values of zero
RAJA::MultiReduceBitOr< RAJA::omp_multi_reduce, int > vor(B, 0);

// Run a kernel using the multi-reduction object
RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
[=](RAJA::Index_type i) {

vor[bins[i]] |= i;

});

// After kernel is run, extract the reduced values
int my_vors[B];
for (int bin = 0; bin < B; ++bin) {
my_vors[bin] = vor[bin].get();
}

The results of these operations will yield the following values:

* my_vors[0] == 120 == 0b1111000
* my_vors[1] == 121 == 0b1111001
* my_vors[2] == 122 == 0b1111010
* my_vors[3] == 123 == 0b1111011
* my_vors[4] == 124 == 0b1111100
* my_vors[5] == 125 == 0b1111101
* my_vors[6] == 126 == 0b1111110
* my_vors[7] == 127 == 0b1111111

The results of the multi-reduction start at 120 and increase to 127. In binary
representation (i.e., bits), :math:`120 = 0b1111000` and :math:`127 = 0b1111111`.
The bins were picked so that all of the integers in a bin have the same
remainder modulo 8, so their last 3 binary digits are all the same while their
upper binary digits vary. Because bitwise-or keeps all of the set bits, the
upper bits of each result are all set, since at least one integer in each bin
sets each of them. The last 3 bits are the same for every integer in a bin, so
the last 3 bits of the result equal the bin index (the remainder modulo 8), as
the worked example below illustrates.
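
Bin 3, for instance, holds the integers 3, 11, 19, ..., 123, each ending in the
bits ``011``; OR-ing them together sets every bit that is set in any of them,
which matches ``my_vors[3] == 123`` above::

0b0000011   (  3)
0b0001011   ( 11)
0b0010011   ( 19)
...
0b1111011   (123)
---------
0b1111011   (123)  bitwise-or of all values in bin 3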

-----------------------
MultiReduction Policies
-----------------------

For more information about available RAJA multi-reduction policies and guidance
on which to use with RAJA execution policies, please see
:ref:`multi-reducepolicy-label`.