Add MultiReducer #1665
Merged
Changes from 115 commits
Commits (128 total):
195d506
Add Seq and OMP MultiReduce and unit test
MrBurmark 25fe448
Add reset tests for single and container init versions of reset
MrBurmark aaa6661
test resizing on reset
MrBurmark 47a0fb8
Add basic functional tests
MrBurmark 2565b68
Add consistency test to reducer tests
MrBurmark 988b6d1
Align data to avoid false sharing in OMP MultiReducers
MrBurmark fb19038
Update consistency to also take policy into consideration
MrBurmark 35094bd
add ordered/unordered into cuda/hip reduce policies
MrBurmark 2453643
Use get in multi_reduce reference
MrBurmark 18c0ef3
Use HighAccuracyReduce in MultiReduceOrderedDataOMP
MrBurmark 01cf2bd
Take dynamic_smem by reference in make_launch_body
MrBurmark 515126f
Fix typo in MultiReduceOrderedDataOMP
MrBurmark 40e3e50
use multi_reduce reference get
MrBurmark 469ccd6
Use atomic multi_reduce policies
MrBurmark b1dc681
Fix unit tests for reference get
MrBurmark 37b7f0a
fixup MemUtils_HIP
MrBurmark 4b3bb46
Add host device to BaseMultiReduce*
MrBurmark 77ecd00
Add initial cuda/hip multi reduce impl
MrBurmark 27f8fc6
work around non-working camp type_trait
MrBurmark 9b668dc
Add example of forall and multi_reduce
MrBurmark 3c41391
Add for_each_tuple to iterate over elements of tuple
MrBurmark 3fb0a7d
remove extra includes in cuda/hip multi_reduce
MrBurmark cc6ae7a
Add missing include in openmp multi_reduce
MrBurmark 45fc6ad
Use for_each_tuple in forall multi-reduce example
MrBurmark 6ea3156
Make MultiReduce example more interesting
MrBurmark bec1924
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 fe33670
Update camp version to support std::tuple_size
MrBurmark eb4b041
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 5b2abd6
fix spacing in multi_reduce example
MrBurmark 67efc05
Use tunings in multi_reduce policies
MrBurmark c0b1e1b
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark ec1a0ea
style fixes and fix to assert loop
MrBurmark ec5f456
Fix example to make it compile with cuda
MrBurmark c16208e
Work around preprocessing bug in nvcc
MrBurmark de2fe83
Fix unused var warning
MrBurmark f6e4034
Suppress host device warnings
MrBurmark 997087e
Add replication tuning options to cuda/hip multi reduce
MrBurmark 553ad06
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark 49c4ff1
Abort if can't get shared memory
MrBurmark 15290a1
hopefully fix compile issue with msvc
MrBurmark 9fb21ba
Add global atomic only multi reduce for cuda/hip
MrBurmark 1e4d3b1
Add math header to cuda/hip policy files
MrBurmark 14279a3
Put global tally code in base class
MrBurmark 617a653
Add backup path to cuda/hip multireduce
MrBurmark 60751d5
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark dd8d888
Add size check in multi reduce get_all
MrBurmark f498a44
Test get_all in basic MultiReduce test
MrBurmark e31193e
Remove ifdef in OpenMPMultiReducePols list
MrBurmark 30a90cf
Radomize number of loops in basic multi-reducer test
MrBurmark e99b1a2
Add comments to each test in basic multi reduce test
MrBurmark 276a1f3
test multiple sizes in the basic multi reduce test
MrBurmark 6d28df8
Add more tests of numbers of bins in basic multi reduce test
MrBurmark 04b55ee
Randomize num_bins in multi reduce test
MrBurmark a1cc5a7
Check for 0 num_bins
MrBurmark 4a0eb8c
Make multi reduce constructors more consistent
MrBurmark 3a8e677
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark b458ed5
Generalize device constants
MrBurmark 16daeb0
Choose replication per warp instead of per block
MrBurmark 4975957
Move GetOffset operators into util file
MrBurmark 8fbc3be
Use smaller global atomic width for V100
MrBurmark 703c95e
Use left bunched offset for tally
MrBurmark 5f04856
fix vexing parse
MrBurmark dce477d
Test more in multi reduce tests
MrBurmark f62440f
Remove largest test size in multi reduce tests
MrBurmark b157af7
Cut down multi reduce test runtime a bit more
MrBurmark 1f53fdc
Update include/RAJA/policy/openmp/multi_reduce.hpp
MrBurmark 1497c9a
Remove some subtests in ForallMultiReduceBasicTest
MrBurmark da27a78
Add kernel tests for multi reduce
MrBurmark e2d4f4f
Remove ompt and sycl multi reduce usage to avoid compile error
MrBurmark d66dd1b
Disable reset tests that use the reducer
MrBurmark 01a51c7
add launch mult reducer tests
MrBurmark f3ccef1
fix include guard in kernel test
MrBurmark 3e6e218
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 e6aefdd
Make allcoate Dynamic Shared memory function
MrBurmark da0ddb8
Rename Smem to Shmem
MrBurmark 7a4abb8
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark 1d612c1
Take rich's suggestion on commenting device_prop()
MrBurmark 6904235
Add more tuneability to multi reduce policies
MrBurmark 79e6380
Add low performance low overhead multi-reduce cuda/hip policies
MrBurmark 6efb2d9
Merge branch 'develop' into feature/burmark1/multireduce
rhornung67 d96c830
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark 15a3302
Changes warp size static assert message
MrBurmark 1f282d8
Apply some basic tuning for cuda/hip multi reduce policies
MrBurmark 21ac750
Remove unnecessary parts of DeviceConstants
MrBurmark 31f3add
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark ff0d034
Return self reference from combine
MrBurmark 4e0990e
Remove dependence on block size from global tunings
MrBurmark 5bdd4a8
rename policies to be a bit more verbose
MrBurmark 4994c6e
Optimize indexers
MrBurmark 8956206
if 0 out SyclMultiReducePols
MrBurmark 8e8b66b
Add an extra guard on BaseMultiReduce constructor
MrBurmark b0423e1
Move RepeatView to utils
MrBurmark 1611908
Change examples in pattern/multi_reduce.hpp
MrBurmark a727c59
Try fixing MultiReduceBase Container constructor
MrBurmark 1adbe0d
Fix repeat view
MrBurmark 23ece53
Add repeat view include to openmp multi reduce
MrBurmark b2c42f6
Fix comments in RepeatView
MrBurmark fe61ff8
Add multi-reduction cook book example
MrBurmark 65c33a8
Remove sycl and ompt multi reduce policies
MrBurmark 835ed2d
Add rarely used multi-reducer cookbook section
MrBurmark 96203bc
Add feature MultiReduce documentation
MrBurmark 95914b3
Add multi reduce policy section
MrBurmark 97e1b24
Add sub-header for rarely used multireductions
MrBurmark f5ca84d
update docs
MrBurmark cc9d0b4
use RAJA::cuda/hip::synchronize(res) in multireduce
MrBurmark 6ce2682
refine low_performance_low_overhead docs
MrBurmark ba5cc46
Add an example using the container interface
MrBurmark 6e3a6ef
Fix cuda/hip test policy lists
MrBurmark f0409d8
Update spack configs so ci will use new camp
MrBurmark dc61db7
update radiuss-space-configs
MrBurmark 4f66ece
Trying to work around spack mirroring issue
rhornung67 4e7021a
Add a type to test fallback on global atomics
MrBurmark a19130b
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark 9c52b63
Apply suggestions for documentation from code review
MrBurmark cd95bb5
Apply suggestions to comments from code review
MrBurmark f27b772
Apply suggestions to comments
MrBurmark c778aca
Document fallback when shmem unavailable
MrBurmark 232df8a
Add func to Cuda/HipInfo to get right maxDynamicShmem
MrBurmark 4485337
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark 142dfc5
Note issue with setting up shmem and launch dims in kernel
MrBurmark a3f8721
Move Scoped Assignment into utils
MrBurmark 3d49289
fix compile
MrBurmark 9264742
fix fix compile
MrBurmark 361b37f
fix fix fix compile
MrBurmark 84a70f2
Apply suggestions from code review
MrBurmark 893246a
Change radius-spack-configs
MrBurmark 3970beb
Merge branch 'feature/burmark1/multireduce' of github.com:LLNL/RAJA i…
MrBurmark 758d065
Merge branch 'develop' of github.com:LLNL/RAJA into feature/burmark1/…
MrBurmark
@@ -0,0 +1,160 @@
.. ##
.. ## Copyright (c) 2016-24, Lawrence Livermore National Security, LLC
.. ## and other RAJA project contributors. See the RAJA/LICENSE file
.. ## for details.
.. ##
.. ## SPDX-License-Identifier: (BSD-3-Clause)
.. ##

.. _cook-book-multi-reductions-label:

============================
Cooking with MultiReductions
============================

Please see the following section for overview discussion about RAJA multi-reductions:

* :ref:`feat-multi-reductions-label`.


---------------------------------
MultiReductions with RAJA::forall
---------------------------------

Here is the setup for a simple multi-reduction example::

  const int N = 1000;
  const int num_bins = 10;

  int vec[N];
  int bins[N];

  for (int i = 0; i < N; ++i) {

    vec[i] = 1;
    bins[i] = i % num_bins;

  }

Here is a simple sum multi-reduction performed in a C-style for-loop::

  int vsum[num_bins] {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

  // Run a kernel using the multi-reduction objects
  for (int i = 0; i < N; ++i) {

    vsum[bins[i]] += vec[i];

  }

The results of these operations will yield the following values:

* ``vsum[0] == 100``
* ``vsum[1] == 100``
* ``vsum[2] == 100``
* ``vsum[3] == 100``
* ``vsum[4] == 100``
* ``vsum[5] == 100``
* ``vsum[6] == 100``
* ``vsum[7] == 100``
* ``vsum[8] == 100``
* ``vsum[9] == 100``

RAJA uses policy types to specify how things are implemented.

The forall *execution policy* specifies how the loop is run by the ``RAJA::forall``
method. The following discussion includes examples of several RAJA execution
policies that could be applied. For example, ``RAJA::seq_exec`` runs a C-style
for-loop sequentially on a CPU, while ``RAJA::cuda_exec_with_reduce<256>`` runs the
operation as a CUDA GPU kernel with 256 threads per block and other CUDA kernel
launch parameters, like the number of blocks, tuned for performance with
multi-reducers::

  using exec_policy = RAJA::seq_exec;
  // using exec_policy = RAJA::omp_parallel_for_exec;
  // using exec_policy = RAJA::cuda_exec_with_reduce<256>;
  // using exec_policy = RAJA::hip_exec_with_reduce<256>;

The multi-reduction policy specifies how the multi-reduction is done and must be
compatible with the execution policy. For example, ``RAJA::seq_multi_reduce`` does
a sequential multi-reduction and can only be used with sequential execution
policies. The ``RAJA::cuda_multi_reduce_atomic`` policy uses atomics and can only
be used with CUDA execution policies, and similarly for the other RAJA execution
back-ends, such as HIP and OpenMP. Here are example RAJA multi-reduction policies
whose names indicate which execution policies they work with::

  using multi_reduce_policy = RAJA::seq_multi_reduce;
  // using multi_reduce_policy = RAJA::omp_multi_reduce;
  // using multi_reduce_policy = RAJA::cuda_multi_reduce_atomic;
  // using multi_reduce_policy = RAJA::hip_multi_reduce_atomic;

Here a simple sum multi-reduction is performed using RAJA::

  RAJA::MultiReduceSum<multi_reduce_policy, int> vsum(num_bins, 0);

  RAJA::forall<exec_policy>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    vsum[bins[i]] += vec[i];

  });

The results of these operations will yield the following values:

* ``vsum[0].get() == 100``
* ``vsum[1].get() == 100``
* ``vsum[2].get() == 100``
* ``vsum[3].get() == 100``
* ``vsum[4].get() == 100``
* ``vsum[5].get() == 100``
* ``vsum[6].get() == 100``
* ``vsum[7].get() == 100``
* ``vsum[8].get() == 100``
* ``vsum[9].get() == 100``

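The reduced values can be read on the host after the kernel completes through the
multi-reducer's ``get`` method (or all at once with ``get_all``, as described in
:ref:`feat-multi-reductions-label`). The short sketch below is not part of the
original example; it simply illustrates one way to copy the per-bin results into a
plain array::

  // Hypothetical follow-on to the example above: copy each bin's reduced
  // value out of the multi-reducer after the forall completes.
  int final_vsum[num_bins];
  for (int bin = 0; bin < num_bins; ++bin) {
    final_vsum[bin] = vsum[bin].get();  // each entry equals 100 in this example
  }
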
Another option for the execution policy when using the CUDA or HIP backends is to
use the base policies, which take a boolean parameter to choose between the
general-use ``cuda/hip_exec`` policy and the ``cuda/hip_exec_with_reduce`` policy::

  // static constexpr bool with_reduce = ...;
  // using exec_policy = RAJA::cuda_exec_base<with_reduce, 256>;
  // using exec_policy = RAJA::hip_exec_base<with_reduce, 256>;


---------------------------
Rarely Used MultiReductions
---------------------------

Multi-reductions consume resources even if they are not used in a loop kernel.
For example, if a multi-reducer is used only conditionally to set an error flag,
the setup and finalization for the multi-reduction are still performed, and its
resources are still allocated and deallocated, even when the multi-reduction is
never used at runtime. To minimize these overheads, some backends provide special
policies that reduce the amount of work the multi-reducer does when it is compiled
into a loop kernel but not used at runtime. Here are example RAJA multi-reduction
policies that have minimal overhead::

  using rarely_used_multi_reduce_policy = RAJA::seq_multi_reduce;
  // using rarely_used_multi_reduce_policy = RAJA::omp_multi_reduce;
  // using rarely_used_multi_reduce_policy = RAJA::cuda_multi_reduce_atomic_low_performance_low_overhead;
  // using rarely_used_multi_reduce_policy = RAJA::hip_multi_reduce_atomic_low_performance_low_overhead;

Here is a simple, rarely used bitwise-or multi-reduction performed using RAJA::

  RAJA::MultiReduceBitOr<rarely_used_multi_reduce_policy, int> vor(num_bins, 0);

  RAJA::forall<exec_policy>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    if (vec[i] < 0) {
      vor[0] |= 1;
    }

  });

The results of these operations will yield the following value if the condition
is never met:

* ``vor[0].get() == 0``

or yield the following value if the condition is ever met:

* ``vor[0].get() == 1``
@@ -0,0 +1,227 @@
.. ##
.. ## Copyright (c) 2016-24, Lawrence Livermore National Security, LLC
.. ## and other RAJA project contributors. See the RAJA/LICENSE file
.. ## for details.
.. ##
.. ## SPDX-License-Identifier: (BSD-3-Clause)
.. ##

.. _feat-multi-reductions-label:

=========================
MultiReduction Operations
=========================

RAJA provides multi-reduction types that allow users to perform a runtime number
of reduction operations in kernels launched using ``RAJA::forall``, ``RAJA::kernel``,
and ``RAJA::launch`` methods in a portable, thread-safe manner. Users may
use as many multi-reduction objects in a loop kernel as they need. If a small
fixed number of reductions is required in a loop kernel then standard RAJA
reduction objects can be used. Available RAJA multi-reduction types are described
in this section.

.. note:: All RAJA multi-reduction types are located in the namespace ``RAJA``.

Also

.. note:: * Each RAJA multi-reduction type is templated on a **multi-reduction policy**
            and a **reduction value type** for the multi-reduction variable. The
            **multi-reduction policy type must be compatible with the execution
            policy used by the kernel in which it is used.** For example, in
            a CUDA kernel, a CUDA multi-reduction policy must be used.
          * Each RAJA multi-reduction type accepts an **initial reduction value or
            values** at construction (see below).
          * Each RAJA multi-reduction type has a 'get' method to access reduced
            values after kernel execution completes.

Please see the following sections for a description of reducers:

* :ref:`feat-reductions-label`.

Please see the following cook book sections for guidance on policy usage:

* :ref:`cook-book-multi-reductions-label`.


--------------------
MultiReduction Types
--------------------

RAJA supports three common multi-reduction types:

* ``MultiReduceSum< multi_reduce_policy, data_type >`` - Sum of values.

* ``MultiReduceMin< multi_reduce_policy, data_type >`` - Min value.

* ``MultiReduceMax< multi_reduce_policy, data_type >`` - Max value.

and two less common bitwise multi-reduction types:

* ``MultiReduceBitAnd< multi_reduce_policy, data_type >`` - Bitwise 'and' of values (i.e., ``a & b``).

* ``MultiReduceBitOr< multi_reduce_policy, data_type >`` - Bitwise 'or' of values (i.e., ``a | b``).

.. note:: ``RAJA::MultiReduceBitAnd`` and ``RAJA::MultiReduceBitOr`` reduction types are designed to work on integral data types because **in C++, at the language level, there is no such thing as a bitwise operator on floating-point numbers.**

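As a quick illustration of the shared interface, declarations of each type might
look like the following sketch. This is not taken from the RAJA examples; it
assumes a compatible ``multi_reduce_policy``, a bin count ``num_bins``, and that
the other types follow the same ``(num_bins, initial_value)`` constructor shown
for the sum and bitwise-or types in the examples below (``<limits>`` is assumed
to be included)::

  // Hypothetical declarations; every bin starts at the given initial value.
  RAJA::MultiReduceSum<multi_reduce_policy, int>          msum(num_bins, 0);
  RAJA::MultiReduceMin<multi_reduce_policy, double>       mmin(num_bins, std::numeric_limits<double>::max());
  RAJA::MultiReduceMax<multi_reduce_policy, double>       mmax(num_bins, std::numeric_limits<double>::lowest());
  RAJA::MultiReduceBitAnd<multi_reduce_policy, unsigned>  mand(num_bins, ~0u);
  RAJA::MultiReduceBitOr<multi_reduce_policy, unsigned>   mor(num_bins, 0u);
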
-----------------------
MultiReduction Examples
-----------------------

Next, we provide a few examples to illustrate basic usage of RAJA multi-reduction
types.

Here is a simple RAJA multi-reduction example that shows how to use a sum
multi-reduction type::

  const int N = 1000;
  const int B = 10;

  //
  // Initialize an array of length N with all ones, and another array with
  // integers between 0 and B-1
  //
  int vec[N];
  int bins[N];
  for (int i = 0; i < N; ++i) {
    vec[i] = 1;
    bins[i] = i % B;
  }

  // Create a sum multi-reduction object with a size of B, and initial
  // values of zero
  RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(B, 0);

  // Run a kernel using the multi-reduction object
  RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    vsum[bins[i]] += vec[i];

  });

  // After kernel is run, extract the reduced values
  int my_vsums[B];
  for (int bin = 0; bin < B; ++bin) {
    my_vsums[bin] = vsum[bin].get();
  }

The results of these operations will yield the following values:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100


Here is the same example but using values stored in a container::

  const int N = 1000;
  const int B = 10;

  //
  // Initialize an array of length N with all ones, and another array with
  // integers between 0 and B-1
  //
  int vec[N];
  int bins[N];
  for (int i = 0; i < N; ++i) {
    vec[i] = 1;
    bins[i] = i % B;
  }

  // Create a vector with a size of B, and initial values of zero
  std::vector<int> my_vsums(B, 0);

  // Create a multi-reducer initialized with size and values from my_vsums
  RAJA::MultiReduceSum< RAJA::omp_multi_reduce, int > vsum(my_vsums);

  // Run a kernel using the multi-reduction object
  RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    vsum[bins[i]] += vec[i];

  });

  // After kernel is run, extract the reduced values back into my_vsums
  vsum.get_all(my_vsums);

The results of these operations will yield the following values:

* my_vsums[0] == 100
* my_vsums[1] == 100
* my_vsums[2] == 100
* my_vsums[3] == 100
* my_vsums[4] == 100
* my_vsums[5] == 100
* my_vsums[6] == 100
* my_vsums[7] == 100
* my_vsums[8] == 100
* my_vsums[9] == 100

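A multi-reducer can also be reused across kernels. This PR's commit history
includes reset tests for both single-value and container initialization, which
suggests a ``reset`` interface mirroring the constructors; the sketch below
assumes that interface and is not taken from the documentation text::

  // Hypothetical reuse of the multi-reducer, assuming reset overloads that
  // mirror the constructors shown above.
  vsum.reset(B, 0);         // reset all B bins back to zero
  // vsum.reset(my_vsums);  // or re-initialize (and possibly resize) from a container

  RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    vsum[bins[i]] += 2 * vec[i];

  });
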
Here is an example of a bitwise-or multi-reduction::

  const int N = 128;
  const int B = 8;

  //
  // Initialize an array of length N to integers between 0 and B-1
  //
  int bins[N];
  for (int i = 0; i < N; ++i) {
    bins[i] = i % B;
  }

  // Create a bitwise or multi-reduction object with initial value of '0'
  RAJA::MultiReduceBitOr< RAJA::omp_multi_reduce, int > vor(B, 0);

  // Run a kernel using the multi-reduction object
  RAJA::forall<RAJA::omp_parallel_for_exec>( RAJA::RangeSegment(0, N),
    [=](RAJA::Index_type i) {

    vor[bins[i]] |= i;

  });

  // After kernel is run, extract the reduced values
  int my_vors[B];
  for (int bin = 0; bin < B; ++bin) {
    my_vors[bin] = vor[bin].get();
  }

The results of these operations will yield the following values:

* my_vors[0] == 120 == 0b1111000
* my_vors[1] == 121 == 0b1111001
* my_vors[2] == 122 == 0b1111010
* my_vors[3] == 123 == 0b1111011
* my_vors[4] == 124 == 0b1111100
* my_vors[5] == 125 == 0b1111101
* my_vors[6] == 126 == 0b1111110
* my_vors[7] == 127 == 0b1111111

The results of the multi-reduction start at 120 and increase to 127. In binary
representation (i.e., bits), :math:`120 = 0b1111000` and :math:`127 = 0b1111111`.
The bins were chosen so that all the integers in a bin have the same remainder
modulo 8, which means their last 3 binary digits are identical while their upper
binary digits vary. Because bitwise-or keeps every set bit, the upper bits of each
result are all set, since at least one integer in each bin sets each of them. The
last 3 bits of each result match the bin number's remainder modulo 8, because those
bits are the same in every integer in the bin.

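To make the arithmetic concrete, the reduced value for a single bin can be checked
directly in plain C++. This small verification loop is not part of the
documentation example::

  // Verify the expected value for bin 3 by OR-ing together 3, 11, 19, ..., 123.
  int expected = 0;
  for (int i = 3; i < N; i += B) {
    expected |= i;  // upper bits accumulate; the low 3 bits stay 0b011
  }
  // expected == 123 == 0b1111011, matching my_vors[3] above
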
-----------------------
MultiReduction Policies
-----------------------

For more information about available RAJA multi-reduction policies and guidance
on which to use with RAJA execution policies, please see
:ref:`multi-reducepolicy-label`.
I don't think we want that change unless you change: "Here a" to "Here is a", which is more verbose.
I'm not sure I understand this comment.
I think @rhornung67 meant to rephrase either with "Here, a" or "Here is how a".
Or, "Here is a simple sum multi-reduction performed using RAJA::".
We decided to leave this as is.