Distributed Performance-portable Stencil Computation - Documentation@bricks.run
- C++14 compatible compiler
- OpenMP
- MPI library
- CMake
- Optional backends:
  - CUDA
  - OpenCL
  - SYCL
  - HIP (work in progress)
- Clone the repository
- Create a build directory inside the source tree
mkdir build
- Create build configuration
cd build && cmake .. -DCMAKE_BUILD_TYPE=Release
- Build different test cases using
make <testname>
For a description of the test cases, see here.
The brick template consists of 3 parts:
- Brick: declares the brick data structure
- BrickInfo: an adjacency list that describes the relations between bricks
- BrickStorage: a chunk of memory for storing bricks
These templated data structures behave like ordinary C++ data structures: they do not require the code generator to function, and they provide a fallback way of writing code for compute and data movement.
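The division of labor among the three parts can be illustrated with a small, self-contained sketch. It is not the brick library's API: `ToyStorage`, `ToyInfo`, `ToyBrick`, `make_ring`, and `sum3` are hypothetical names, and the layout is reduced to one dimension with a fixed brick size of 4 and periodic neighbors, purely as assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy types; NOT the brick library's actual classes.
constexpr int BDIM = 4;  // assumed number of elements per 1D brick

// Analogue of BrickStorage: one contiguous chunk of memory for all bricks.
struct ToyStorage {
    std::vector<double> data;
    explicit ToyStorage(std::size_t nbricks) : data(nbricks * BDIM, 0.0) {}
};

// Analogue of BrickInfo: adjacency list, per brick {left, self, right}.
struct ToyInfo {
    std::vector<std::array<int, 3>> adj;
};

// Analogue of Brick: element access that resolves neighbors via the adjacency list.
struct ToyBrick {
    const ToyInfo *info;
    ToyStorage *storage;

    // Element i of brick b; i may reach one brick to either side.
    double &at(int b, int i) {
        assert(i >= -BDIM && i < 2 * BDIM);
        int side = (i < 0) ? 0 : (i >= BDIM ? 2 : 1);  // left, self, or right
        int nb = info->adj[b][side];                   // neighbor brick index
        int local = i - (side - 1) * BDIM;             // offset inside that brick
        return storage->data[static_cast<std::size_t>(nb) * BDIM + local];
    }
};

// Build a periodic ring of n bricks (assumed topology for this sketch).
inline ToyInfo make_ring(int n) {
    ToyInfo info;
    for (int b = 0; b < n; ++b)
        info.adj.push_back(std::array<int, 3>{{(b + n - 1) % n, b, (b + 1) % n}});
    return info;
}

// Plain 3-point sum written directly against the toy accessor (no code generator).
inline double sum3(ToyBrick &brick, int b, int i) {
    return brick.at(b, i - 1) + brick.at(b, i) + brick.at(b, i + 1);
}
```

In this sketch, `sum3` plays the role of the fallback path: the stencil is written by hand against the accessor, and crossing a brick boundary is handled by the adjacency lookup rather than by any generated code.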
Stencil expressions for the code generator are specified using a Python library. The code generator provides optimization and vectorization support for different backends.
Code generation is carried out automatically by a CMake wrapper. For details, see Codegen Integration.
Template arguments and code ordering place the contiguous dimension last. Dimension arrays place the contiguous dimension at index 0 (contiguous first).
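One way to read the two ordering conventions is sketched below. The helper `flat_index` is hypothetical and not part of the brick library; it assumes a dense 3D array whose extents are stored contiguous-first, while the indices are passed in code order with the contiguous index last.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the brick library.
// dims uses the "dimension array" convention: dims[0] is the contiguous extent.
// The arguments (k, j, i) use the "code" convention: contiguous index i comes last.
inline std::size_t flat_index(const std::array<std::size_t, 3> &dims,
                              std::size_t k, std::size_t j, std::size_t i) {
    assert(i < dims[0] && j < dims[1] && k < dims[2]);
    return (k * dims[1] + j) * dims[0] + i;
}
```

With extents `{8, 4, 2}`, stepping `i` by one moves one element in memory, stepping `j` moves 8 elements, and stepping `k` moves 32, which is the sense in which `i` is the contiguous dimension even though it appears last in the call.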
- include and src: contain the brick library headers and library files
- docs: various documents
- cmake: CMake module file
- Included test cases are split into 4 folders:
  - stencils: contains different stencils and related initialization code, used by all tests as needed
  - single: for single node (no MPI)
  - weak: for weak scaling or strong scaling with one-level decomposition (one subdomain per rank)
  - strong: for strong scaling with two-level decomposition (multiple fixed-sized subdomains per rank)
A large portion of the brick library is based entirely on templates and can be included as a header-only library.
- This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration.
- This research used resources of the Oak Ridge Leadership Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
- This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.
- This research used resources in Lawrence Berkeley National Laboratory and the National Energy Research Scientific Computing Center, which are supported by the U.S. Department of Energy Office of Science’s Advanced Scientific Computing Research program under contract number DE-AC02-05CH11231.
@cite zhao2018 Zhao, Tuowen, Samuel Williams, Mary Hall, and Hans Johansen. "Delivering Performance-Portable Stencil Computations on CPUs and GPUs Using Bricks." In 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 59-70. IEEE, 2018.
@cite zhao2019 Zhao, Tuowen, Protonu Basu, Samuel Williams, Mary Hall, and Hans Johansen. "Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs." In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 52. ACM, 2019.