feature: add RAJA kernel launches and basic CUDA support #1026
Open: johnbowen42 wants to merge 100 commits into develop from feature/bowen/raja-for-all
Changes from 27 commits
Commits
b47db88 johnbowen42: Clang format
325f868 johnbowen42: Cleanup print statements and commented code
9c5d5b1 johnbowen42: Refactor more raja for all loops
4e9e7a9 johnbowen42: Passing more tests
a8c1dd6 johnbowen42: tmp
8b5dc3d johnbowen42: adding memcpy ops
162052a johnbowen42: Add memcpy operations
f35d677 johnbowen42: Unit test passing for hexahedron
bb181ec johnbowen42: enable more unit tests
8d51959 johnbowen42: Remove Cmakelists cruft
dc9739f johnbowen42: Format RAJA launch kernels
831ec04 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
6aa995c johnbowen42: format RAJA kernels
7c598cb johnbowen42: re-enable functional shape derivatives
83f8bcd johnbowen42: Re-enable functional_basic_h1_vector
fdb4b23 johnbowen42: Fix more unit tests not compiling
2480dc3 johnbowen42: Convert lambda use to functors. Change signature of interpolate to a…
96049f1 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
61130b4 johnbowen42: Refactor interpolate API for finite elements
7838993 johnbowen42: Change umpire usage
0e02e98 johnbowen42: Refactor boundary_integral_kernels to use CUDA. Make interpolate void
e4224e2 johnbowen42: Eliminate unused code
bffae94 johnbowen42: Fixing some unit tests
4c1f6e5 johnbowen42: Fix more unit tests
274f432 johnbowen42: Fix various solid unit tests
fcbf4a8 johnbowen42: fix thermal unit tests
be9abc1 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
72e6d30 johnbowen42: TMP: use mfem vector/device to manage memory
be05190 johnbowen42: Fix functional with domain heap overflow
caeec16 johnbowen42: tmp
865848e johnbowen42: Fix bug
9836836 johnbowen42: reenable test
1bca89d johnbowen42: Fix compilation error
f3d03f7 johnbowen42: Fix unused variable and narrowing warnings
10da80a johnbowen42: modify CMakeLists
5dca932 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
46bf790 johnbowen42: fix compilation errors
dddf080 johnbowen42: Fix internal compiler error in functional shape derivatives. Add do…
c47ded4 johnbowen42: More docs and gcc build errors
2c96666 johnbowen42: add doc strings
474d0bd johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
da53cca johnbowen42: docs and format
b4ec3c2 johnbowen42: clang format
25223b3 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
7e823ff johnbowen42: remove headers
e1014e7 johnbowen42: delete more headers
bec4cde johnbowen42: decrease level of optimization
6a0a0d4 jamiebramwell: Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
e2165ea jamiebramwell: Revert assemble()
0f843d0 white238: lower job slots to see if memory is the issue
e5c91f6 white238: remove unneeded options
a6605b6 white238: quiet warnings, unify defines
84feae7 white238: Merge branch 'develop' into feature/bowen/raja-for-all
a9da335 white238: style
0c95469 johnbowen42: remove using statements from elements
7d58b5d johnbowen42: remove using statements from headers
3545588 johnbowen42: Move RAJA types into header, cleanup conditional compilation in evalu…
5d7cdc7 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
d77c24b johnbowen42: Format element headers
5f4d580 johnbowen42: increase parallelism in build script
974d6c4 johnbowen42: Add RAJA includes
194f0c3 johnbowen42: re-enable tests
9d16d2c johnbowen42: Rename CUDA execution macro. Add comments. Remove unnecessary expli…
2047461 johnbowen42: Add cuda scalar unit test
ced8b7b johnbowen42: clang style
3665bdf johnbowen42: Add CUDA unit tests.
d822b96 white238: Merge branch 'develop' into feature/bowen/raja-for-all
723062e white238: fix merge error
98fc557 white238: get bug_boundary_qoi to compile
54d01c9 jamiebramwell: Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
4954a36 jamiebramwell: Merge branch 'develop' of github.com:LLNL/serac into feature/bowen/ra…
13c7c20 johnbowen42: Merge branch 'feature/bowen/raja-for-all' of github.com:LLNL/serac in…
ea744ec johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
27510b4 johnbowen42: Enable functional comparisons unit test for CUDA
6125197 johnbowen42: Fix build issues
e58dfab johnbowen42: blt submodule
09f4282 johnbowen42: Fix unit tests
091ccf8 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
1d8a6aa johnbowen42: fix compilation bug
364b46a johnbowen42: fix compilation bug
fa98c4c johnbowen42: Plum ExecutionSpace template parameter through more classes
1e2f7dc johnbowen42: Add exec space parameter to finite elements
7c4067f johnbowen42: attempt to fix ambiguous call error
c062c52 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
2ded3ec johnbowen42: delete whitespace changes
e912aab johnbowen42: Merge branch 'feature/bowen/raja-v2' into feature/bowen/raja-for-all
9721e45 johnbowen42: tmp
7ac6b25 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
8995b61 johnbowen42: Change cuda execution interface
abac93d johnbowen42: remove whitespace changes
bdafb37 johnbowen42: Remove conditional compilation
0a8b42e johnbowen42: Complete addition of execution space parameter to code
17cd58c johnbowen42: Fix link error
26b99de johnbowen42: Fix fit test
151d793 johnbowen42: Add docs and try to fix build error
0dc4d88 johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
0a720b7 johnbowen42: increase timeout
7686463 johnbowen42: Fix docs
4b2916e johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
01c8e6a johnbowen42: Merge branch 'develop' into feature/bowen/raja-for-all
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,3 +13,4 @@ | |
*.orig | ||
__pycache__/ | ||
view | ||
*.cache* |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,13 +30,15 @@ if(DEFINED ENV{SPACK_CC}) | |
else() | ||
|
||
set(CMAKE_C_COMPILER "/usr/tce/packages/clang/clang-ibm-10.0.1-gcc-8.3.1/bin/clang" CACHE PATH "") | ||
#set(CMAKE_C_COMPILER "/usr/tce/packages/clang/clang-13.0.1-gcc-8.3.1/bin/clang" CACHE PATH "") | ||
|
||
set(CMAKE_CXX_COMPILER "/usr/tce/packages/clang/clang-ibm-10.0.1-gcc-8.3.1/bin/clang++" CACHE PATH "") | ||
#set(CMAKE_CXX_COMPILER "/usr/tce/packages/clang/clang-13.0.1-gcc-8.3.1/bin/clang++" CACHE PATH "") | ||
|
||
set(CMAKE_Fortran_COMPILER "/usr/tce/packages/gcc/gcc-8.3.1/bin/gfortran" CACHE PATH "") | ||
|
||
endif() | ||
|
||
#-fopenmp -gdwarf-4 -fgpu-rdc | ||
set(CMAKE_C_STANDARD_LIBRARIES "-lgfortran" CACHE STRING "") | ||
|
||
set(CMAKE_CXX_STANDARD_LIBRARIES "-lgfortran" CACHE STRING "") | ||
|
@@ -69,29 +71,34 @@ set(BLT_MPI_COMMAND_APPEND "mpibind" CACHE STRING "") | |
# Cuda | ||
#------------------------------------------------ | ||
|
||
set(CUDAToolkit_ROOT "/usr/tce/packages/cuda/cuda-11.2.0" CACHE PATH "") | ||
set(CUDAToolkit_ROOT "/usr/tce/packages/cuda/cuda-12.0.0" CACHE PATH "") | ||
johnbowen42 marked this conversation as resolved.
|
||
#set(CUDAToolkit_ROOT "/usr/tce/packages/cuda/cuda-10.1.105" CACHE PATH "") | ||
|
||
set(CMAKE_CUDA_COMPILER "${CUDAToolkit_ROOT}/bin/nvcc" CACHE PATH "") | ||
|
||
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE PATH "") | ||
|
||
set(CUDA_TOOLKIT_ROOT_DIR "/usr/tce/packages/cuda/cuda-11.2.0" CACHE PATH "") | ||
set(CUDA_TOOLKIT_ROOT_DIR "/usr/tce/packages/cuda/cuda-12.0.0" CACHE PATH "") | ||
#set(CUDA_TOOLKIT_ROOT_DIR "/usr/tce/packages/cuda/cuda-10.1.105" CACHE PATH "") | ||
|
||
set(CMAKE_CUDA_ARCHITECTURES "70" CACHE STRING "") | ||
|
||
set(ENABLE_OPENMP ON CACHE BOOL "") | ||
|
||
set(ENABLE_CUDA ON CACHE BOOL "") | ||
|
||
set(CMAKE_CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "") | ||
set(ENABLE_CLANG_CUDA OFF CACHE BOOL "") | ||
Review comment: Please remove any setting of clang CUDA.
||
|
||
set(CMAKE_CUDA_FLAGS " --expt-extended-lambda --expt-relaxed-constexpr " CACHE STRING "") | ||
|
||
set(CMAKE_CUDA_ARCHITECTURES "70" CACHE STRING "") | ||
set(CMAKE_CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "") | ||
|
||
#set(CMAKE_CUDA_FLAGS " --expt-extended-lambda --expt-relaxed-constexpr " CACHE STRING "") | ||
#set(CMAKE_CUDA_FLAGS "-fopenmp" CACHE STRING "") | ||
|
||
# nvcc does not like gtest's 'pthreads' flag | ||
|
||
set(gtest_disable_pthreads ON CACHE BOOL "") | ||
set(gtest_disable_pthreads OFF CACHE BOOL "") | ||
johnbowen42 marked this conversation as resolved.
|
||
|
||
set(BLT_CMAKE_IMPLICIT_LINK_DIRECTORIES_EXCLUDE "/usr/tce/packages/gcc/gcc-4.9.3/lib64;/usr/tce/packages/gcc/gcc-4.9.3/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3;/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64;/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3" CACHE STRING "") | ||
|
||
|
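The review request to drop the clang CUDA setting could be addressed along the lines of the sketch below. It is only an illustration, not the change the author made: it assumes the CUDA 12.0.0 toolkit path and the nvcc flags visible in the diff are the intended final values (the page does not confirm which lines are additions), and deriving `CUDA_TOOLKIT_ROOT_DIR` from `CUDAToolkit_ROOT` is a suggestion of this sketch rather than something the diff does.

```cmake
# Sketch only: CUDA section of the host-config without any clang CUDA setting.
# Assumption: CUDA 12.0.0 is the toolkit the PR settles on.
set(CUDAToolkit_ROOT "/usr/tce/packages/cuda/cuda-12.0.0" CACHE PATH "")

# Derive the dependent paths from the single root so a version bump touches one line.
set(CMAKE_CUDA_COMPILER "${CUDAToolkit_ROOT}/bin/nvcc" CACHE PATH "")
set(CUDA_TOOLKIT_ROOT_DIR "${CUDAToolkit_ROOT}" CACHE PATH "")

# CMAKE_CXX_COMPILER is set earlier in the same host-config file.
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE PATH "")

set(ENABLE_CUDA ON CACHE BOOL "")
set(CMAKE_CUDA_ARCHITECTURES "70" CACHE STRING "")
set(CMAKE_CUDA_SEPARABLE_COMPILATION ON CACHE BOOL "")

# nvcc flags as they appear in the diff: device lambdas and constexpr calls in device code.
set(CMAKE_CUDA_FLAGS "--expt-extended-lambda --expt-relaxed-constexpr" CACHE STRING "")

# Intentionally no ENABLE_CLANG_CUDA entry, per the review comment.
```

Keeping one toolkit root avoids the two hard-coded version strings in the diff drifting apart when the CUDA version changes.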
Review comment: I don't think this is needed; if it is, then probably only in the codevelop build.