Simplify the CMake ROCm detection #419

haampie · 2021-01-12T21:59:54Z

This is work in progress and untested atm, but opening a pr anyways to get early feedback.

In SIRIUS and SpFFT we had some more success with find_package(...) to locate ROCm libraries, even when using spack to build the ROCm packages.

A spack install of ROCm is generally a useful way to check your cmake, since it does not have AMD's favorite directory /opt/rocm, nor does it have llvm installed in $ROCM_PATH/llvm, etc.

Edit: this is done

jenkins-cscs · 2021-01-12T22:03:03Z

Can one of the admins verify this patch?

codecov · 2021-01-12T22:04:32Z

Codecov Report

Merging #419 (f415150) into develop (ba7f143) will decrease coverage by 0.0%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           develop    #419     +/-   ##
=========================================
- Coverage     63.1%   63.1%   -0.1%     
=========================================
  Files           86      86             
  Lines        25625   25612     -13     
=========================================
- Hits         16190   16174     -16     
- Misses        9435    9438      +3

Flag	Coverage Δ
unittests	`63.1% <ø> (-0.1%)`	⬇️
with-blas	`63.1% <ø> (-0.1%)`	⬇️
with-libxsmm	`62.3% <ø> (-0.9%)`	⬇️
with-mpi	`63.6% <ø> (+<0.1%)`	⬆️
with-openmp	`62.3% <ø> (ø)`
without-mpi	`59.2% <ø> (-0.2%)`	⬇️
without-openmp	`62.7% <ø> (+0.4%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
src/acc/dbcsr_acc_device.F	`44.4% <ø> (ø)`
src/acc/dbcsr_acc_devmem.F	`10.0% <ø> (ø)`
src/acc/dbcsr_acc_event.F	`0.0% <ø> (ø)`
src/acc/dbcsr_acc_hostmem.F	`0.0% <ø> (ø)`
src/acc/dbcsr_acc_stream.F	`30.4% <ø> (ø)`
src/mpi/dbcsr_mpiwrap.F	`39.0% <0.0%> (-0.5%)`	⬇️
src/mm/dbcsr_mm_hostdrv.F	`60.7% <0.0%> (-0.4%)`	⬇️
src/utils/dbcsr_toollib.F	`69.2% <0.0%> (-0.4%)`	⬇️
src/block/dbcsr_block_operations.F	`54.4% <0.0%> (-0.1%)`	⬇️
src/core/dbcsr_lib.F	`81.2% <0.0%> (+0.6%)`	⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ba7f143...f415150.

haampie · 2021-01-13T14:35:33Z

@dev-zero thanks for the early suggestions. This PR solves the issue where the device compiler's OpenMP is used for linking, which doesn't really make sense. Right now it's always using the host compiler for OpenMP (and things simplify, cause we can just use find_package(OpenMP REQUIRED) only). This works as long as device code is not using OpenMP.
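
A minimal sketch of the host-only OpenMP setup described above. The target name `acc_sketch` and the generated source file are hypothetical and exist only so the snippet configures on its own; the real DBCSR targets differ.

```cmake
cmake_minimum_required(VERSION 3.17)
project(openmp_host_sketch LANGUAGES CXX)

# Locate the host compiler's OpenMP implementation; configuration fails if it is missing.
find_package(OpenMP REQUIRED)

# Hypothetical placeholder source, written out so this sketch is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/acc_sketch.cpp"
     "#include <omp.h>\n"
     "int acc_sketch_threads() { return omp_get_max_threads(); }\n")

add_library(acc_sketch STATIC "${CMAKE_CURRENT_BINARY_DIR}/acc_sketch.cpp")

# The imported target carries both the compile flag and the OpenMP runtime library.
target_link_libraries(acc_sketch PRIVATE OpenMP::OpenMP_CXX)
```

Linking through the imported target keeps the OpenMP flags tied to the host compiler, so no device-compiler OpenMP runtime is pulled in by accident.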

haampie · 2021-01-13T14:39:32Z

Do you have opinions about including FindHIP module inside the sources, or should I bring back the HIP_PATH variable again for locating an external FindHIP module?

Currently it's required to find this HIP module through find_package(HIP MODULE) because it will otherwise detect hip-config.cmake, which is the hip config file (which we need too). The "problem" is that this module search mode only considers very restricted search paths (not CMAKE_PREFIX_PATH, only CMAKE_MODULE_PATH). From discussions on the internet, there are mixed opinions about having external module files.
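
To illustrate the two search modes, here is a rough sketch. The `HIP_PATH` cache variable, its default value and the `cmake` subdirectory are assumptions about a typical ROCm layout, not something this PR fixes in place.

```cmake
cmake_minimum_required(VERSION 3.17)
project(hip_lookup_sketch LANGUAGES CXX)

# Assumption: HIP_PATH is the HIP installation prefix; the default is only a guess.
set(HIP_PATH "/opt/rocm/hip" CACHE PATH "HIP installation prefix (assumed variable)")

# Module mode looks in CMAKE_MODULE_PATH and CMake's own module directory,
# not in CMAKE_PREFIX_PATH, so the directory with FindHIP.cmake is added explicitly.
list(APPEND CMAKE_MODULE_PATH "${HIP_PATH}/cmake")
find_package(HIP MODULE)

# Config mode honors CMAKE_PREFIX_PATH and picks up hip-config.cmake.
find_package(hip CONFIG)

message(STATUS "FindHIP module found: ${HIP_FOUND}")
message(STATUS "hip config package found: ${hip_FOUND}")
```

Neither call uses REQUIRED here, so the sketch still configures (with a warning) on a machine without ROCm.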

dev-zero · 2021-01-13T14:56:05Z

Do you have opinions about including FindHIP module inside the sources, or should I bring back the HIP_PATH variable again for locating an external FindHIP module?

Currently it's required to find this HIP module through find_package(HIP MODULE) because it will otherwise detect hip-config.cmake, which is the hip config file (which we need too). The "problem" is that this module search mode only considers very restricted search paths (not CMAKE_PREFIX_PATH, only CMAKE_MODULE_PATH). From discussions on the internet, there are mixed opinions about having external module files.

ugh, is there any chance upstream is gonna fix this mess within reasonable time?
If yes I'd prefer if ROCm users would have to specify CMAKE_MODULE_PATH, that's easy to drop once fixed, doesn't hurt much if it proliferates (despite being not required anymore) and can be adapted to also build an older already released DBCSR against newer HIP.
Including FindHIP in our tree means we have to monitor upstream for changes and has the possibility that already released DBCSR versions break with newer ROCm framework versions. From that point of view I would probably prefer the HIP_PATH from before as it seemed reasonably easy to maintain.
But it's perfectly possible I am completely wrong here.
pre-commit install --install-hooks should save you some headaches ;-)

haampie · 2021-01-13T15:11:55Z

I'll drop the FindHIP module from here.

haampie · 2021-01-13T15:37:11Z

pre-commit install --install-hooks should save you some headaches ;-)

Unfortunately this relies on Python, which gives me headaches too :D It complains that Python 3.6+ needs to be installed, but Python 3.8 is the default on my desktop.

dev-zero · 2021-01-13T15:56:13Z

interesting, what does which pre-commit say? My guess would be a pip install --user pre-commit on a system where pip is for python2 and instead pip3 install --user pre-commit should've been used

haampie · 2021-01-13T15:56:59Z

Oh, I installed it with snap since Ubuntu suggested that to me. Maybe it shipped its own python...

haampie · 2021-01-13T17:20:03Z

Ok, I just tried compiling everything (rocm ecosystem & dbcsr) from source through spack, and I'm hitting

ABORT in dbcsr_lib.F:217 DBCSR compiled w/ threading support while libsmm_acc compiled w/o threading support.

Apparently the openmp issue is not entirely solved.

dev-zero · 2021-01-13T18:53:04Z

Ok, I just tried compiling everything (rocm ecosystem & dbcsr) from source through spack, and I'm hitting
ABORT in dbcsr_lib.F:217 DBCSR compiled w/ threading support while libsmm_acc compiled w/o threading support.
Apparently the openmp issue is not entirely solved.

Since the verbose makefile flag is set you should be able to check the compiler invocations in the log...

haampie · 2021-01-14T09:53:13Z

Hm, I'm not really getting it to work. Also without OpenMP the unit tests fail:

HIPRTC ERROR: CompileProgram failed with error HIPRTC_ERROR_COMPILATION

Seems like it is thrown whenever a jit program is re-compiled: https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-3.9.0/src/hiprtc.cpp#L501

haampie · 2021-01-14T17:29:44Z

Boy, that was no fun. I should have looked in the issues too, cause #261 is related.

Previously when using just the host compiler (gcc) for openmp everywhere, I got this runtime error:

Assertion `d != acc_device_none && d != acc_device_default && d != acc_device_not_host' failed

Turns out GNU's libgomp.so defines OpenACC functions, including acc_init, and I believe that when the dbcsr Fortran function acc_init is called, which in turn calls the C function acc_init, it ends up calling a function in libgomp instead of the C interface of dbcsr's acc lib. Very confusing.

With the current patches I can finally build dbcsr for ROCm using only the host compiler's implementation of OpenMP, which makes sense to me. Previously I would end up with both libomp.so and libgomp.so in the dbcsr lib.

haampie · 2021-01-14T17:35:58Z

Remaining issues:

* HIPRTC ERROR: CompileProgram failed with error HIPRTC_ERROR_COMPILATION
* Some sources that get compiled with the host compiler include hip header files that want __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__ to be defined.
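
One possible way to address the second issue is to set the platform macro on the affected target explicitly. This is only a sketch: the target `host_side` and its generated source are hypothetical stand-ins for a host-compiled file that includes HIP headers.

```cmake
cmake_minimum_required(VERSION 3.17)
project(hip_platform_define_sketch LANGUAGES CXX)

# Hypothetical placeholder source that refuses to compile unless a HIP platform is selected.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/host_side.cpp"
     "#if !defined(__HIP_PLATFORM_HCC__) && !defined(__HIP_PLATFORM_NVCC__)\n"
     "#error \"no HIP platform selected\"\n"
     "#endif\n"
     "int host_side() { return 0; }\n")

add_library(host_side STATIC "${CMAKE_CURRENT_BINARY_DIR}/host_side.cpp")

# Select the AMD platform for every source of this target, even with a plain host compiler.
target_compile_definitions(host_side PRIVATE __HIP_PLATFORM_HCC__)
```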

haampie · 2021-01-14T19:36:12Z

@dev-zero, I realized just now that all device code is 100% jitted, is that correct? In that case we can drop hipcc / the device compiler altogether from cmake? Or are there cases in which kernels are compiled ahead of time too?

haampie · 2021-01-18T16:50:51Z

ping @mtaillefumier

haampie · 2021-01-18T17:01:07Z

So, the TL;DR of this PR is:

* Drop searching for the device compiler during CMake configuration, since we only have to link to ROCm libraries.
* Use the host compiler's OpenMP for everything; it used to use the device compiler's OpenMP implementation.
* Seems like ROCm 3.5.0 and above have moved to using ROCclr, which improved their jit/hiprtc* code, as it now does not call hipcc anymore but only clang directly, which is good. But they don't seem to forward this -D__HIP flag, which caused the jit to fail. Fixed by looking for HIP_ROCclr.
* The C API is now prefixed with c_dbcsr_* because otherwise the Fortran code ended up calling acc_init in libgomp.so instead of the C part of dbcsr 🙃.
* Added -Wno-error=...deprecation warnings, because the relevant code actually already works around that. (Or maybe we should accept that ROCm just has a different interface and a simple macro for NVCC & HIP doesn't work?)
* Updated the docs to reflect what's to be configured for HIP.

haampie · 2021-01-18T20:59:59Z

Tests are passing on Ault btw:

Test project dbcsr-project/dbcsr/spack-build-le4geew
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/21 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed    6.48 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/21 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    1.75 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/21 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed    2.64 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/21 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    1.73 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/21 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed    2.28 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/21 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    1.65 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/21 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    1.64 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/21 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed    1.85 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/21 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    5.31 sec
      Start 10: dbcsr_perf:inputs/test_square_sparse_rma.perf
10/21 Test #10: dbcsr_perf:inputs/test_square_sparse_rma.perf .........   Passed    1.87 sec
      Start 11: dbcsr_unittest1
11/21 Test #11: dbcsr_unittest1 .......................................   Passed   59.63 sec
      Start 12: dbcsr_unittest2
12/21 Test #12: dbcsr_unittest2 .......................................   Passed   22.83 sec
      Start 13: dbcsr_unittest3
13/21 Test #13: dbcsr_unittest3 .......................................   Passed   67.32 sec
      Start 14: dbcsr_unittest4
14/21 Test #14: dbcsr_unittest4 .......................................   Passed    0.56 sec
      Start 15: dbcsr_tensor_unittest
15/21 Test #15: dbcsr_tensor_unittest .................................   Passed   69.53 sec
      Start 16: dbcsr_tas_unittest
16/21 Test #16: dbcsr_tas_unittest ....................................   Passed  112.41 sec
      Start 17: dbcsr_test_csr_conversions
17/21 Test #17: dbcsr_test_csr_conversions ............................   Passed    1.19 sec
      Start 18: libsmm_acc_unittest_multiply
18/21 Test #18: libsmm_acc_unittest_multiply ..........................   Passed   13.97 sec
      Start 19: libsmm_acc_unittest_transpose
19/21 Test #19: libsmm_acc_unittest_transpose .........................   Passed   10.58 sec
      Start 20: libsmm_acc_timer_multiply-autotuned
20/21 Test #20: libsmm_acc_timer_multiply-autotuned ...................   Passed   15.37 sec
      Start 21: libsmm_acc_timer_multiply-predicted
21/21 Test #21: libsmm_acc_timer_multiply-predicted ...................   Passed    0.12 sec

haampie · 2021-01-27T14:23:36Z

@dev-zero, this PR is done, can you review?

dev-zero

looks good, OpenMP/CMake-remark might be solved by #406

CMakeLists.txt

cmake/CompilerConfiguration.cmake

docs/guide/2-user-guide/1-installation/index.md

src/CMakeLists.txt

src/acc/libsmm_acc/CMakeLists.txt

tests/CMakeLists.txt

This is to make sure we can use CUDA toolkit without requiring the language to be enabled for the project

alazzaro · 2021-02-04T06:46:00Z

retest this please

alazzaro · 2021-02-04T07:33:03Z

OK, we are a step forward...
Now it cannot find the executable (I assume because of a wrong name):

      Start 20: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp
Process not started
 /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp
[permission denied]
20/24 Test #20: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp ...***Not Run   0.00 sec
      Start 21: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp
Process not started
 /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp
[permission denied]
21/24 Test #21: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp ......***Not Run   0.00 sec
      Start 22: libsmm_acc_unittest_transpose.cpp
Could not find executable libsmm_acc_unittest_transpose.cpp
Looked in the following places:
libsmm_acc_unittest_transpose.cpp
libsmm_acc_unittest_transpose.cpp
Release/libsmm_acc_unittest_transpose.cpp
Release/libsmm_acc_unittest_transpose.cpp
Debug/libsmm_acc_unittest_transpose.cpp
Debug/libsmm_acc_unittest_transpose.cpp
MinSizeRel/libsmm_acc_unittest_transpose.cpp
MinSizeRel/libsmm_acc_unittest_transpose.cpp
RelWithDebInfo/libsmm_acc_unittest_transpose.cpp
RelWithDebInfo/libsmm_acc_unittest_transpose.cpp
Deployment/libsmm_acc_unittest_transpose.cpp
Deployment/libsmm_acc_unittest_transpose.cpp
Development/libsmm_acc_unittest_transpose.cpp
Development/libsmm_acc_unittest_transpose.cpp
Unable to find executable: libsmm_acc_unittest_transpose.cpp

hfp · 2021-02-04T08:07:22Z

Oof... it might as well be that there is still maintenance going on with the node I'm on 😅 both clinfo and rocm_agent_enumerator stopped listing the GPU, even though there are still entries in /dev/dri. Maybe disregard my 'can reproduce' comment.

Yes, because the benchmark cannot reproduce the problem I mentioned earlier. It must be something on the node, indeed clinfo is the right tool to check. This (clinfo) is btw also noted in the install notes for ACC/OpenCL.

Let's merge it then! There's definitely some improvements in this PR over develop, and all sources now use OpenMP when that's enabled.

The scope of this PR increased quite a bit from "simplify CMake for ROCm" to resolving #261. I will merge the PR when CI passes.

Just to be sure, did you see 70825c3 too? It was necessary for me.

I tested AMD's OpenCL stack only under macOS using a Vega56 card, which is probably quite different from Linux (maybe only the ICD is from AMD whereas the "OpenCL platform" comes from Apple; not sure though). The reason for the double quotes was for two-component typenames like "unsigned int" (I played with different foundational types wrt atomics). Anyhow, OpenCL permits typenames like "uint" or "ulong" (quotes are not necessary).

dev-zero · 2021-02-04T08:20:32Z

@hfp @haampie thank you very much for taking care of this and also debugging and properly fixing the CMake configuration!

hfp · 2021-02-04T08:43:11Z

One more note (just for the record), prefixing functions (c_dbcsr_) in both benchmark drivers (acc_bench_trans.c, and acc_bench_smm.c) dropped some code/function calls (libsmm_acc_init and libsmm_acc_finalize). I will reintroduce this in an upcoming PR (once this PR is merged).

hfp · 2021-02-04T08:49:07Z

Focus now is to fix CI. GNU and Intel run-tests point to the same issue.

haampie · 2021-02-04T10:16:10Z

I presume I can't trigger CI, but let's see:

retest this please

otherwise can someone do it for me?

alazzaro · 2021-02-04T11:06:20Z

retest this please

CMakeLists.txt

hfp · 2021-02-04T10:31:11Z

src/acc/acc_bench_smm.c

-#if !defined(__CUDA)
-    CHECK(libsmm_acc_finalize(), NULL);
-#endif
-    CHECK(acc_finalize(), NULL);


Do not worry about this!

We can adjust as we go. For example, I believe libsmm_acc may also be moved underneath the cuda or hip backend folder. Indeed, the CUDA backend depends on the DBCSR library and the other way around (because of the timer stuff but also because of the confusing init/fini flow). Regarding timers, DBCSR itself solved the problem with CP2K more elegantly by taking a function pointer during init in order to deal with the CP2K facility rather than a built-in statistic/timer.

hfp · 2021-02-04T12:01:48Z

Daint-CI seems to be missing once more...

@haampie did the previous test pass with Daint?
If so, your latest changes should not affect it, and the PR could be merged...

haampie · 2021-02-04T12:07:12Z

I've run the cmake + make by hand with the latest Sprinkle ... commit on Daint and it seems to work, so good to go then.

And yes, tests for e9bcce3 passed https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-682/

haampie · 2021-02-04T12:29:07Z

Thanks @hfp :)

alazzaro · 2021-02-04T13:41:37Z

I had no time to review the PR before the merge...
In any case, I left some comments.

Few other remarks here:

Is the Daint-CI happy? It seems we didn't run it at the end...
I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)
adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to have changed the Fortran name too...

hfp · 2021-02-04T13:59:07Z

Let's not worry, this became brittle since the scope of the PR increased a lot from just CMake for ROCm to some resolution of #261 (touching a lot of source code rather than just CMake stuff). I guess @haampie is happy to help with any other work needed on top of what we got...

alazzaro · 2021-02-04T14:04:11Z

@hfp definitely, another PR by @haampie to address my comments is always welcome 😄

haampie · 2021-02-04T14:13:45Z

Regarding

I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)

I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.
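
For reference, forcing a single `.cu` file through the C++ compiler could look roughly like this. The file `host_only.cu` is hypothetical and generated only to keep the sketch self-contained. Requiring CMake 3.20 is an assumption of this sketch, not of the PR: from that version on, policy CMP0119 makes CMake pass the language to the compiler explicitly.

```cmake
cmake_minimum_required(VERSION 3.20)
project(cu_as_cxx_sketch LANGUAGES CXX)

# Hypothetical .cu file that contains host code only.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/host_only.cu"
     "int host_only() { return 0; }\n")

# Treat the file as C++ so the host compiler builds it and CUDA need not be an enabled language.
set_source_files_properties("${CMAKE_CURRENT_BINARY_DIR}/host_only.cu"
                            PROPERTIES LANGUAGE CXX)

add_library(host_only STATIC "${CMAKE_CURRENT_BINARY_DIR}/host_only.cu")
```

With older CMake versions, a compiler such as GCC may not recognize the `.cu` extension on its own, so an explicit `-x c++` option might be needed in addition.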

Is the Daint-CI happy? It seems we didn't run it at the end...

I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?

adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to have changed the Fortran name too...

Yeah, first I just added dbcsr_* but that conflicted with the Fortran function names, so I used c_dbcsr_*. If you want, we can also change the Fortran names in a separate PR?

alazzaro

Regarding

I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)

I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.

No, that's OK.

Is the Daint-CI happy? It seems we didn't run it at the end...

I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?

Well, it is running in the new PR with your changes, let's see how it goes 😄

adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to have changed the Fortran name too...

Yeah, first I just added dbcsr_* but that conflicted with the Fortran function names, so I used c_dbcsr_*. If you want, we can also change the Fortran names in a separate PR?

Yeap, name duplication Fortran-C is an issue with the old GNU compiler (fixed with the new compiler).
Currently, we ask to have the dbcsr_ prefix in Fortran only for the functions exposed in the API (check here). The reason was to avoid changing tons of internal code... But now we have a situation where we probably want to have a compatible naming between C and Fortran... In short, yes ;) but no rush...

On the other side, could you fix the two minor comments I left in the code?

cmake/CompilerConfiguration.cmake

CMakeLists.txt

…, minor fixes after #419 (#425)

* OpenCL-BE/LIBSMM: verbose output and documentation. Improved auto-tuning scripts. Minor fixes after #419.
* Fixed Makefile used to build acc_bench_trans/acc_bench_smm with CUDA (accommodate changes from #419).
* Fixed issue (#419 (comment)).
* More prefixes (global variables, etc) in follow-up of #419 (c_dbcsr_).
* Introduced (runtime-)verbosity level. Print device name (non-zero verbosity).
* Renamed ACC_OPENCL_VERBOSE to ACC_OPENCL_DEBUG.
* Improved documentation and documented ACC_OPENCL_VERBOSE.
* Introduced verbose output (time needed for kernel compilation, etc).
* ACC benchmark drivers: inform if no device was found.
* Warn about potentially exclusive device-mode.
* tune_multiply.py: option to only rely on primary objective.
* tune_multiply.py: catch CTRL-C and save configuration.
* tune_multiply.sh: relay result code of failing script.
* tune_multiply.sh: continuation with wrapper script.
* Enabled runtime-test OpenCL BE/LIBSMM.
* Unrelated: removed tabs from source file.

haampie force-pushed the simplify-rocm-cmake branch from fa1a62b to c18ebce Compare January 12, 2021 22:03

dev-zero reviewed Jan 13, 2021

src/CMakeLists.txt (outdated, resolved)

dev-zero reviewed Jan 13, 2021

src/acc/libsmm_acc/CMakeLists.txt (outdated, resolved)

dev-zero reviewed Jan 13, 2021

src/CMakeLists.txt (outdated, resolved)

haampie force-pushed the simplify-rocm-cmake branch from c9a1294 to 765d009 Compare January 13, 2021 23:25

haampie force-pushed the simplify-rocm-cmake branch from 765d009 to 1259a0e Compare January 14, 2021 17:19

haampie marked this pull request as ready for review January 18, 2021 20:12

haampie requested a review from dev-zero January 27, 2021 14:23

dev-zero requested changes Jan 27, 2021

hfp mentioned this pull request Jan 28, 2021

OpenCL based ACC-backend and SMM library #406

Merged

haampie mentioned this pull request Jan 28, 2021

ACC init/finalize issues #422

Open

Bump required CMake version to 3.17

ef06ce2

This is to make sure we can use CUDA toolkit without requiring the language to be enabled for the project
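
As a rough illustration of what the version bump enables: CMake 3.17 added the FindCUDAToolkit module, which locates the toolkit libraries without `enable_language(CUDA)`. The target name `acc_cuda_sketch` and its generated source are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.17)
project(cudatoolkit_sketch LANGUAGES CXX)

# Locate the CUDA toolkit while only C++ is an enabled language.
find_package(CUDAToolkit)

if(CUDAToolkit_FOUND)
  # Hypothetical placeholder source so the sketch is self-contained.
  file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/acc_cuda_sketch.cpp"
       "int acc_cuda_sketch() { return 0; }\n")
  add_library(acc_cuda_sketch STATIC "${CMAKE_CURRENT_BINARY_DIR}/acc_cuda_sketch.cpp")

  # Imported targets for the CUDA runtime and the runtime compiler (used for JIT kernels).
  target_link_libraries(acc_cuda_sketch PRIVATE CUDA::cudart CUDA::nvrtc)
else()
  message(STATUS "CUDA toolkit not found; skipping the sketch target")
endif()
```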

haampie added 3 commits February 4, 2021 10:57

Fix comment

5d89b94

Fix test names

65dc8c5

Fix tests

e9bcce3

hfp reviewed Feb 4, 2021

Sprinkle REQUIRED and make OpenCL require libxsmm

f415150

hfp merged commit f3f60cf into cp2k:develop Feb 4, 2021

haampie deleted the simplify-rocm-cmake branch February 4, 2021 12:26

hfp added a commit to hfp/dbcsr that referenced this pull request Feb 4, 2021

Fixed Makefile used to build acc_bench_trans/acc_bench_smm with CUDA …

69a0e85

…(accommodate changes from cp2k#419).

hfp mentioned this pull request Feb 4, 2021

OpenCL verbose output and documentation, improved auto-tuning scripts, minor fixes after #419 #425

Merged

alazzaro reviewed Feb 4, 2021

cmake/CompilerConfiguration.cmake (resolved)

CMakeLists.txt (resolved)

hfp added a commit to hfp/dbcsr that referenced this pull request Feb 5, 2021

More prefixes in follow-up of cp2k#419 (c_dbcsr_).

8ac6f1d

alazzaro mentioned this pull request Feb 5, 2021

Make nvToolsExt conditional on WITH_CUDA_PROFILING #428

Merged
