
Simplify the CMake ROCm detection #419

Merged: 44 commits merged into cp2k:develop on Feb 4, 2021


haampie
Contributor

@haampie haampie commented Jan 12, 2021

This is a work in progress and untested at the moment, but I'm opening a PR anyway to get early feedback.

In SIRIUS and SpFFT we had more success using find_package(...) to locate ROCm libraries, even when using Spack to build the ROCm packages.

A Spack install of ROCm is generally a useful way to check your CMake configuration, since it does not use AMD's favorite directory /opt/rocm, nor does it install LLVM in $ROCM_PATH/llvm, etc.

Edit: this is done

@jenkins-cscs
Collaborator

Can one of the admins verify this patch?

@codecov

codecov bot commented Jan 12, 2021

Codecov Report

Merging #419 (f415150) into develop (ba7f143) will decrease coverage by 0.0%.
The diff coverage is n/a.


@@            Coverage Diff            @@
##           develop    #419     +/-   ##
=========================================
- Coverage     63.1%   63.1%   -0.1%     
=========================================
  Files           86      86             
  Lines        25625   25612     -13     
=========================================
- Hits         16190   16174     -16     
- Misses        9435    9438      +3     
Flag Coverage Δ
unittests 63.1% <ø> (-0.1%) ⬇️
with-blas 63.1% <ø> (-0.1%) ⬇️
with-libxsmm 62.3% <ø> (-0.9%) ⬇️
with-mpi 63.6% <ø> (+<0.1%) ⬆️
with-openmp 62.3% <ø> (ø)
without-mpi 59.2% <ø> (-0.2%) ⬇️
without-openmp 62.7% <ø> (+0.4%) ⬆️


Impacted Files Coverage Δ
src/acc/dbcsr_acc_device.F 44.4% <ø> (ø)
src/acc/dbcsr_acc_devmem.F 10.0% <ø> (ø)
src/acc/dbcsr_acc_event.F 0.0% <ø> (ø)
src/acc/dbcsr_acc_hostmem.F 0.0% <ø> (ø)
src/acc/dbcsr_acc_stream.F 30.4% <ø> (ø)
src/mpi/dbcsr_mpiwrap.F 39.0% <0.0%> (-0.5%) ⬇️
src/mm/dbcsr_mm_hostdrv.F 60.7% <0.0%> (-0.4%) ⬇️
src/utils/dbcsr_toollib.F 69.2% <0.0%> (-0.4%) ⬇️
src/block/dbcsr_block_operations.F 54.4% <0.0%> (-0.1%) ⬇️
src/core/dbcsr_lib.F 81.2% <0.0%> (+0.6%) ⬆️
... and 1 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ba7f143...f415150.

Review thread on src/CMakeLists.txt (outdated, resolved)
@haampie
Contributor Author

haampie commented Jan 13, 2021

@dev-zero thanks for the early suggestions. This PR solves the issue where the device compiler's OpenMP is used for linking, which doesn't really make sense. Right now it always uses the host compiler's OpenMP (which simplifies things, because we can just use find_package(OpenMP REQUIRED)). This works as long as device code does not use OpenMP.

@haampie
Contributor Author

haampie commented Jan 13, 2021

Do you have opinions about including the FindHIP module in the sources, or should I bring back the HIP_PATH variable for locating an external FindHIP module?

Currently the HIP module has to be found through find_package(HIP MODULE), because otherwise CMake detects hip-config.cmake, the HIP config file (which we need too). The "problem" is that module search mode only considers very restricted search paths (not CMAKE_PREFIX_PATH, only CMAKE_MODULE_PATH). From discussions on the internet, there are mixed opinions about shipping external module files.

@dev-zero
Contributor


Ugh, is there any chance upstream will fix this mess within a reasonable time?
If yes, I'd prefer that ROCm users have to specify CMAKE_MODULE_PATH: that's easy to drop once fixed, doesn't hurt much if it proliferates (despite no longer being required), and can be adapted to also build an older, already released DBCSR against a newer HIP.
Including FindHIP in our tree means we have to monitor upstream for changes, and already released DBCSR versions could break with newer ROCm versions. From that point of view I would probably prefer the HIP_PATH approach from before, as it seemed reasonably easy to maintain.
But it's perfectly possible I am completely wrong here.
pre-commit install --install-hooks should save you some headaches ;-)

@haampie
Contributor Author

haampie commented Jan 13, 2021

I'll drop the FindHIP module from here.

@haampie
Contributor Author

haampie commented Jan 13, 2021

pre-commit install --install-hooks should save you some headaches ;-)

Unfortunately this relies on Python, which gives me headaches too :D It complains that Python 3.6+ must be installed, but Python 3.8 is the default on my desktop.

@dev-zero
Contributor

Interesting, what does `which pre-commit` say? My guess would be a `pip install --user pre-commit` on a system where pip is for Python 2, when `pip3 install --user pre-commit` should have been used instead.

@haampie
Contributor Author

haampie commented Jan 13, 2021

Oh, I installed it with snap since Ubuntu suggested that to me. Maybe it shipped its own Python...

Review thread on src/CMakeLists.txt (outdated, resolved)
@haampie
Contributor Author

haampie commented Jan 13, 2021

OK, I just tried compiling everything (the ROCm ecosystem and DBCSR) from source with Spack, and I'm hitting:

ABORT in dbcsr_lib.F:217 DBCSR compiled w/ threading support while libsmm_acc compiled w/o threading support.

Apparently the OpenMP issue is not entirely solved.
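The ABORT above is a consistency check between the threading setting of the Fortran side and that of libsmm_acc. A minimal C sketch of such a check, assuming it keys off the standard `_OPENMP` macro that compilers define when OpenMP is enabled; the `example_*` names are hypothetical and not DBCSR's actual API:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical query (not DBCSR's API): reports whether this translation
 * unit was compiled with OpenMP enabled. Compilers define _OPENMP when
 * OpenMP is switched on (e.g. -fopenmp). */
int example_lib_has_openmp(void) {
#if defined(_OPENMP)
  return 1;
#else
  return 0;
#endif
}

/* Compares the caller's OpenMP setting (0 or 1) with the library's.
 * Returns 0 if they agree, -1 (after printing a message) otherwise. */
int example_check_threading(int caller_has_openmp) {
  const int lib_has_openmp = example_lib_has_openmp();
  assert(caller_has_openmp == 0 || caller_has_openmp == 1);
  if (caller_has_openmp != lib_has_openmp) {
    fprintf(stderr,
            "caller compiled %s threading support while library compiled %s threading support\n",
            caller_has_openmp ? "w/" : "w/o", lib_has_openmp ? "w/" : "w/o");
    return -1;
  }
  return 0;
}
```

Because `_OPENMP` is evaluated per translation unit, a library built without the OpenMP flag reports 0 even if the rest of the project uses OpenMP, which is exactly the kind of mismatch that mixing host and device compilers can produce.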

@dev-zero
Contributor


Since the verbose Makefile flag is set, you should be able to check the compiler invocations in the log...

@haampie
Contributor Author

haampie commented Jan 14, 2021

Hm, I'm not really getting it to work. The unit tests also fail without OpenMP:

HIPRTC ERROR: CompileProgram failed with error HIPRTC_ERROR_COMPILATION

It seems to be thrown whenever a JIT program is recompiled: https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-3.9.0/src/hiprtc.cpp#L501

@haampie
Contributor Author

haampie commented Jan 14, 2021

Boy, that was no fun. I should have looked in the issues too, because #261 is related.

Previously, when using just the host compiler (GCC) for OpenMP everywhere, I got this runtime error:

Assertion `d != acc_device_none && d != acc_device_default && d != acc_device_not_host' failed

It turns out that GNU's libgomp.so defines OpenACC functions, including acc_init. I believe that when DBCSR's Fortran function acc_init is called, which in turn calls the C function acc_init, the call ends up in libgomp instead of the C interface of DBCSR's ACC library. Very confusing.

With the current patches I can finally build DBCSR for ROCm using only the host compiler's implementation of OpenMP, which makes sense to me. Previously I would end up with both libomp.so and libgomp.so linked into the DBCSR library.
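The fix for this collision is to give DBCSR's C entry points a project-specific prefix. A minimal sketch, assuming an `int`-returning, argument-free signature; the `c_dbcsr_` prefix follows this PR, but the bodies are hypothetical stand-ins, not DBCSR's implementation:

```c
#include <assert.h>

/* GNU libgomp exports a global acc_init as part of its OpenACC runtime,
 * so a library that also exports acc_init can have its calls resolved to
 * the wrong definition. A project-specific prefix keeps the names disjoint.
 * The state and bodies below are hypothetical stand-ins. */
static int example_initialized = 0;

int c_dbcsr_acc_init(void) {
  if (example_initialized) return 0; /* repeated init is a no-op */
  example_initialized = 1;
  /* real code would select and set up the device here */
  return 0;
}

int c_dbcsr_acc_finalize(void) {
  assert(example_initialized && "finalize called before init");
  example_initialized = 0;
  return 0;
}
```

On the Fortran side, an interface declared with `bind(C, name="c_dbcsr_acc_init")` then resolves to this symbol regardless of libgomp also being linked.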

@haampie
Contributor Author

haampie commented Jan 14, 2021

Remaining issues:

  • HIPRTC ERROR: CompileProgram failed with error HIPRTC_ERROR_COMPILATION
  • Some sources that get compiled with the host compiler include HIP header files that require __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__ to be defined.
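A plain host compiler does not define these platform macros the way hipcc does, so the build system has to pass one explicitly (in CMake, for example, via target_compile_definitions). The C sketch below only reports which of the two macros listed above a translation unit sees; the `example_*` names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Reports which HIP platform macro, if any, is visible here. HIP headers
 * of this era pick their backend from one of these two macros. */
const char *example_hip_platform(void) {
#if defined(__HIP_PLATFORM_HCC__) && defined(__HIP_PLATFORM_NVCC__)
  return "conflict"; /* both defined: the headers cannot pick a backend */
#elif defined(__HIP_PLATFORM_HCC__)
  return "hcc";
#elif defined(__HIP_PLATFORM_NVCC__)
  return "nvcc";
#else
  return "none"; /* including a HIP header here would fail */
#endif
}

/* Returns 1 if exactly one platform macro is defined, 0 otherwise. */
int example_hip_platform_ok(void) {
  const char *p = example_hip_platform();
  assert(p != NULL);
  return strcmp(p, "hcc") == 0 || strcmp(p, "nvcc") == 0;
}
```

Compiled with, e.g., `-D__HIP_PLATFORM_HCC__`, the first function returns "hcc"; compiled with neither flag it returns "none".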

@haampie
Contributor Author

haampie commented Jan 14, 2021

@dev-zero, I just realized that all device code is 100% JIT-compiled; is that correct? In that case, can we drop hipcc (the device compiler) from CMake altogether? Or are there cases in which kernels are also compiled ahead of time?

@haampie
Contributor Author

haampie commented Jan 18, 2021

ping @mtaillefumier

@haampie
Contributor Author

haampie commented Jan 18, 2021

So, the TL;DR of this PR is:

  • Drop searching for the device compiler during CMake configuration, since we only have to link to ROCm libraries.
  • Use the host compiler's OpenMP for everything; previously the device compiler's OpenMP implementation was used.
  • ROCm 3.5.0 and above seem to have moved to ROCclr, which improved the JIT/hiprtc* code: it no longer calls hipcc but invokes clang directly, which is good. However, they don't seem to forward this -D__HIP flag, which caused the JIT compilation to fail. Fixed by looking for HIP_ROCclr.
  • The C API is now prefixed with c_dbcsr_*, because otherwise the Fortran code ended up calling acc_init in libgomp.so instead of the C part of DBCSR 🙃.
  • Added -Wno-error=...deprecation warnings, because the relevant code already works around them. (Or maybe we should accept that ROCm just has a different interface and a simple macro for NVCC & HIP doesn't work?)
  • Updated the docs to reflect what needs to be configured for HIP.
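The ROCclr item amounts to appending a define to the option list that is later handed to hiprtcCompileProgram(prog, numOptions, options). A minimal C sketch, assuming the missing flag is the HIP platform define `-D__HIP_PLATFORM_HCC__` (the flag name above is truncated); the function name and the `use_rocclr` switch are hypothetical, standing in for the CMake-side HIP_ROCclr detection:

```c
#include <assert.h>
#include <stddef.h>

/* Fills opts with the JIT compile options and returns how many were
 * written. Assumption: with a ROCclr-based runtime, hiprtc does not add
 * the HIP platform define on its own, so it is appended explicitly. */
size_t example_build_jit_options(const char **opts, size_t capacity, int use_rocclr) {
  size_t n = 0;
  assert(opts != NULL);
  if (use_rocclr) {
    assert(n < capacity);
    opts[n++] = "-D__HIP_PLATFORM_HCC__";
  }
  return n; /* (int)n and opts would then go to hiprtcCompileProgram */
}
```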

@haampie haampie marked this pull request as ready for review January 18, 2021 20:12
@haampie
Contributor Author

haampie commented Jan 18, 2021

Tests are passing on Ault btw:

Test project dbcsr-project/dbcsr/spack-build-le4geew
      Start  1: dbcsr_perf:inputs/test_H2O.perf
 1/21 Test  #1: dbcsr_perf:inputs/test_H2O.perf .......................   Passed    6.48 sec
      Start  2: dbcsr_perf:inputs/test_rect1_dense.perf
 2/21 Test  #2: dbcsr_perf:inputs/test_rect1_dense.perf ...............   Passed    1.75 sec
      Start  3: dbcsr_perf:inputs/test_rect1_sparse.perf
 3/21 Test  #3: dbcsr_perf:inputs/test_rect1_sparse.perf ..............   Passed    2.64 sec
      Start  4: dbcsr_perf:inputs/test_rect2_dense.perf
 4/21 Test  #4: dbcsr_perf:inputs/test_rect2_dense.perf ...............   Passed    1.73 sec
      Start  5: dbcsr_perf:inputs/test_rect2_sparse.perf
 5/21 Test  #5: dbcsr_perf:inputs/test_rect2_sparse.perf ..............   Passed    2.28 sec
      Start  6: dbcsr_perf:inputs/test_singleblock.perf
 6/21 Test  #6: dbcsr_perf:inputs/test_singleblock.perf ...............   Passed    1.65 sec
      Start  7: dbcsr_perf:inputs/test_square_dense.perf
 7/21 Test  #7: dbcsr_perf:inputs/test_square_dense.perf ..............   Passed    1.64 sec
      Start  8: dbcsr_perf:inputs/test_square_sparse.perf
 8/21 Test  #8: dbcsr_perf:inputs/test_square_sparse.perf .............   Passed    1.85 sec
      Start  9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf
 9/21 Test  #9: dbcsr_perf:inputs/test_square_sparse_bigblocks.perf ...   Passed    5.31 sec
      Start 10: dbcsr_perf:inputs/test_square_sparse_rma.perf
10/21 Test #10: dbcsr_perf:inputs/test_square_sparse_rma.perf .........   Passed    1.87 sec
      Start 11: dbcsr_unittest1
11/21 Test #11: dbcsr_unittest1 .......................................   Passed   59.63 sec
      Start 12: dbcsr_unittest2
12/21 Test #12: dbcsr_unittest2 .......................................   Passed   22.83 sec
      Start 13: dbcsr_unittest3
13/21 Test #13: dbcsr_unittest3 .......................................   Passed   67.32 sec
      Start 14: dbcsr_unittest4
14/21 Test #14: dbcsr_unittest4 .......................................   Passed    0.56 sec
      Start 15: dbcsr_tensor_unittest
15/21 Test #15: dbcsr_tensor_unittest .................................   Passed   69.53 sec
      Start 16: dbcsr_tas_unittest
16/21 Test #16: dbcsr_tas_unittest ....................................   Passed  112.41 sec
      Start 17: dbcsr_test_csr_conversions
17/21 Test #17: dbcsr_test_csr_conversions ............................   Passed    1.19 sec
      Start 18: libsmm_acc_unittest_multiply
18/21 Test #18: libsmm_acc_unittest_multiply ..........................   Passed   13.97 sec
      Start 19: libsmm_acc_unittest_transpose
19/21 Test #19: libsmm_acc_unittest_transpose .........................   Passed   10.58 sec
      Start 20: libsmm_acc_timer_multiply-autotuned
20/21 Test #20: libsmm_acc_timer_multiply-autotuned ...................   Passed   15.37 sec
      Start 21: libsmm_acc_timer_multiply-predicted
21/21 Test #21: libsmm_acc_timer_multiply-predicted ...................   Passed    0.12 sec

@haampie haampie requested a review from dev-zero January 27, 2021 14:23
@haampie
Contributor Author

haampie commented Jan 27, 2021

@dev-zero, this PR is done, can you review?

Contributor

@dev-zero dev-zero left a comment


Looks good; the OpenMP/CMake remark might be solved by #406.

Review thread on CMakeLists.txt (outdated, resolved)
Review thread on cmake/CompilerConfiguration.cmake (outdated, resolved)
Review thread on docs/guide/2-user-guide/1-installation/index.md (outdated, resolved)
Review thread on src/CMakeLists.txt (outdated, resolved)
Review thread on src/acc/libsmm_acc/CMakeLists.txt (outdated, resolved)
Review thread on tests/CMakeLists.txt (outdated, resolved)
This is to make sure we can use the CUDA toolkit without requiring the CUDA language to be enabled for the project.
@alazzaro
Member

alazzaro commented Feb 4, 2021

retest this please

@alazzaro
Member

alazzaro commented Feb 4, 2021

OK, we're one step further...
Now it cannot find the executables (I assume because of a wrong name):

      Start 20: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp
Process not started
 /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp
[permission denied]
20/24 Test #20: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_unittest_multiply.cpp ...***Not Run   0.00 sec
      Start 21: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp
Process not started
 /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp
[permission denied]
21/24 Test #21: /scratch/snx3000/jenkg90/jenkins-g90-DBCSR-681.intel/tests/libsmm_acc_timer_multiply.cpp ......***Not Run   0.00 sec
      Start 22: libsmm_acc_unittest_transpose.cpp
Could not find executable libsmm_acc_unittest_transpose.cpp
Looked in the following places:
libsmm_acc_unittest_transpose.cpp
libsmm_acc_unittest_transpose.cpp
Release/libsmm_acc_unittest_transpose.cpp
Release/libsmm_acc_unittest_transpose.cpp
Debug/libsmm_acc_unittest_transpose.cpp
Debug/libsmm_acc_unittest_transpose.cpp
MinSizeRel/libsmm_acc_unittest_transpose.cpp
MinSizeRel/libsmm_acc_unittest_transpose.cpp
RelWithDebInfo/libsmm_acc_unittest_transpose.cpp
RelWithDebInfo/libsmm_acc_unittest_transpose.cpp
Deployment/libsmm_acc_unittest_transpose.cpp
Deployment/libsmm_acc_unittest_transpose.cpp
Development/libsmm_acc_unittest_transpose.cpp
Development/libsmm_acc_unittest_transpose.cpp
Unable to find executable: libsmm_acc_unittest_transpose.cpp

@hfp
Member

hfp commented Feb 4, 2021

Oof... it might as well be that there is still maintenance going on with the node I'm on 😅 both clinfo and rocm_agent_enumerator stopped listing the GPU, even though there are still entries in /dev/dri. Maybe disregard my 'can reproduce' comment.

Yes, because the benchmark cannot reproduce the problem I mentioned earlier. It must be something on the node, indeed clinfo is the right tool to check. This (clinfo) is btw also noted in the install notes for ACC/OpenCL.

Let's merge it then! There's definitely some improvements in this PR over develop, and all sources now use OpenMP when that's enabled.

The scope of this PR increased quite a bit from "simplify CMake for ROCm" to resolving #261. I will merge the PR when CI passes.

Just to be sure, did you see 70825c3 too? It was necessary for me.

I tested AMD's OpenCL stack only under macOS using a Vega56 card, which is probably quite different from Linux (maybe only the ICD is from AMD whereas the "OpenCL platform" comes from Apple; not sure though). The reason for the double quotes was two-component typenames like "unsigned int" (I played with different fundamental types with respect to atomics). Anyhow, OpenCL permits typenames like "uint" or "ulong" (quotes are not necessary).
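To illustrate why quotes matter only for multi-word typenames, here is a minimal C sketch that formats a `-D` define for an OpenCL build-options string (as passed to clBuildProgram). It assumes the typename is injected through such a define; the macro name is hypothetical, and whether the option parser honours double quotes is implementation-dependent:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "-D<macro>=<type>" into buf, quoting the type only if it
 * contains a space (e.g. "unsigned int"); single-token OpenCL names such
 * as "uint" or "ulong" are left unquoted.
 * Returns 0 on success, -1 on error or truncation. */
int example_type_define(char *buf, size_t size, const char *macro, const char *type) {
  int n;
  assert(buf != NULL && macro != NULL && type != NULL);
  if (strchr(type, ' ') != NULL) {
    n = snprintf(buf, size, "-D%s=\"%s\"", macro, type);
  } else {
    n = snprintf(buf, size, "-D%s=%s", macro, type);
  }
  return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```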

@dev-zero
Contributor

dev-zero commented Feb 4, 2021

@hfp @haampie thank you very much for taking care of this and also debugging and properly fixing the CMake configuration!

@hfp
Member

hfp commented Feb 4, 2021

One more note, just for the record: prefixing the functions (c_dbcsr_) in both benchmark drivers (acc_bench_trans.c and acc_bench_smm.c) dropped some code/function calls (libsmm_acc_init and libsmm_acc_finalize). I will reintroduce them in an upcoming PR (once this PR is merged).
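The lifecycle those dropped calls belong to can be sketched in C as follows. This is a minimal illustration with hypothetical stub functions (`stub_*`), not the actual driver code; the init order (backend first, then libsmm_acc) is an assumption, while the teardown order follows the removed driver lines discussed in the review of this PR (libsmm_acc_finalize before acc_finalize):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the backend and libsmm_acc entry points.
 * Each returns 0 on success and records its call in a small log. */
static char example_log[8];
static int example_log_len = 0;
static int example_record(char c) {
  assert(example_log_len < (int)sizeof(example_log) - 1);
  example_log[example_log_len++] = c;
  example_log[example_log_len] = '\0';
  return 0;
}
static int stub_acc_init(void) { return example_record('A'); }
static int stub_libsmm_acc_init(void) { return example_record('L'); }
static int stub_libsmm_acc_finalize(void) { return example_record('l'); }
static int stub_acc_finalize(void) { return example_record('a'); }

/* Minimal error check: report the failing call and return its code. */
#define EXAMPLE_CHECK(EXPR)                          \
  do {                                               \
    const int r_ = (EXPR);                           \
    if (r_ != 0) {                                   \
      fprintf(stderr, "%s failed\n", #EXPR);         \
      return r_;                                     \
    }                                                \
  } while (0)

/* Driver lifecycle: init backend, then libsmm_acc; tear down in reverse. */
int example_driver_run(void) {
  EXAMPLE_CHECK(stub_acc_init());
  EXAMPLE_CHECK(stub_libsmm_acc_init());
  /* ... benchmark kernels would run here ... */
  EXAMPLE_CHECK(stub_libsmm_acc_finalize());
  EXAMPLE_CHECK(stub_acc_finalize());
  return 0;
}
```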

@hfp
Member

hfp commented Feb 4, 2021

The focus now is fixing CI. The GNU and Intel test runs point to the same issue.

@haampie
Contributor Author

haampie commented Feb 4, 2021

I presume I can't trigger CI, but let's see:

retest this please

Otherwise, can someone do it for me?

@alazzaro
Member

alazzaro commented Feb 4, 2021

retest this please

Review thread on CMakeLists.txt (outdated, resolved)
Comment on lines -121 to -124
#if !defined(__CUDA)
CHECK(libsmm_acc_finalize(), NULL);
#endif
CHECK(acc_finalize(), NULL);
Member


Do not worry about this!

We can adjust as we go. For example, I believe libsmm_acc could also be moved underneath the CUDA or HIP backend folder. Indeed, the CUDA backend depends on the DBCSR library and vice versa (because of the timer machinery, but also because of the confusing init/fini flow). Regarding timers, DBCSR itself solved this problem with CP2K more elegantly, by taking a function pointer during init so it can use CP2K's timing facility rather than a built-in statistics/timer module.
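The function-pointer approach to timers mentioned in this comment can be sketched in C: the library stores optional timer hooks at init time and calls them around its work, so the host application decides where timings go. All names here are hypothetical; this is an illustration of the pattern, not DBCSR's or CP2K's actual interface:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*example_timeset_fn)(const char *region, int *handle);
typedef void (*example_timestop_fn)(int handle);

static example_timeset_fn example_timeset = NULL;
static example_timestop_fn example_timestop = NULL;

/* Library init: store the (optional) timer hooks supplied by the host. */
void example_lib_init(example_timeset_fn tset, example_timestop_fn tstop) {
  assert((tset == NULL) == (tstop == NULL)); /* both or neither */
  example_timeset = tset;
  example_timestop = tstop;
}

/* A library routine that reports timing only through the hooks. */
int example_lib_multiply(void) {
  int handle = -1;
  if (example_timeset != NULL) example_timeset("example_lib_multiply", &handle);
  /* ... actual work would go here ... */
  if (example_timestop != NULL) example_timestop(handle);
  return 0;
}

/* Host-side hooks standing in for the application's timer facility. */
static int example_calls = 0;
static int example_open = 0;
static void host_timeset(const char *region, int *handle) {
  (void)region;
  *handle = example_open++;
  ++example_calls;
}
static void host_timestop(int handle) {
  (void)handle;
  --example_open;
}
```

Passing NULL for both hooks leaves the library silent, so it needs no built-in statistics module of its own.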

@hfp
Member

hfp commented Feb 4, 2021

Daint-CI seems to be missing once more...

@haampie, did the previous test pass on Daint?
If so, your latest changes should not affect it, and the PR could be merged...

@haampie
Contributor Author

haampie commented Feb 4, 2021

I've run the cmake + make by hand with the latest Sprinkle ... commit on Daint and it seems to work, so good to go then.

And yes, tests for e9bcce3 passed https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-682/

@hfp hfp merged commit f3f60cf into cp2k:develop Feb 4, 2021
@haampie haampie deleted the simplify-rocm-cmake branch February 4, 2021 12:26
@haampie
Contributor Author

haampie commented Feb 4, 2021

Thanks @hfp :)

hfp added a commit to hfp/dbcsr that referenced this pull request Feb 4, 2021
…ing scripts. Minor fixes after cp2k#419.

* Introduced (runtime-)verbosity level. Print device name (non-zero verbosity).
* Fixed issue (cp2k#419 (comment)).
* Renamed ACC_OPENCL_VERBOSE to ACC_OPENCL_DEBUG.
* ACC benchmark drivers: inform if no device was found.
* Improved documentation and documented ACC_OPENCL_VERBOSE.
* Introduced verbose output (time needed for kernel compilation, etc).
* tune_multiply.py: option to only rely on primary objective.
* tune_multiply.py: catch CTRL-C and save configuration.
* tune_multiply.sh: relay result code of failing script.
* tune_multiply.sh: continuation with wrapper script.
hfp added a commit to hfp/dbcsr that referenced this pull request Feb 4, 2021
@alazzaro
Member

alazzaro commented Feb 4, 2021

I had no time to review the PR before the merge...
In any case, I left some comments.

A few other remarks:

  1. Is the Daint-CI happy? It seems we didn't run it at the end...
  2. I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)
  3. adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to change the Fortran names too...

@hfp
Member

hfp commented Feb 4, 2021

Let's not worry, this became brittle since the scope of the PR increased a lot from just CMake for ROCm to some resolution of #261 (touching a lot of source code rather than just CMake stuff). I guess @haampie is happy to help with any other work needed on top of what we got...

@alazzaro
Member

alazzaro commented Feb 4, 2021

@hfp definitely, another PR by @haampie to address my comments is always welcome 😄

@haampie
Contributor Author

haampie commented Feb 4, 2021

Regarding

I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)

I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.

Is the Daint-CI happy? It seems we didn't run it at the end...

I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?

adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to change the Fortran names too...

Yeah, at first I just added dbcsr_*, but that conflicted with the Fortran function names, so I used c_dbcsr_*. If you want, we can also change the Fortran names in a separate PR?

Member

@alazzaro alazzaro left a comment


Regarding

I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)

I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.

No, that's OK.

Is the Daint-CI happy? It seems we didn't run it at the end...

I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?

Well, it is running in the new PR with your changes, let's see how it goes 😄

adding the prefix c_dbcsr_acc_ is a good choice, but I would have preferred to change the Fortran names too...

Yeah, at first I just added dbcsr_*, but that conflicted with the Fortran function names, so I used c_dbcsr_*. If you want, we can also change the Fortran names in a separate PR?

Yep, Fortran-C name duplication is an issue with the old GNU compiler (fixed in newer compilers).
Currently, we require the dbcsr_ prefix in Fortran only for the functions exposed in the API (check here). The reason was to avoid changing tons of internal code... But now we are in a situation where we probably want consistent naming between C and Fortran... In short, yes ;) but no rush...

On the other side, could you fix the two minor comments I left in the code?

Review thread on cmake/CompilerConfiguration.cmake (resolved)
Review thread on CMakeLists.txt (resolved)
hfp added a commit to hfp/dbcsr that referenced this pull request Feb 5, 2021
hfp added a commit that referenced this pull request Feb 8, 2021
…, minor fixes after #419 (#425)

* OpenCL-BE/LIBSMM: verbose output and documentation. Improved auto-tuning scripts. Minor fixes after #419.

* Fixed Makefile used to build acc_bench_trans/acc_bench_smm with CUDA (accommodate changes from #419).
* Fixed issue (#419 (comment)).
* More prefixes (global variables, etc) in follow-up of #419 (c_dbcsr_).
* Introduced (runtime-)verbosity level. Print device name (non-zero verbosity).
* Renamed ACC_OPENCL_VERBOSE to ACC_OPENCL_DEBUG.
* Improved documentation and documented ACC_OPENCL_VERBOSE.
* Introduced verbose output (time needed for kernel compilation, etc).
* ACC benchmark drivers: inform if no device was found.
* Warn about potentially exclusive device-mode.
* tune_multiply.py: option to only rely on primary objective.
* tune_multiply.py: catch CTRL-C and save configuration.
* tune_multiply.sh: relay result code of failing script.
* tune_multiply.sh: continuation with wrapper script.
* Enabled runtime-test OpenCL BE/LIBSMM.
* Unrelated: removed tabs from source file.

5 participants