-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify the CMake ROCm detection #419
Conversation
Can one of the admins verify this patch? |
fa1a62b
to
c18ebce
Compare
Codecov Report
@@ Coverage Diff @@
## develop #419 +/- ##
=========================================
- Coverage 63.1% 63.1% -0.1%
=========================================
Files 86 86
Lines 25625 25612 -13
=========================================
- Hits 16190 16174 -16
- Misses 9435 9438 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@dev-zero thanks for the early suggestions. This PR solves the issue where the device compiler's OpenMP is used for linking, which doesn't really make sense. Right now it's always using the host compiler for OpenMP (and things simplify, cause we can just use |
Do you have opinions about including FindHIP module inside the sources, or should I bring back the HIP_PATH variable again for locating an external FindHIP module? Currently it's required to find this HIP module through |
ugh, is there any chance upstream is gonna fix this mess within reasonable time? |
I'll drop the FindHIP module from here. |
pre-commit install --install-hooks should save you some headaches ;-) unfortunately this relies on python, which gives me headaches too :D it complains about python 3.6+ to be installed, but python 3.8 is the default on my desktop. |
interesting, what does |
Oh, I installed it with |
Ok, I just tried compiling everything (rocm ecosystem & dbcsr) from source through spack, and I'm hitting
Apparently the openmp issue is not entirely solved. |
Since the verbose makefile flag is set you should be able to check the compiler invocations in the log... |
c9a1294
to
765d009
Compare
Hm, I'm not really getting it to work. Also without OpenMP the unit tests fail:
Seems like it is thrown whenever a jit program is re-compiled: https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-3.9.0/src/hiprtc.cpp#L501 |
765d009
to
1259a0e
Compare
Boy, that was no fun. I should have looked in the issues too, cause #261 is related. Previously when using just the host compiler (gcc) for openmp everywhere, I got this runtime error:
Turns GNU's With the current patches I can finally build dbcsr for ROCm using only the host compiler's implementation of OpenMP, which makes sense to me. Previously I would end up with both libomp.so and libgomp.so in the dbcsr lib. |
Remaining issues:
|
@dev-zero, I realized just now that all device code is 100% jitted, is that correct? In that case we can drop hipcc / the device compiler altogether from cmake? Or are there cases in which kernels are compiled ahead of time too? |
ping @mtaillefumier |
So, the TL;DR if this PR is:
|
Tests are passing on Ault btw:
|
@dev-zero, this PR is done, can you review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, OpenMP/CMake-remark might be solved by #406
This is to make sure we can use CUDA toolkit without requiring the language to be enabled for the project
retest this please |
OK, we are a step forward...
|
Yes, because the benchmark cannot reproduce the problem I mentioned earlier. It must be something on the node, indeed
The scope of this PR increased quite a bit from "simplify CMake for ROCm" to resolving #261. I will merge the PR when CI passes.
I tested AMD's OpenCL stack only under macOS using a Vega56 card, which is probably quite different from Linux (maybe only the ICD is from AMD whereas the "OpenCL platform" comes from Apple; not sure though). The reason for the double quotes was for two-component typenames like "unsigned int" (I played with different foundational types wrt atomics). Anyhow, OpenCL permits typenames like "uint" or "ulong" (quotes are not necessary). |
One more note (just for the record), prefixing functions ( |
I pressume I can't trigger ci, but let's see: retest this please otherwise can someone do it for me? |
retest this please |
#if !defined(__CUDA) | ||
CHECK(libsmm_acc_finalize(), NULL); | ||
#endif | ||
CHECK(acc_finalize(), NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not worry about this!
We can adjust as we go. For example, I believe libsmm_acc may also be moved underneath of cuda or hip backend folder. Indeed, the CUDA backend depends on DBCSR library and the other way around (because of the timer stuff but also because of confusing init/fini flow). Rearding timers, DBCSR itself solved the problem with CP2K more elegant by taking a function pointer during init in order to deal with CP2K facility rather than a built-in statistic/timer.
Daint-CI seems to be missing once more... @haampie did the previous test pass with Daint? |
I've run the And yes, tests for e9bcce3 passed https://object.cscs.ch/v1/AUTH_40b5d92b316940098ceb15cf46fb815e/dbcsr-artifacts/logs/build-682/ |
Thanks @hfp :) |
…ing scripts. Minor fixes after cp2k#419. * Introduced (runtime-)verbosity level. Print device name (non-zero verbosity). * Fixed issue (cp2k#419 (comment)). * Renamed ACC_OPENCL_VERBOSE to ACC_OPENCL_DEBUG. * ACC benchmark drivers: inform if no device was found. * Improved documentation and documented ACC_OPENCL_VERBOSE. * Introduced verbose output (time needed for kernel compilation, etc). * tune_multiply.py: option to only rely on primary objective. * tune_multiply.py: catch CTRL-C and save configuration. * tune_multiply.sh: relay result code of failing script. * tune_multiply.sh: continuation with wrapper script.
…(accommodate changes from cp2k#419).
I had no time to review the PR before the merge... Few other remarks here:
|
Regarding
I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.
I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?
Yeah, first I just added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding
I see you replaced .cu with .cpp. Unfortunately, probably this will break the compilation in CP2K with the Makefile, forcing us to move to the cmake (which is good, we have to do that anyway)
I found this a bit funny, as it results in cmake switching to the device compiler for that particular source file. I've now disabled the device compiler entirely (that is, CUDA is not an enabled language, and I'm not using hip_add_library for ROCm) since all device code is compiled at runtime anyways. Wasn't aware this caused upstream issues... I could undo it and then set the language of that particular file to CXX so that it uses the right compiler.
No, that's OK.
Is the Daint-CI happy? It seems we didn't run it at the end...
I did run it by hand, only for GNU, and it was fine. But maybe good to run it on develop again?
Well, it is running in the new PR with your changes, let's see how it goes 😄
adding the suffix c_dbcsr_acc_ is a good choice, but I would have preferred to have change the Fortran name too...
Yeah, first I just added
dbcsr_*
but that conflicted with the fortran function names, so I usedc_dbcsr_*
. If you want we can do also change Fortran names in a separate PR?
Yeap, name duplication Fortran-C is an issue with the old GNU compiler (fixed with the new compiler).
Currently, we ask to have the dbcsr_
prefix in Fortran only for the functions exposed in the API (check here). The reason was to avoid changing tons of internal code... But now we have a situation where we probably want to have a compatible naming between C and Fortran... In short, yes ;) but no rush...
On the other side, could you fix the two minor comments I left in the code?
…, minor fixes after #419 (#425) * OpenCL-BE/LIBSMM: verbose output and documentation. Improved auto-tuning scripts. Minor fixes after #419. * Fixed Makefile used to build acc_bench_trans/acc_bench_smm with CUDA (accommodate changes from #419). * Fixed issue (#419 (comment)). * More prefixes (global variables, etc) in follow-up of #419 (c_dbcsr_). * Introduced (runtime-)verbosity level. Print device name (non-zero verbosity). * Renamed ACC_OPENCL_VERBOSE to ACC_OPENCL_DEBUG. * Improved documentation and documented ACC_OPENCL_VERBOSE. * Introduced verbose output (time needed for kernel compilation, etc). * ACC benchmark drivers: inform if no device was found. * Warn about potentially exclusive device-mode. * tune_multiply.py: option to only rely on primary objective. * tune_multiply.py: catch CTRL-C and save configuration. * tune_multiply.sh: relay result code of failing script. * tune_multiply.sh: continuation with wrapper script. * Enabled runtime-test OpenCL BE/LIBSMM. * Unrelated: removed tabs from source file.
This is work in progress and untested atm, but opening a pr anyways to get early feedback.
In SIRIUS and SpFFT we had some more success with
find_packge(...)
to locate ROCm libraries, even when using spack to build the ROCm packages.A spack install of ROCm is generally a useful way to check your cmake, since it does not have AMD's favorite directory
/opt/rocm
, nor does it have llvm installed in$ROCM_PATH/llvm
, etc.Edit: this is done