Implement HIP/ROCm build rules and test packages #40627
Comments
assign core
assign heterogeneous
New categories assigned: heterogeneous,core @fwyzard,@Dr15Jones,@smuzaffar,@makortel,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @fwyzard Andrea Bocci. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
@smuzaffar could you try to set up these rules?
@smuzaffar thank you for cms-sw/cmsdist#8268 and cms-sw/cmsdist#8271, I've experimented a bit with them and I have some follow-up:
If you could make these changes, I can experiment with them in a couple of days.
This is fixed in ROCm 5.2.x and later. |
@fwyzard, I have a few questions:
yes: for the alpaka/rocm backend,
yes (see above)
I think we should never have both
@fwyzard, as we do not distribute
I know the ROCm distribution is big - that's why we are using symlinks in the first place - but I don't really know how we can trim it. We definitely do not use all the components and libraries that it provides, but I'm not sure what we should include and what we can drop.
Correct, but I am not sure if grid sites do an auto mount or an explicit mount of cvmfs repos. Sites with an explicit mount have to explicitly include patatrack.cern.ch in the mount list. Maybe add a rocm-runtime package which can just copy all the runtime libs and ship it along with cmssw?
Mhm, I don't think this aspect is terribly urgent, because so far I'm not aware of any site with AMD GPUs.
Also, what about a system ROCm installation? I assume a system with an AMD GPU will have some local libs too. Do we also need to do some dynamic env setting to choose between the system libs and the libs from the cms externals?
I have absolutely no idea about how to deal with the system driver and runtime... this too goes into the "let's get more experience" part. Of course, if anybody with experience with ROCm and AMD GPUs wants to help, that would be great :-)
+heterogeneous |
@smuzaffar Do we have anything to add to this issue from
Hmm, I thought the build rules were not complete for the alpaka/rocm backend. @fwyzard did you try enabling the rocm backend, and did it work? If yes, then we need to update the build rules to enable rocm by default.
Mhm, my bad, I thought we had already enabled them by default... instead, I have the change in my local area:

```diff
diff --git a/Projects/CMSSW/Self.xml b/Projects/CMSSW/Self.xml
index 24bc3a9..154617f 100644
--- a/Projects/CMSSW/Self.xml
+++ b/Projects/CMSSW/Self.xml
@@ -25,7 +25,7 @@
     <flags CHECK_PRIVATE_HEADERS="1"/>
     <flags SCRAM_TARGETS="haswell sandybridge nehalem"/>
     <flags OVERRIDABLE_FLAGS="CPPDEFINES CXXFLAGS FFLAGS CFLAGS CPPFLAGS LDFLAGS CUDA_FLAGS CUDA_LDFLAGS LTO_FLAGS ROCM_FLAGS ROCM_LDFLAGS"/>
-    <flags ALPAKA_BACKENDS="cuda serial"/>
+    <flags ALPAKA_BACKENDS="cuda rocm serial"/>
     <flags CODE_CHECK_ALPAKA_BACKEND="serial"/>
     <flags ENABLE_LTO="@ENABLE_LTO@"/>
   </client>
```

but I never made a PR with it.
@fwyzard, I just tested locally and am surprised that the alpaka/rocm backend works (at least it builds). It generates a lot of warnings though.
Yes, that's something I'm following up inside Alpaka.
@smuzaffar some of the warnings can be fixed inside Alpaka (done in alpaka-group/alpaka#1914), while some are caused by the HIP headers. That is, changing … to … ?
I think I found it: `<flags SYSTEM_INCLUDE="1"/>`
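For illustration, assuming `SYSTEM_INCLUDE="1"` makes scram pass the tool's include paths to the compiler with `-isystem` instead of `-I`, so that the compiler treats them as system headers and suppresses the warnings coming from them (`ROCM_BASE` here just stands for the ROCm installation path):

```sh
# with -I, warnings from the ROCm headers are reported:
hipcc -c -I${ROCM_BASE}/include file.hip.cc

# with -isystem, they are treated as system headers and their warnings
# are suppressed:
hipcc -c -isystem ${ROCM_BASE}/include file.hip.cc
```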
yes |
I think that with the set of PRs that entered CMSSW_13_0_0 and 13_1_0_pre1, everything is in place and working.
+heterogeneous |
+core |
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
Here is a first attempt at defining the build rules to build HIP files for the ROCm backend.
I am confident that these are not final, but having something to play with will likely help me make progress towards more robust rules.
#### affected files

These rules apply to:

- `*.hip.cc` files under the `src`, `plugins`, `test`, and `bin` directories;
- `alpaka/*.dev.cc` files under the same directories, when building an Alpaka-based package for the ROCm backend.

#### compilation

A `file.hip.cc` should be compiled by `hipcc` using the variables defined in the `rocm` scram tool:
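A rough sketch of what that compile step could look like, in GNU make style; the rule shape and the object-file naming are assumptions, not the actual scram rules:

```make
# Rough sketch, not the actual scram rule: compile each file.hip.cc with
# hipcc, passing the target's preprocessor and C++ flags, after filtering
# the flags listed in ROCM_HOST_REM_CXXFLAGS out of CXXFLAGS.
%.hip.cc.o: %.hip.cc
	hipcc -c $(ROCM_FLAGS) $(CPPFLAGS) \
	  $(filter-out $(ROCM_HOST_REM_CXXFLAGS),$(CXXFLAGS)) \
	  -o $@ $<
```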
where:

- `${ROCM_FLAGS}` are the ROCm-specific flags for the target; mostly they are defined by the `rocm` tool, but each target could add or remove flags as usual;
- `${CPPFLAGS}` are the preprocessor flags (`-D...`, `-U...`, `-I...`, etc.) for the target;
- `${CXXFLAGS}` are the C++ compiler flags for the target, with the flags listed in the `${ROCM_HOST_REM_CXXFLAGS}` variable filtered out.

Note: as `hipcc` supports the same options as the corresponding `clang` compiler, some of the `gcc`-specific options should be removed; `${ROCM_HOST_REM_CXXFLAGS}` should match the `${REM_CXXFLAGS}` used by `llvm-cxxcompiler`. It's possible that the two have diverged since we defined the tool, and more (or fewer) options may need to be removed, for example those related to LTO builds.

Note 2: currently `rocm` has `-fno-gpu-rdc`, but we should change it to `-fgpu-rdc`.

#### linking a device library
When building a library (not a plugin or binary), all the `*.hip.cc.o` files should be bundled into a static library, for use by other libraries and plugins that need to link to ROCm device code. For example, one library could define a `__device__` function, and a plugin could define a `__global__` kernel that calls that function, as in the sketch below.
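A minimal illustration of that use case; the file and function names are invented for the example, and it assumes relocatable device code (`-fgpu-rdc`, see Note 2 above):

```cpp
#include <hip/hip_runtime.h>

// In the library, e.g. in a (hypothetical) helpers.hip.cc:
__device__ float scale(float x) { return 2.f * x; }

// In the plugin, e.g. in a (hypothetical) kernel.hip.cc; the device-side
// call to scale() is resolved when the device code is linked:
__device__ float scale(float x);  // declaration of the library's function

__global__ void scaleAll(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] = scale(data[i]);
}
```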
This is similar to the `libTARGET_nv.a` static library we build for CUDA packages, but (for the moment) it will also contain the compiled host code. If it works, I'll look into keeping only the device code at a later time.
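The bundling itself could be as simple as archiving the HIP objects; a sketch in GNU make style, with the `HIP_OBJECTS` variable name assumed:

```make
# Rough sketch, not the actual scram rule: bundle the package's compiled
# HIP objects into a static device library, for dependent packages to link.
libTARGET_rocm.a: $(HIP_OBJECTS)
	ar rcs $@ $^
```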
If a library, plugin or binary with any `*.hip.cc` files "uses" a library with `*.hip.cc` files, that library's `libTARGET_rocm.a` should be linked in (see the next part). I think this is what we are doing for the CUDA `libTARGET_nv.a` library as well?

#### linking a host library or binary
When building a library, plugin or binary that contains `*.hip.cc` files, we should do the linking with `hipcc` instead of `g++`; it should link in any dependency's `_rocm.a` library, and it should use the target's flags for building a shared library or binary. A sketch of the combined step is given below.
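Putting the pieces together, a rough sketch of the host link step, again in GNU make style; the `OBJECTS`, `HIP_OBJECTS` and `ROCM_DEPS` variable names and the rule shape are assumptions:

```make
# Rough sketch, not the actual scram rule: link a shared library with hipcc,
# pulling in the static device libraries of the dependencies.
libTARGET.so: $(OBJECTS) $(HIP_OBJECTS)
	hipcc --hip-link -shared $(LDFLAGS) $(ROCM_LDFLAGS) $^ \
	  $(foreach dep,$(ROCM_DEPS),lib$(dep)_rocm.a) \
	  -o $@
```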
Note: with respect to the CUDA rules, using `hipcc --hip-link` avoids the need for the extra linking step. If we settle on this approach, we may be able to simplify the CUDA rules in a similar way.