Clarify build strategy for heterogeneous applications (and clean all build options) #318

Open
valassi opened this issue Dec 17, 2021 · 4 comments


@valassi
Member

valassi commented Dec 17, 2021

I am opening an issue that is a bit of a catch-all container in the area of build options, c++ vs cuda, and host vs device.

This started off from the work I want to do to integrate the bridge, with a simple test emulating fortran random numbers/sampling connected to the cuda ME.

Take a component like rambo, for instance. It can do its work on the host or on the device, even if in both cases the ME is computed on the device. The point is that I need a build of the c++/host version of rambo that links against the cuda/device version of the ME. So far, for things like rambo we only had EITHER a gcc build of the c++/host version OR an nvcc build of the cuda/device version. Now I would also like a c++/host version that I can use with the cuda ME. (All these issues will become the norm for truly heterogeneous workloads as in #85.)

The easiest approach would essentially be to build the rambo c++/host version with nvcc: after all, nvcc is a c++ compiler too, and it would be nice, for instance, to test SIMD c++ vectorization in an nvcc build. The problem is that simply setting CXX=nvcc runs into various other issues. Some may be fixed using the nvcc options that forward unknown flags to the host compiler/linker, but not all of them. There are also some -ccbin and -Xcompiler options to clean up. Also, is CXXFLAGS really needed on all link commands in the Makefile? There is quite some cleanup to do.

On the code side, there are (my fault) many different namespaces for cuda and c++. I am converging on the idea of having just two, say mg5amcOnCpu and mg5amcOnGpu: the latter for a CUDACC (i.e. nvcc) build, the former for a gcc/clang build.
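As a hedged sketch (the guard macro usage and the namespace body here are illustrative, not the final code), the two-namespace idea could look like this:

```cpp
// Illustrative sketch only: one source tree, two namespaces selected by the compiler.
// nvcc defines __CUDACC__, so an nvcc build populates mg5amcOnGpu while a gcc/clang
// build populates mg5amcOnCpu; symbols from the two builds can then never clash.
#ifdef __CUDACC__
namespace mg5amcOnGpu
#else
namespace mg5amcOnCpu
#endif
{
  // memory helpers, rambo, matrix elements, ... would all live in this namespace
}
```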

Also on the code side, declaring things like rambo as both device and host functions should ensure that a single nvcc build makes them usable both on the CPU and on the GPU.
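A minimal hedged sketch of that idea (the function below is an invented stand-in, not the real rambo code):

```cpp
#include <cmath>

// A rambo-like helper declared for both host and device: within a single nvcc build,
// the same function can be called from ordinary CPU code and from inside CUDA kernels.
__host__ __device__ inline double toyEnergy( double px, double py, double pz, double mass )
{
  return sqrt( px * px + py * py + pz * pz + mass * mass );
}

// Host-side use; the device-side use would be an identical call inside a __global__ kernel.
inline void toyEnergiesOnHost( const double* momenta, double* energies, int nevt )
{
  for( int ievt = 0; ievt < nevt; ievt++ ) // momenta laid out as (E,px,py,pz) per event
    energies[ievt] = toyEnergy( momenta[4 * ievt + 1], momenta[4 * ievt + 2], momenta[4 * ievt + 3], 0. );
}
```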

So, in principle, one could aim for:

  • CPU-only application: mg5amcOnCpu namespace, build everything with your favorite gcc/clang/icpx compiler
  • CPU+GPU application (which for instance requires cudaMallocHost instead of malloc on the host; see the sketch after this list): mg5amcOnGpu namespace, build everything with nvcc, making sure that it delegates the c++ parts correctly to your favorite gcc/clang/icpx compiler (so in principle you should get the same performance, even from the ME vectorization in c++)
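Here is a hedged sketch of the host allocation difference mentioned in the second bullet (the helper names are invented; the real code may organise this differently):

```cpp
#include <cstdlib>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Allocate a host-side buffer of 'size' doubles.
// The CPU+GPU build uses pinned (page-locked) memory to speed up host<->device copies.
inline double* newHostBuffer( std::size_t size )
{
#ifdef __CUDACC__
  double* buffer = nullptr;
  cudaMallocHost( (void**)&buffer, size * sizeof( double ) ); // pinned host memory
  return buffer;
#else
  return (double*)malloc( size * sizeof( double ) ); // plain host memory
#endif
}

inline void deleteHostBuffer( double* buffer )
{
#ifdef __CUDACC__
  cudaFreeHost( buffer );
#else
  free( buffer );
#endif
}
```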

This is not urgent, but for some of these issues it is better to think earlier rather than later.

@valassi
Member Author

valassi commented Dec 18, 2021

On second thought, it does not make sense to build the c++ SIMD versions with nvcc anyway, because each build uses a different definition of neppV and fptype_sv: the SIMD types differ from build to build. One probably needs to do separate builds and link them together, as in the multi-SIMD idea #177.
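A hedged sketch of why a single pass cannot cover all cases: the vector types themselves depend on the compiler and the target ISA (the exact macros and widths in the real code may differ):

```cpp
// Illustrative only: each compiler/ISA combination yields a different vector type,
// so the variants cannot coexist in a single translation unit or a single nvcc pass.
typedef double fptype;

#ifdef __CUDACC__
typedef fptype fptype_sv;                                        // GPU: scalar, one event per thread
constexpr int neppV = 1;
#elif defined __AVX512F__
typedef fptype fptype_sv __attribute__( ( vector_size( 64 ) ) ); // 8 doubles per SIMD vector
constexpr int neppV = 8;
#elif defined __AVX2__
typedef fptype fptype_sv __attribute__( ( vector_size( 32 ) ) ); // 4 doubles per SIMD vector
constexpr int neppV = 4;
#else
typedef fptype fptype_sv;                                        // scalar fallback
constexpr int neppV = 1;
#endif
```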

Probably best to rethink the API, cleanly separate data classes from processing classes, and strip data ownership out of the kernel launchers. That is, three sets of classes: data classes (own the data), data access classes/methods (interpret the AOSOA patterns if/where required), and computational classes. The distinction between host and device, and between gcc and nvcc, is slightly different in each of these cases.
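A hedged sketch of that separation (all class and method names below are invented for illustration, not the actual API):

```cpp
#include <vector>

typedef double fptype;

// 1. Data class: owns a host-side buffer of momenta (other variants could own pinned or device memory).
class HostMomentaBuffer
{
public:
  HostMomentaBuffer( int nevt, int np4, int npar )
    : m_nevt( nevt ), m_np4( np4 ), m_npar( npar ), m_data( nevt * np4 * npar ) {}
  fptype* data() { return m_data.data(); }
  int nevt() const { return m_nevt; }
private:
  int m_nevt, m_np4, m_npar;
  std::vector<fptype> m_data;
};

// 2. Data access: interprets the memory layout (trivially AOS here; the real code would decode AOSOA).
struct MomentaAccess
{
  static fptype& ip4Ipar( fptype* buffer, int npar, int np4, int ievt, int ipar, int ip4 )
  {
    return buffer[( ievt * npar + ipar ) * np4 + ip4];
  }
};

// 3. Computational class: operates on buffers it does not own.
class ToySamplingKernel
{
public:
  void generateMomenta( const fptype* randomNumbers, HostMomentaBuffer& momenta ) const
  {
    // ... fill momenta from randomNumbers (body omitted in this sketch) ...
  }
};
```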

valassi added a commit to valassi/madgraph4gpu that referenced this issue Dec 21, 2021
valassi added a commit to valassi/madgraph4gpu that referenced this issue Dec 21, 2021
valassi added a commit to valassi/madgraph4gpu that referenced this issue Dec 21, 2021
… to be both global and host (madgraph5#318)

(The code builds but RamboSamplingKernels is incomplete - and not yet linked to check_sa.cc)
valassi added a commit to valassi/madgraph4gpu that referenced this issue Dec 21, 2021
…cc and runTest.cc - same performance

(The code builds and runs on host/c++ and device/cuda - but not yet on host/cuda, issue madgraph5#318)
@valassi
Member Author

valassi commented Jan 11, 2022

There is one important issue (I realised this while looking at #307): note that things like cxtype currently have two different definitions in gcc and nvcc builds, and they even live within the same typedef! This is really horrible and a recipe for clashes and disasters. We should clean up these namespaces.
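A hedged illustration of the problem (the concrete complex types below are just an example of how one typedef can resolve differently per compiler):

```cpp
#include <complex>
#ifdef __CUDACC__
#include <thrust/complex.h>
#endif

// The same name means two unrelated types depending on which compiler sees it:
// any interface exposing cxtype across a gcc/nvcc boundary risks ODR and ABI clashes.
#ifdef __CUDACC__
typedef thrust::complex<double> cxtype; // nvcc build
#else
typedef std::complex<double> cxtype;    // gcc/clang build
#endif
```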

@valassi
Member Author

valassi commented Jan 12, 2022

Another random comment: note that things like the FFV functions can only be EITHER global OR host in nvcc builds. There are two ways out:

  • foresee the FFVs as host+device functions, and add global wrappers to call them as kernels: this looks very cumbersome (but would allow building the FFVs as host functions in nvcc builds too)
  • or cleanly decide that FFV on the host is only built with gcc (as we do now), while FFV on the device is built as a global kernel with nvcc (as we also do now)

The second option sounds much better, but then we need to link gcc and nvcc objects together, at least for the ME calculations. That is probably what we do anyway (see also #319: using CXX=nvcc is really cumbersome). A minimal sketch of the first option follows below.
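A hedged sketch of the first option, to make the "cumbersome" part concrete (the toy function stands in for a real FFV routine, whose actual signature is different):

```cpp
// Option 1: mark the FFV-like helper as host+device, so one nvcc build serves both sides...
__host__ __device__ inline void toyFFV( const double* f1, const double* f2, const double* v3, double* vertex )
{
  *vertex = f1[0] * f2[0] * v3[0]; // placeholder arithmetic, not a real amplitude
}

// ...but launching it on the GPU still needs a hand-written __global__ wrapper per function:
// this per-function boilerplate is what makes the first option cumbersome.
__global__ void toyFFVKernel( const double* f1, const double* f2, const double* v3, double* vertices, int nevt )
{
  const int ievt = blockIdx.x * blockDim.x + threadIdx.x;
  if( ievt < nevt ) toyFFV( &f1[ievt * 4], &f2[ievt * 4], &v3[ievt * 4], &vertices[ievt] );
}
```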

@valassi
Member Author

valassi commented Jul 19, 2023

Note: in MR #723, fixing #725, I improved the separation of the cpu and gpu namespaces (so now it is a bit safer to mix the two codes... though it is maybe a better idea not to do that anyway). So #723 does a lot of the work described here...
