Clarify build strategy for heterogeneous applications (and clean all build options) #318
On second thought, it does not make sense to build the c++ SIMD versions with nvcc anyway, also because every such build uses a different definition of neppV and fptype_sv: the SIMD types are different each time. One probably needs to do separate builds and link them together, as in the multi-SIMD idea #177. It is probably best to rethink the API and cleanly separate data classes from processing classes, and strip ownership of data from the kernel launchers. That is, three sets of classes: data classes (which own the data), data access classes/methods (which interpret the AOSOA patterns if/where required), and computational classes, as sketched below. The distinction between host and device, and between gcc and nvcc, is slightly different in each of these cases.
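For concreteness, a minimal sketch of that three-way split in plain C++ (MomentaBuffer, MomentaAccess, RamboKernelHost and their methods are hypothetical names for illustration, not existing code):

```cpp
#include <cstddef>
#include <vector>

// Data class: owns the momenta buffer, knows nothing about layout semantics.
class MomentaBuffer
{
public:
  static constexpr std::size_t np4 = 4; // E, px, py, pz
  explicit MomentaBuffer( std::size_t nevt ) : m_data( nevt * np4 ) {}
  double* data() { return m_data.data(); }
  const double* data() const { return m_data.data(); }
private:
  std::vector<double> m_data; // a device variant would own a cudaMalloc'ed buffer instead
};

// Data access class: interprets the memory pattern where required.
struct MomentaAccess
{
  // trivial AOS indexing here; the real AOSOA decoding would live in this one place
  static double& p( double* buf, std::size_t ievt, std::size_t i4 )
  {
    return buf[ievt * MomentaBuffer::np4 + i4];
  }
};

// Computational class: a kernel launcher that borrows the data, never owns it.
class RamboKernelHost
{
public:
  explicit RamboKernelHost( MomentaBuffer& momenta ) : m_momenta( &momenta ) {}
  void getMomenta( std::size_t nevt )
  {
    for( std::size_t ievt = 0; ievt < nevt; ++ievt )
      MomentaAccess::p( m_momenta->data(), ievt, 0 ) = 1.0; // placeholder physics
  }
private:
  MomentaBuffer* m_momenta; // non-owning: ownership stays with the data class
};
```

Because the launcher holds only a non-owning pointer, the same computational API can sit on top of host or device buffers without duplicating ownership logic.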
…mcOnGpu for nvcc builds (madgraph5#318)
… to be both global and host (madgraph5#318)
… to be both global and host (madgraph5#318) (The code builds but RamboSamplingKernels is incomplete - and not yet linked to check_sa.cc)
…cc and runTest.cc - same performance (The code builds and runs on host/c++ and device/cuda - but not yet on host/cuda, issue madgraph5#318)
There is one important issue (I realised this while looking at #307): note that things like cxtype currently have two different definitions under gcc and nvcc, and they even live within the same typedef! This is really horrible; it is a recipe for clashes and disasters. We should clean up these namespaces.
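To make the hazard concrete, the pattern is roughly of this shape (a simplified sketch, not the verbatim source):

```cpp
#ifdef __CUDACC__
#include <thrust/complex.h>
typedef thrust::complex<double> cxtype; // nvcc build
#else
#include <complex>
typedef std::complex<double> cxtype; // gcc/clang build
#endif

// Any function compiled in both kinds of translation units, e.g.
//   cxtype cxmake( double r, double i );
// now has two incompatible definitions under the same name: linking a gcc
// object against an nvcc object can silently mix the two layouts unless the
// symbols happen to mangle differently. Separate namespaces per build avoid
// the clash outright.
```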
Another random comment: note that things like FFV functions can only be EITHER global OR host in nvcc builds. There are two ways out: either build the same code twice within nvcc (once for the device and once for the host), or build the host version with gcc and link it against the nvcc-built device objects.
The second option sounds much better - but then we need to link both gcc and nvcc objects together, at least for ME calculations. It is probably what we would do anyway (see also #319: using CXX=nvcc is really cumbersome).
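For reference, a minimal sketch of the nvcc constraint at play (FFV1_0 here is a stand-in with placeholder arithmetic, not the generated vertex code):

```cpp
// Fine: usable from both host code and device kernels in one nvcc build.
__host__ __device__ inline double FFV1_0( double w1, double w2 )
{
  return w1 * w2; // placeholder, not the real vertex computation
}

// Fine: a kernel entry point wrapping the device-side call.
__global__ void sigmaKinKernel( const double* allW, double* allOut )
{
  const int ievt = threadIdx.x;
  allOut[ievt] = FFV1_0( allW[2 * ievt], allW[2 * ievt + 1] );
}

// NOT allowed: __global__ cannot be combined with __host__, so one function
// cannot be both a kernel entry point and a host entry point.
// __global__ __host__ void broken( ... ) {} // nvcc rejects this
```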
I am opening an issue that is a bit of a catch-all container for the area of build options, of c++ vs cuda, and of host vs device.
This started off from the work I want to do to integrate the bridge, with a simple test emulating Fortran random/sampling to connect to the cuda ME.
Take a component like rambo, for instance. It can do its work on the host or on the device, even if in both cases the ME is computed on the device. The point is that I need a build of the c++/host version of rambo that links against the cuda/device version of the ME. So far, for things like rambo we only had EITHER a gcc build of the c++/host version OR an nvcc build of the cuda/device version. Now I would also like a c++/host version that I can use with the cuda ME. (All these issues will become the norm for truly heterogeneous workloads, as in #85.)
The easiest approach would essentially be to build rambo c++/host with nvcc; after all, nvcc is a c++ compiler too. In a way, it would also be nice, for instance, to test SIMD c++ vectorization in an nvcc build. The problem is that simply setting CXX=nvcc runs into various other issues. Some may be fixed using the nvcc options that forward unknown flags to the host compiler/linker, but not all of them (see the sketch below). There are also some -ccbin and -Xcompiler flags to clean up. Also, is CXXFLAGS really needed on all link commands in the Makefile? There is quite some cleanup to do.
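As an illustration of the driver gymnastics, a hypothetical host-only file with the kind of compile lines involved (file name and flags are examples, not the repo's Makefile; forwarding of unknown flags assumes a recent CUDA version that supports it):

```cpp
// rambo_host.cc (hypothetical): plain C++ that we would like to build with
// nvcc instead of gcc. Illustrative compile lines:
//
//   nvcc -ccbin g++ -Xcompiler -O3 -Xcompiler -march=native -c rambo_host.cc
//
// or, where --forward-unknown-to-host-compiler is available:
//
//   nvcc --forward-unknown-to-host-compiler -O3 -march=native -c rambo_host.cc
//
// Every host-only flag must be escorted through the nvcc driver, which is
// what makes a blanket CXX=nvcc so cumbersome in practice.
#include <cstdio>
int main()
{
  std::printf( "host-only translation unit, compiled via the nvcc driver\n" );
  return 0;
}
```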
On the code side, there are (my fault) many different namespaces for cuda and c++. I am converging on the idea of having just two, say mg5amcOnCpu and mg5amcOnGpu: the latter for a CUDACC (i.e. nvcc) build, the former for a gcc/clang build.
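One possible shape for that scheme, using the names proposed here (the selection macro and the contents shown are assumptions, not existing code):

```cpp
#ifdef __CUDACC__
#include <thrust/complex.h>
namespace mg5amcOnGpu
{
  typedef thrust::complex<double> cxtype;
#else
#include <complex>
namespace mg5amcOnCpu
{
  typedef std::complex<double> cxtype;
#endif
  // fptype_sv, kernels, launchers... would all live here too, so gcc-compiled
  // and nvcc-compiled symbols can never collide at link time.
}
```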
Also on the code side, marking things like rambo as both device and host should ensure that a single nvcc build makes them usable both on the CPU and on the GPU.
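A sketch of how that could look for a rambo-like routine (the macro name and signature are illustrative, not the actual rambo API):

```cpp
#ifdef __CUDACC__
#define RAMBO_HOST_DEVICE __host__ __device__
#else
#define RAMBO_HOST_DEVICE
#endif

// One nvcc compilation emits both a host and a device version of this.
RAMBO_HOST_DEVICE inline void ramboFillMomenta( const double* rnarray, double* momenta, int ievt )
{
  momenta[4 * ievt] = rnarray[4 * ievt]; // placeholder for the real mapping
}

#ifdef __CUDACC__
// Device entry point: the same routine called from a kernel...
__global__ void ramboKernel( const double* rnarray, double* momenta )
{
  ramboFillMomenta( rnarray, momenta, blockDim.x * blockIdx.x + threadIdx.x );
}
#endif
// ...while host code can call ramboFillMomenta( rn, p, ievt ) in a plain loop.
```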
So, in principle, one could aim for
This is not urgent, but it is better to think about some of these issues earlier rather than later.