-
Notifications
You must be signed in to change notification settings - Fork 12
Codeobject kernels
In cpp_standalone
, codeobjects define a single function that runs that
codeobject. In that function, there exists a {{pointer_lines}}
template
variables in the templates in order to add keywords to pointer definitions
(e.g. __restrict__
in order to get some compiler optimizations). This looks
like this:
In the template file,
{{pointer_lines}}
is replaced by multiple lines of the form
double __restrict__ _ptr_some_variable = some_variable;
double ptr_other_not_restrict_var = other_not_restrict_var;
For cuda_standalone
, the codeobjects are a bit more complicated. There is a
host function wihtout parameters (just as in the cpp_standalone
files). But
this host function calls a kernel function that runs the code on the GPU.
For cuda_standalone
, we don't use the {{pointer_lines}}
template variable
(it is just ignored I believe). Instead we define the following variables in
the templates, which are replaced with code lines in brian2cuda/device.py.
-
%CONSTANTS%
This is also defined in
cpp_standalone
codeobjects and is used in the same way: Constant variables needed in the host code of thecuda_standalone
kernel are defined here. In thestatemonitor.cu
kernel, this is for example used to extract the raw pointer variable from a thrust device vector object:///// CONSTANTS ///// double* const _array_statemonitor_t = thrust::raw_pointer_cast(&dev_dynamic_array_statemonitor_t[0]);
-
%HOST_PARAMETERS%
These are function arguments passed to the kernel called in the host code. For the statemonitor example, this would be replaced by:
kernel_statemonitor_codeobject<<<..., ...>>>( ..., // parameters that are hard coded in the template ///// HOST_PARAMETERS ///// _array_statemonitor_t, );
-
%DEVICE_PARAMETERS%
These are the paramters in the device kernel definition. For the statemonitor, it looks like this:
__global__ void kernel_statemonitor_codeobject( ..., // parameters that are hard coded in the tempalte ///// DEVICE_PARAMETERS ///// double* par__array_statemonitor_t, ) { // kernel implementation }
-
%KERNEL_PARAMETERS%
These are the
cuda_standalone
equivalent tocpp_standalone
s{{pointer_lines}}
. Here, we can add keywords such as__restrict
(the cuda version of__restrict__
). Currently, we are not doing that yet (see #53). Instead we are just renaming the pointer such that they are compatible with the pointer names used in{{scalar_code}}
and{{vector_code}}
, which is the same as incpp_standalone
. For the statemonitor, it looks like this:///// KERNEL_PARAMETERS ///// double* _ptr_array_statemonitor_t = par__array_statemonitor_t;
If t
is used in the {{scalar_code}}
or {{vector_code}}
in the
template, it will be called _ptr_array_statemonitor_t
. Currently we do
copy a bunch of variable like this even when they are not used. This is
actually the case f _ptr_array_statemonitor_t
. I once tested performance
when removing the unnecessary KERNEL_PARAMETERS
and didn't see any
improvement. So they might just get optimized out by the compiler. But there
might be some effect on register usage in the kernel that could be
investigated, see #69.