Many thanks to the developers of this amazing package! I thought I would submit a list of gotchas and issues I ran into trying to get my kernel to work. Most of these are known, and I will try to provide links where appropriate.
I realize this may be more appropriate in the documentation, but unfortunately it is a very fluid situation and I couldn't figure out where I should contribute my usage notes.
The power operator ^ does not work in kernel code. Use CUDAnative.pow() instead.
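A rough sketch of what I mean (the kernel and array names are mine, not from the package), assuming CUDAnative.pow accepts two Float32 arguments:

```julia
using CUDAnative, CuArrays

function square_kernel(out, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        # out[i] = x[i]^2 would not compile in device code; use the intrinsic instead
        out[i] = CUDAnative.pow(x[i], 2.0f0)
    end
    return nothing
end

x = CuArray(rand(Float32, 1024))
out = similar(x)
@cuda threads=256 blocks=4 square_kernel(out, x)
```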
Ordinary array constructors do not work inside kernels, e.g. [1, 2, 3], Array([1, 2, 3]), etc.
In theory you could use StaticArrays.jl (MArray, MVector, etc.). Unfortunately a bug may cause intermittent memory faults; see JuliaGPU/CUDAnative.jl#340 and JuliaGPU/CuArrays.jl#278. I hope to see this resolved soon, though the current #master branch of this package does not compile on my Linux Mint 19 machine.
What does work for small arrays is tuples, e.g. (arr[1,2], arr[2,2], arr[3,2]) instead of arr[:,2]. I'm also seeing MUCH better performance with tuples than with MVectors for this sort of thing.
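A rough sketch of the pattern (hypothetical kernel/array names), reading a small fixed-size column as a tuple of scalar loads rather than a slice:

```julia
function col_norm_kernel(out, arr)
    j = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if j <= size(arr, 2)
        # arr[:, j] would not compile in device code; scalar reads into a tuple do
        col = (arr[1, j], arr[2, j], arr[3, j])
        out[j] = CUDAnative.sqrt(col[1]*col[1] + col[2]*col[2] + col[3]*col[3])
    end
    return nothing
end
```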
reshape() does not work, and you cannot do array slicing in a kernel. However, you can use view(): instead of arr[:,j], write view(arr, :, j).
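For example (again with made-up names), summing a column through a view instead of a slice:

```julia
function col_sum_kernel(out, arr)
    j = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if j <= size(arr, 2)
        col = view(arr, :, j)   # arr[:, j] would fail; a view is just an index wrapper
        s = 0.0f0
        for i in 1:size(col, 1)
            s += col[i]
        end
        out[j] = s
    end
    return nothing
end
```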
Instead of sqrt(), use CUDAnative.sqrt().
There is no real support for complex exponentials, so instead of exp(1im * theta), use CUDAnative.cos(theta) + 1im * CUDAnative.sin(theta).
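Putting the two intrinsics points together, a rough sketch (hypothetical names; out is assumed to be a CuArray{ComplexF32}):

```julia
function phase_kernel(out, theta)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(out)
        t = theta[i]
        # instead of exp(1im * t), build the value from the sin/cos device intrinsics
        out[i] = ComplexF32(CUDAnative.cos(t), CUDAnative.sin(t))
    end
    return nothing
end
```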
Putting @device_code_warntype in front of your @cuda call is quite useful for ironing out type instabilities, which must be quashed in your kernel code.
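For example, using the hypothetical kernel from the sketch above:

```julia
@device_code_warntype @cuda threads=256 blocks=4 phase_kernel(out, theta)
```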
Stack traces for kernel code that does not compile are often not quite right. I think this will be resolved shortly; see JuliaGPU/CUDAnative.jl#306.
Since the affordable Nvidia GPUs have nerfed double-precision support, you generally want Float32 for your floating-point arrays. Remember to actually create Float32 arrays and avoid mixing Float64 with Float32; the same applies to complex types. You can also force 32-bit floating-point constants with the 1.0f0 syntax, which avoids a bunch of spurious type instabilities.
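A rough sketch of what I mean (hypothetical names), keeping everything Float32 so no Float64 sneaks in:

```julia
x = CuArray(rand(Float32, 1024))   # explicitly Float32; plain rand(1024) would be Float64
out = similar(x)

function scale_kernel(out, x)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(x)
        # 0.5 would be a Float64 literal and promote the result; 0.5f0 stays Float32
        out[i] = 0.5f0 * x[i] + 1.0f0
    end
    return nothing
end

@cuda threads=256 blocks=4 scale_kernel(out, x)
```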