Replies: 1 comment
-
Hi,
$> ./a.out
[GPU, direct] a[0] == 12.000000
[GPU, direct] a[nb_cell-1] == 12.000000
[GPU, direct] b[0] == 437.000000
[GPU, direct] b[nb_cell-1] == 437.000000
[GPU, backward] da[0] == 72.000000
[GPU, backward] da[nb_cell-1] == 72.000000
[GPU, backward] db[0] == 0.000000
[GPU, backward] db[nb_cell-1] == 0.000000
I just adapted the first version with shadow parameter for |
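A quick way to sanity-check these values is a hand-written reference on plain arrays. The sketch below assumes the kernel body is b[i] = 3*a[i]^2 + 5, which is only a guess that happens to reproduce a == 12, b == 437 and da == 72; the real functor may differ. The function names are hypothetical. The reverse pass clears db after using it, which matches the db == 0 lines above.

```cpp
#include <cstddef>
#include <vector>

// Assumed kernel body (a guess, not confirmed by the thread): b[i] = 3*a[i]^2 + 5.
void forward_reference(const std::vector<double>& a, std::vector<double>& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        b[i] = 3.0 * a[i] * a[i] + 5.0;
    }
}

// Hand-written reverse pass for the same body.
// db holds the output seed on entry; it is consumed and reset to zero,
// and the result d(b)/d(a) * seed = 6*a[i]*seed is accumulated into da.
void backward_reference(const std::vector<double>& a,
                        std::vector<double>& da,
                        std::vector<double>& db) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double seed = db[i];
        db[i] = 0.0;
        da[i] += 6.0 * a[i] * seed;
    }
}
```

With a filled with 12, da with 0 and db with 1, this reference gives b == 437, da == 72 and db == 0 in every cell.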
-
Hi,
Sorry to reopen this subject (cf. #1565), but I am now trying to port the CPU version above to the GPU for backward mode.
CPU implementation
Here is the updated CPU example for forward and backward mode, using a functor instead of a lambda function: https://fwd.gymni.ch/zEqXXH
The approach used in the CPU example is really generic: I can now define any functor and it will be differentiated by calling:
for the forward mode or
for the backward mode.
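For readers who cannot open the link, here is a minimal sketch of what such a loop+functor setup might look like on CPU. It is not the linked code. The names loop_on_dof, compute, compute_forward, compute_backward and run_primal are hypothetical (modelled on loop_on_dof_gpu and compute_gpu below), the functor body is a guess, and USE_ENZYME is a hypothetical macro that would only be defined when building with the Enzyme plugin. The Enzyme calls mirror the argument layout of the __enzyme_fwddiff call quoted further down.

```cpp
#include <vector>

// Assumed functor (not taken from the linked example): b[i] = 3*a[i]^2 + 5.
struct SimpleFunctor {
    double* a;  // input field
    double* b;  // output field
    void operator()(int i) const { b[i] = 3.0 * a[i] * a[i] + 5.0; }
};

// Generic loop: applies any functor to every cell index.
template <typename F>
void loop_on_dof(int nb_cell, const F& f) {
    for (int i = 0; i < nb_cell; ++i) f(i);
}

// Wrapper handed to Enzyme. The functor is passed by pointer so that a
// shadow functor (holding da and db) can be passed next to it.
template <typename F>
void compute(int nb_cell, F* f) {
    loop_on_dof(nb_cell, *f);
}

#ifdef USE_ENZYME  // hypothetical macro: define only when compiling with the Enzyme plugin
extern int enzyme_dup;
template <typename RT, typename... Args> RT __enzyme_fwddiff(void*, Args...);
template <typename RT, typename... Args> RT __enzyme_autodiff(void*, Args...);

// Forward mode: df.a carries the input tangent, df.b receives the output tangent.
inline void compute_forward(int nb_cell, SimpleFunctor& f, SimpleFunctor& df) {
    __enzyme_fwddiff<void>((void*)compute<SimpleFunctor>, nb_cell,
                           enzyme_dup, (void*)&f, (void*)&df);
}

// Backward mode: df.b carries the output seed, df.a accumulates the gradient.
inline void compute_backward(int nb_cell, SimpleFunctor& f, SimpleFunctor& df) {
    __enzyme_autodiff<void>((void*)compute<SimpleFunctor>, nb_cell,
                            enzyme_dup, (void*)&f, (void*)&df);
}
#endif

// Primal-only driver, usable without Enzyme: returns b in the last cell.
double run_primal(double a0, int nb_cell = 4) {
    std::vector<double> a(nb_cell, a0), b(nb_cell, 0.0);
    SimpleFunctor f{a.data(), b.data()};
    compute(nb_cell, &f);
    return b[nb_cell - 1];
}
```

The point of the pattern is that only the functor changes from one problem to the next; the loop, the wrapper and the two Enzyme entry points stay the same.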
GPU implementation
I do not know yet whether this defined loop+functor approach can be used on GPU.
According to the documentation on GPU, a naive transcription of the CPU version into CUDA will not work, because of the call to
loop_on_dof_gpu<<<grd_topo, blk_topo>>>(nb_cell, f);
(see https://fwd.gymni.ch/5CWooa).
To do it right, I probably need to use Enzyme’s custom derivative registration to define a custom forward and reverse pass for the compute_gpu wrapper function, as follows: https://fwd.gymni.ch/murZk4
NB: I am having trouble setting up the appropriate environment (llvm-14+cuda-11.2+clang-14+enzyme-0.81), so I have not been able to test this last code.
Am I on the right track to compute the gradient with Enzyme on GPU?
If I am completely wrong, can you tell me whether this defined loop+functor approach has any chance of working on GPU?
If the approach is sound:
void* __enzyme_register_gradient_compute_gpu_...[3] = {...}
which would be perfect !!!
__enzyme_fwddiff<void>((void*)compute_gpu<SimpleFunctor>, nb_cell, enzyme_dup, (void*)&f_gpu, (void*)&df_gpu);
as done on CPU?
Thanks a lot for your valuable help.
Best regards