Implemented normalization to power raising method for DataGrid. #143
Conversation
- I think it is great we have this functionality. Probably it can be optimized in the future, if it is worth it.
- Maybe it would be nice to have some performance comparison to other kernels of similar size, e.g. the time of the reduction vs. the time of an FFT.
- There is a test, so I guess it works :)
// In the end, the value at index 0 is the total sum for this work group. Save this to
// global memory.
if (iL == 0) {
    array_out[iGr] = shMem[0];
The output is then used as an input for the next level of GPU reduction? I mean, is normalizeSumReduce() called iteratively to reduce to 1/256, 1/256^2, 1/256^3, ... etc.?
barrier(CLK_LOCAL_MEM_FENCE);
for (int s = reduceGroupSize / 2; s > 0; s /= 2) {
    if (iL < s) {
        shMem[iL] += shMem[iL + s];
Wouldn't it be better to first save the sum to a __private variable, and only at the end save the result to __local (shared) memory so that other processors can see it? __private memory is still much faster than __local.
This memory definitely needs to be visible to all workers in the group. As far as I understand, the __private variables are only visible to the current worker.
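As an aside, a tiny hypothetical kernel (not from this PR) that illustrates the visibility rule:

__kernel void visibilityDemo(__global float *out)
{
    float mine = (float)get_local_id(0);  // __private by default: visible to this worker only
    __local float shared_val;             // __local: visible to the whole work group

    if (get_local_id(0) == 0) {
        shared_val = mine;                // worker 0 publishes its value
    }
    barrier(CLK_LOCAL_MEM_FENCE);         // the others must wait before reading it

    // Every worker in the group can read shared_val, but no worker can
    // read another worker's copy of `mine`.
    out[get_global_id(0)] = shared_val;
}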
Yes, if you reduce just 2 items per worker and then synchronize, then you need it to be shared.
What I was thinking is that maybe a single processor can read multiple items (>2) and save them into a private temporary variable, then write this variable into shared memory; this also means less frequent synchronization.
(I'm writing from mobile, I'll try to sketch some code when I get to a computer.)
Unless I misunderstand, this is exactly what is happening in the first loop, where the current worker first sequentially reads several items from global memory, sums them together, and then puts the result into shared memory for the parallel part. Finding some good balance between the two would take more testing.
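For concreteness, here is a minimal OpenCL sketch of the combined pattern being discussed: a sequential __private accumulation followed by the shared-memory tree reduction quoted above. The names shMem, iL, and iGr mirror the excerpts in this thread; the strided first loop and the fixed group size of 256 are assumptions for illustration, not necessarily what the PR does.

__kernel void normalizeSumReduce(__global const float *array_in,
                                 __global float *array_out,
                                 const int n)
{
    __local float shMem[256];   // assumed work-group size of 256
    const int iL = get_local_id(0);
    const int iGr = get_group_id(0);
    const int stride = (int)get_global_size(0);

    // Sequential part: each worker strides over the global array and
    // accumulates several items into a __private variable first.
    float acc = 0.0f;
    for (int i = (int)get_global_id(0); i < n; i += stride) {
        acc += array_in[i];
    }

    // One write per worker into __local (shared) memory.
    shMem[iL] = acc;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Parallel part: tree reduction in shared memory, as in the excerpt.
    for (int s = (int)get_local_size(0) / 2; s > 0; s /= 2) {
        if (iL < s) {
            shMem[iL] += shMem[iL + s];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // In the end, worker 0 holds the per-group partial sum; save it to
    // global memory.
    if (iL == 0) {
        array_out[iGr] = shMem[0];
    }
}

The more items each worker folds into its private accumulator before touching __local memory, the fewer barrier synchronizations are needed, which is exactly the trade-off being weighed here.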
# First do sums of the input array within each work group...
cl_program.normalizeSumReduce(queue, global_size, local_size, array_in, array_out, n)
# ... then sum the results of the first kernel call
cl_program.sumSingleGroup(queue, local_size, local_size, array_out, n_groups)
Aha OK, I see, there are 2 passes...
Yes, I purposefully set the number of groups in the first pass to be at most the size of the work group, so that the second pass can be done by a single work group.
Just on a quick test using that 350^3 array in the test script, the time for the sum reduction on my Intel integrated GPU is ~4 ms, and the time taken to compute the exponent is ~20 ms. I think the FFTs are typically in the range of some tens of ms to some hundreds of ms, depending on the scan size. So overall, I would say that the sum reduction here is pretty trivial, and probably not worth optimizing for now.
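To make the two-pass structure concrete, here is a hedged sketch of what the second-pass kernel could look like, given that the first pass leaves one partial sum per group and n_groups does not exceed the work-group size. The signature follows the host call quoted above; the body is an assumption for illustration, not the actual code in this PR.

__kernel void sumSingleGroup(__global float *array_out, const int n_groups)
{
    __local float shMem[256];   // assumed work-group size, >= n_groups
    const int iL = get_local_id(0);

    // Each worker loads one per-group partial sum; extra workers load 0.
    shMem[iL] = (iL < n_groups) ? array_out[iL] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Same shared-memory tree reduction as in the first pass.
    for (int s = (int)get_local_size(0) / 2; s > 0; s /= 2) {
        if (iL < s) {
            shMem[iL] += shMem[iL + s];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // The grand total for the whole array ends up in array_out[0].
    if (iL == 0) {
        array_out[0] = shMem[0];
    }
}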
Fixes #136