It turns out that passing free variables that are not of function type as parameters, rather than fetching them from the closure, may be beneficial for tight loops. One example appears in the benchmark mlkit-bench/benchmarks/mandelbrot.sml. Here the two variables c_re and c_im are free in the function loop3. Together, two optimisations almost double the performance of the code:
passing the variables c_re and c_im as parameters (they will then be unboxed and passed in floating-point registers);
passing the constant 4.0 as a parameter to the function.
It is straightforward to verify the effect of the two optimisations; here is how the function loop3 can be hand-optimised by the programmer:
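As a rough illustration of the shape of the transformation, the sketch below contrasts a closure-based loop with a parameter-passing one. It is not the code from mandelbrot.sml: the wrapper names escapes_closure and escapes_params, the parameter limit, the iteration bound of 50, and the loop body are all assumptions made for the example, and count is kept at type int in both versions to isolate the two optimisations.

```sml
(* Illustrative sketch only; names, loop body and iteration bound are
   assumptions, not copied from mandelbrot.sml. *)

(* Before: c_re and c_im are free in loop3 and are fetched from its closure;
   the constant 4.0 appears inside the loop body. *)
fun escapes_closure (c_re : real, c_im : real) : bool =
  let
    fun loop3 (count : int, z_re : real, z_im : real) : bool =
      if count >= 50 then false
      else
        let val re2 = z_re * z_re
            val im2 = z_im * z_im
        in
          if re2 + im2 > 4.0 then true
          else loop3 (count + 1, re2 - im2 + c_re, 2.0 * z_re * z_im + c_im)
        end
  in
    loop3 (0, 0.0, 0.0)
  end

(* After: c_re, c_im and the constant 4.0 (as limit) are explicit parameters,
   so loop3 has no free variables of type real. *)
fun escapes_params (c_re : real, c_im : real) : bool =
  let
    fun loop3 (count : int, z_re : real, z_im : real,
               c_re : real, c_im : real, limit : real) : bool =
      if count >= 50 then false
      else
        let val re2 = z_re * z_re
            val im2 = z_im * z_im
        in
          if re2 + im2 > limit then true
          else loop3 (count + 1, re2 - im2 + c_re,
                      2.0 * z_re * z_im + c_im, c_re, c_im, limit)
        end
  in
    loop3 (0, 0.0, 0.0, c_re, c_im, 4.0)
  end
```

Both functions compute the same result; only the way loop3 obtains c_re, c_im and the constant differs. Whether the extra parameters end up unboxed in floating-point registers depends on the compiler.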
Notice that this optimisation is not in conflict with the function specialisation optimisation, which specialises recursive functions that, invariantly, take other functions as parameters.
Here is the resulting assembler code - labels are slightly shortened:
Compared to the original version of mandelbrot, we have also changed the parameter count to be of type word, which turns out not to have a huge effect (only one jo __overflow instruction is saved, after the addq instruction).
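The saved overflow check follows from the Basis Library semantics of the two types: addition on a fixed-precision int raises Overflow when the result is out of range, whereas word arithmetic wraps around. A minimal sketch, assuming the default int is fixed-precision (the names intOverflows and wordWraps are hypothetical):

```sml
(* Int addition must detect overflow; this is what the check after the
   add instruction is for. Int.maxInt is NONE when int is arbitrary-precision. *)
val intOverflows : bool =
  case Int.maxInt of
      NONE => false
    | SOME m => ((m + 1; false) handle Overflow => true)

(* Word arithmetic is modular, so no overflow check is needed:
   0w0 - 0w1 wraps to the largest word, and adding 0w1 wraps back to 0w0. *)
val wordWraps : bool = (0w0 - 0w1) + 0w1 = (0w0 : word)
```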