I put together another example in a gist: https://gist.github.com/lindahua/4967432#file-ju_reduc-jl
Here is the benchmark result on my Mac:
So, `sum(x, 1)` is comparable to my implementation, but `sum(x, 2)` is 5x slower. The key problem is that it takes a perhaps easier approach: simply doing the reduction slice by slice, instead of the cache-friendly approach that organizes the computation according to the memory layout.

Many reductions that can be expressed as the recursive application of a binary operation (such as `max`, `min`, and `prod`) can be implemented in the same cache-friendly way. Other functions can benefit from this as well (e.g. `mean`, `var`, and `std`).
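To make the idea concrete, here is a minimal sketch of a cache-friendly reduction along the second dimension. It is not the code from the gist: the function name `reduce_rows` is hypothetical, and it assumes a column-major matrix with at least one column. The point is only the loop order: the inner loop walks down a column, which is contiguous in memory, and folds it into an accumulator vector, rather than striding across each row in turn.

```julia
# Hypothetical helper (not from the gist): reduce a matrix along dimension 2
# with a binary operation `op`, visiting elements in memory order.
function reduce_rows(op, x::AbstractMatrix)
    m, n = size(x)
    n >= 1 || throw(ArgumentError("matrix must have at least one column"))
    r = x[:, 1]                  # slicing copies, so r is a fresh accumulator
    for j in 2:n                 # outer loop over columns
        for i in 1:m             # inner loop is contiguous for column-major x
            r[i] = op(r[i], x[i, j])
        end
    end
    return r
end

x = reshape(collect(1.0:12.0), 3, 4)   # columns are [1,2,3], [4,5,6], ...
row_sums = reduce_rows(+, x)
row_maxs = reduce_rows(max, x)
row_mins = reduce_rows(min, x)
row_prods = reduce_rows(*, x)
```

The same loop covers `sum`, `max`, `min`, and `prod` by swapping `op`. In current Julia the call written above as `sum(x, 2)` is spelled `sum(x, dims=2)`.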