-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
type stable strides
#16541
type stable strides
#16541
Conversation
Would |
👍 If these are performance-critical, then you might need some |
I'll try to add an What is the best way to check if |
The best way is to test performance: @nanosoldier |
I should add that it's always possible that the benchmarks don't really depend on using BenchmarkTools
function benchstrides(f, A)
s = f(A)
sum(s)
end
A = rand(3,3,3,3,3,3,3);
# I'm on master, and I named your new function "newstrides"
julia> @benchmark benchstrides(strides, A)
Trial(765.00 ns)
julia> @benchmark benchstrides(newstrides, A)
Trial(3.73 μs) Rather surprisingly, it's slower, and adding You can also use Note that the apparently-unstable version of strides could now be written strides{T,N}(a::AbstractArray{T,N}) = ntuple(i->stride(a,i), Val{N}) On |
Weird. It's calls to Even though different by only a tiny bit, this is the fastest of them all: newstrides3{T}(a::AbstractArray{T,0}) = ()
newstrides3(a::AbstractArray) = _strides3(a, (1,))
_strides3{_,N}(a::AbstractArray{_,N}, s::NTuple{N,Int}) = s
_strides3{_,N,M}(a::AbstractArray{_,N}, s::NTuple{M,Int}) = _strides3(a, (s...,s[M]*size(a, M))) I guess I've just gotten in the habit of passing the array, and didn't realize how bad it was to call |
Array benchmarks are unfortunately broken on master right now due to some linalg bug(s). I'm planning a patch to BenchmarkTools that will allow other benchmarks to run even if some error out, but until that patch lands or master is fixed, array benchmarks will fail to execute. |
|
||
strides(a::AbstractArray) = ntuple(i->stride(a,i), ndims(a))::Dims | ||
stride(a::AbstractArray, d) = d <= 1 ? 1 : size(a, d-1)*stride(a, d-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
julia> str(a::AbstractArray, d) = d <= 1 ? 1 : size(a, d-1)*str(a, d-1)
julia> a = rand(2,2,2,2);
julia> f(n, a) = for i=1:n;stride(a,4);end
julia> g(n, a) = for i=1:n;str(a,4);end
julia> @time f(10000000, a);
0.055935 seconds (4 allocations: 160 bytes)
julia> @time g(10000000, a);
0.096278 seconds (4 allocations: 160 bytes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it is still slower, though I hadn't thought it was a factor of two. I'll change it back to the old more lengthy definition. I guess it was to be expected that loops win over recursion. I just wanted to mimick the strides
definition where recursive is the way to go to construct tuples in a type stable manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, the first one evaluates in the same time, the second one in about 0.06 seconds, so about 20 % slower. The fastest one, however, seems to be the following definitition
mystride2(a::AbstractArray, d) = d < 1 ? 1 : d > ndims(a) ? length(a) : strides(a)[d]
where strides
is definitely already the one from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's probably only true if you are probing the last stride, otherwise you are calculating too much. I just don't have that much difference between f
and g
, but it is true that f
(the old definition of strides
) is slightly faster.
I finally got some time to update this. If the tests turn green (hopefully), is there anything that remains to be done? What about documentation? (I haven't created any PR anymore since the new doc system is in place so I am a bit outdated on that). |
It looks like these are actually documented already, is anything that you're changing here externally visible to users? It may be worth taking the opportunity to move the docstrings out of |
_strides{T,N}(out::NTuple{N}, A::AbstractArray{T,N}) = out | ||
function _strides{M,T,N}(out::NTuple{M}, A::AbstractArray{T,N}) | ||
@_inline_meta | ||
_strides((out..., out[M]*size(a, M)), A) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capital A
?
Thanks for the review and for catching the bug. I changed to capital A in the last minute since I thought that was more common in Base, though both seem to be used. Nothing needs to be changed to the docs, but I add the docstrings inline tonight. |
Ok I think this completes this PR, unless there are other comments |
if i > ndims(a) | ||
return length(a) | ||
end | ||
i > ndims(a) && return length(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes the code coverage reporting a little less specific in terms of which branches are getting tested, but I guess we've got enough other issues with coverage right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should actively change code to a format that generates worse coverage specificity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good time to bump #11802?
LGTM, merge if green I think. |
Not sure what happened with Appveyor? |
Thanks |
This PR introduces a type stable definition for
strides
, and a companion replacement forstride
that looks nicer (though that's up to interpretation and debate). I think / hope that this follows the (non@generated function
) programming style of some of the other array changes introduced by @timholy and @mbauman .