type stable `strides` #16541

Jutho · 2016-05-23T21:27:30Z

This PR introduces a type stable definition for strides, and a companion replacement for stride that looks nicer (though that's up to interpretation and debate). I think / hope that this follows the (non @generated function) programming style of some of the other array changes introduced by @timholy and @mbauman .

tkelman · 2016-05-23T23:18:28Z

Would @inferred be able to test the type stability here?

timholy · 2016-05-24T09:39:03Z

👍

If these are performance-critical, then you might need some @_inline_metas to avoid the splat penalty.

Jutho · 2016-05-24T09:46:48Z

I'll try to add an @inferred test later today.

What is the best way to check if @_inline_meta is necessary? I have not been closely following all the developments recently. Is this always required with these kind of constructions (involving splatting and recursive tuple construction) or only to force inlining when the default inline detection mechanism rules otherwise?

timholy · 2016-05-24T10:33:07Z

The best way is to test performance: @nanosoldier runbenchmarks("array", vs = ":master")

timholy · 2016-05-24T11:19:29Z

I should add that it's always possible that the benchmarks don't really depend on strides, so you might run your own. If they pick up problems that the existing benchmarks miss, it's pretty easy to add them to BaseBenchmarks.jl.

using BenchmarkTools

function benchstrides(f, A)
    s = f(A)
    sum(s)
end

A = rand(3,3,3,3,3,3,3);

# I'm on master, and I named your new function "newstrides"
julia> @benchmark benchstrides(strides, A)
Trial(765.00 ns)

julia> @benchmark benchstrides(newstrides, A)
Trial(3.73 μs)

Rather surprisingly, it's slower, and adding @inline doesn't seem to make it any faster.

You can also use @code_llvm to monitor the number of apparent stack allocations, but I'm not sure it's actually reflective of the actual memory usage once JIT-compiled. In the end, performance is really the only thing that matters.

Note that the apparently-unstable version of strides could now be written

strides{T,N}(a::AbstractArray{T,N}) = ntuple(i->stride(a,i), Val{N})

On benchstrides, this rates as the fastest by far (it's more than 4x faster than the version in master, and more than 20x faster than the one in this PR). I'm rather confused by this.

timholy · 2016-05-24T11:30:50Z

Weird. It's calls to asize_from that nukes your implementation. CC @JeffBezanson (afd4eb0)

Even though different by only a tiny bit, this is the fastest of them all:

newstrides3{T}(a::AbstractArray{T,0}) = ()
newstrides3(a::AbstractArray) = _strides3(a, (1,))
_strides3{_,N}(a::AbstractArray{_,N}, s::NTuple{N,Int}) = s
_strides3{_,N,M}(a::AbstractArray{_,N}, s::NTuple{M,Int}) = _strides3(a, (s...,s[M]*size(a, M)))

I guess I've just gotten in the habit of passing the array, and didn't realize how bad it was to call size.

timholy · 2016-05-24T12:01:10Z

See #16553.

@jrevels, looks like a nanosoldier error.

jrevels · 2016-05-24T13:31:34Z

@jrevels, looks like a nanosoldier error.

Array benchmarks are unfortunately broken on master right now due to some linalg bug(s). I'm planning a patch to BenchmarkTools that will allow other benchmarks to run even if some error out, but until that patch lands or master is fixed, array benchmarks will fail to execute.

JeffBezanson · 2016-05-27T04:59:02Z

base/abstractarray.jl


-strides(a::AbstractArray) = ntuple(i->stride(a,i), ndims(a))::Dims
+stride(a::AbstractArray, d) = d <= 1 ? 1 : size(a, d-1)*stride(a, d-1)


julia> str(a::AbstractArray, d) = d <= 1 ? 1 : size(a, d-1)*str(a, d-1) julia> a = rand(2,2,2,2); julia> f(n, a) = for i=1:n;stride(a,4);end julia> g(n, a) = for i=1:n;str(a,4);end julia> @time f(10000000, a); 0.055935 seconds (4 allocations: 160 bytes) julia> @time g(10000000, a); 0.096278 seconds (4 allocations: 160 bytes)

Indeed, it is still slower, though I hadn't thought it was a factor of two. I'll change it back to the old more lengthy definition. I guess it was to be expected that loops win over recursion. I just wanted to mimick the strides definition where recursive is the way to go to construct tuples in a type stable manner.

For me, the first one evaluates in the same time, the second one in about 0.06 seconds, so about 20 % slower. The fastest one, however, seems to be the following definitition

mystride2(a::AbstractArray, d) = d < 1 ? 1 : d > ndims(a) ? length(a) : strides(a)[d]

where strides is definitely already the one from this PR.

But that's probably only true if you are probing the last stride, otherwise you are calculating too much. I just don't have that much difference between f and g, but it is true that f (the old definition of strides) is slightly faster.

Jutho · 2016-06-12T20:43:03Z

I finally got some time to update this. If the tests turn green (hopefully), is there anything that remains to be done? What about documentation? (I haven't created any PR anymore since the new doc system is in place so I am a bit outdated on that).

tkelman · 2016-06-12T21:16:07Z

It looks like these are actually documented already, is anything that you're changing here externally visible to users? It may be worth taking the opportunity to move the docstrings out of base/docs/helpdb/Base.jl and put them inline instead if you're going to make any changes to them. Running make docs will propagate those changes to the RST version of the docs which you can then also commit.

tkelman · 2016-06-12T21:16:54Z

base/abstractarray.jl

+_strides{T,N}(out::NTuple{N}, A::AbstractArray{T,N}) = out
+function _strides{M,T,N}(out::NTuple{M}, A::AbstractArray{T,N})
+    @_inline_meta
+    _strides((out..., out[M]*size(a, M)), A)


capital A ?

Jutho · 2016-06-13T06:08:46Z

Thanks for the review and for catching the bug. I changed to capital A in the last minute since I thought that was more common in Base, though both seem to be used.

Nothing needs to be changed to the docs, but I add the docstrings inline tonight.

Jutho · 2016-06-14T06:05:47Z

Ok I think this completes this PR, unless there are other comments

tkelman · 2016-06-14T18:31:50Z

base/abstractarray.jl

-    if i > ndims(a)
-        return length(a)
-    end
+    i > ndims(a) && return length(a)


this makes the code coverage reporting a little less specific in terms of which branches are getting tested, but I guess we've got enough other issues with coverage right now

I don't think we should actively change code to a format that generates worse coverage specificity.

Good time to bump #11802?

tkelman · 2016-06-15T21:53:37Z

LGTM, merge if green I think.

Jutho · 2016-06-16T06:27:53Z

Not sure what happened with Appveyor?

tkelman · 2016-06-16T09:54:31Z

#16556

Jutho · 2016-06-16T10:05:46Z

Thanks

timholy mentioned this pull request May 24, 2016

Speed up size on Arrays #16553

Merged

Jutho force-pushed the master branch from 0d3b720 to ebe4c24 Compare May 25, 2016 21:13

JeffBezanson reviewed May 27, 2016
View reviewed changes

Jutho force-pushed the master branch from ebe4c24 to e16240c Compare June 12, 2016 20:38

tkelman reviewed Jun 12, 2016
View reviewed changes

Jutho force-pushed the master branch from e16240c to bdde484 Compare June 13, 2016 22:36

tkelman reviewed Jun 14, 2016
View reviewed changes

type stable strides + inferred test + docs

5dfbd50

Jutho force-pushed the master branch from bdde484 to 5dfbd50 Compare June 15, 2016 19:34

tkelman merged commit e124787 into JuliaLang:master Jun 16, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

type stable `strides` #16541

type stable `strides` #16541

Jutho commented May 23, 2016

tkelman commented May 23, 2016

timholy commented May 24, 2016

Jutho commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

jrevels commented May 24, 2016

JeffBezanson May 27, 2016

Jutho May 27, 2016

Jutho May 27, 2016

Jutho May 27, 2016

Jutho commented Jun 12, 2016

tkelman commented Jun 12, 2016

tkelman Jun 12, 2016

Jutho commented Jun 13, 2016

Jutho commented Jun 14, 2016

tkelman Jun 14, 2016

KristofferC Jun 14, 2016

Sacha0 Jun 14, 2016

tkelman commented Jun 15, 2016

Jutho commented Jun 16, 2016

tkelman commented Jun 16, 2016

Jutho commented Jun 16, 2016


		strides(a::AbstractArray) = ntuple(i->stride(a,i), ndims(a))::Dims
		stride(a::AbstractArray, d) = d <= 1 ? 1 : size(a, d-1)*stride(a, d-1)

type stable strides #16541

type stable strides #16541

Conversation

Jutho commented May 23, 2016

tkelman commented May 23, 2016

timholy commented May 24, 2016

Jutho commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

timholy commented May 24, 2016

jrevels commented May 24, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Jutho commented Jun 12, 2016

tkelman commented Jun 12, 2016

Choose a reason for hiding this comment

Jutho commented Jun 13, 2016

Jutho commented Jun 14, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkelman commented Jun 15, 2016

Jutho commented Jun 16, 2016

tkelman commented Jun 16, 2016

Jutho commented Jun 16, 2016

type stable `strides` #16541

type stable `strides` #16541