WIP: ReshapedArrays #10507
I should have added, you can see that most of that penalty is accounted for by comparison with
Wow!
It's great to see Linear indexing into an With respect to
I agree with the name, will see what I can come up with.
While I haven't timed it myself, I doubt it will be so simple. For a single call, I'm pretty sure I will look into the grouping part, too. But I suspect we want it the way it is, since
I should have explained myself better. I didn't mean to imply that
Put the ReshapedArray tests inside a module, since it defines new types.
3ad4b2d to f5905b1
OK, this is basically working now (the failures are OSX timeouts), with one big omission not caught by our current tests: indexing a ReshapedArray with the "wrong" number of indexes (neither 1 nor
# Special type to handle div by 1
immutable FastDivInteger1{T} <: FastDivInteger{T} end

(*)(a::FastDivInteger, b::FastDivInteger) = a.divisor*b.divisor
Maybe when a and b are both FastDivInteger their product should also be one?
That's a reasonable suggestion.
I'll try to spend some more time with my abstract fallbacks this weekend. But there are still some major design issues. First and foremost is how we have array types like this spell their
function getindex(A::MyType, I::Int...)
    ndims(A) != length(I) && invoke(getindex, tuple(AbstractArray, typeof(I)...), A, I...)
    …
end
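For readers on current Julia releases, here is a minimal sketch of the "only implement two kinds of indexing" idea discussed in this thread. The type `Squares` is hypothetical and invented purely for illustration. It defines only `size` and full-dimensional scalar `getindex`, and relies on the generic `AbstractArray` fallbacks for everything else. This is not the design proposed in this PR, just an illustration of the end goal.

```julia
# Hypothetical 2-d array type (an assumption for illustration, not from this PR).
# Element (i, j) is the square of its column-major linear position.
struct Squares <: AbstractArray{Int,2}
    m::Int
    n::Int
end

# The only two methods the type supplies itself:
Base.size(S::Squares) = (S.m, S.n)
Base.getindex(S::Squares, i::Int, j::Int) = (i + (j - 1) * S.m)^2

S = Squares(2, 3)

# Everything below goes through generic AbstractArray fallbacks.
S[2, 2]      # full-dimensional scalar indexing, defined above
S[4]         # linear indexing, converted to (2, 2) by the fallback
S[1:2, 2]    # non-scalar indexing, built on top of scalar getindex
```

The point of the sketch is that the "wrong number of indexes" cases never need to be written by the author of the array type.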
I had a little more time and finally looked into this. Let me show you the benchmark:
julia> A = rand(5,5,5);
julia> B = sub(A, 1:5, 1:5, 1:5);
julia> mi5 = Base.FastDivInteger(5)
Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01)
julia> mi = (mi5, mi5, mi5)
(Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01),Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01),Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01))
julia> ind = Base.Reshaped.IndexMD(mi, size(B))
Base.Reshaped.IndexMD{3,3,(Base.SignedFastDivInteger{Int64},Base.SignedFastDivInteger{Int64},Base.SignedFastDivInteger{Int64})}((Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01),Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01),Base.SignedFastDivInteger{Int64}(5,7378697629483820647,0,0x01)),(5,5,5))
julia> R1 = Base.Reshaped.ReshapedArray{eltype(B),ndims(B),typeof(B),(Colon,Colon,Colon)}(B, (:,:,:), size(B));
julia> R2 = Base.Reshaped.ReshapedArray{eltype(B),ndims(B),typeof(B),(typeof(ind),)}(B, (ind,), size(B));
julia> function runsum(R, n)
           s = 0.0
           for k = 1:n
               for I in eachindex(R)
                   s += R[I]
               end
           end
           s
       end
runsum (generic function with 1 method)
julia> runsum(R1, 1)
62.84683709443972
julia> runsum(R2, 1)
62.84683709443972
julia> @time runsum(R1, 10^5)
elapsed time: 0.153061342 seconds (32 kB allocated)
6.2846837094109e6
julia> @time runsum(R2, 10^5)
elapsed time: 0.39375505 seconds (112 bytes allocated)
6.2846837094109e6
julia> @time runsum(R1, 10^5)
elapsed time: 0.152238983 seconds (112 bytes allocated)
6.2846837094109e6
julia> @time runsum(R2, 10^5)
elapsed time: 0.401355134 seconds (112 bytes allocated)
6.2846837094109e6
So, the two options appear to be:
Thoughts on how to choose between these?
For anyone who might be interested: I now suspect the best approach is not to stress too much about making "ordinary" indexing all that fast for a reshaped array. The better approach (consistent, I think, with suggestions that @simonster has made), is to build special iterators for reshaped arrays. The key question is whether we want to have them "pretend" to the outside world that they have the external shape (in which case they'll need to maintain both an "external" and an "internal" iterator), or whether it's OK for them to just provide the "internal" iterator. In other words, should algorithms that use two arrays look like this:
for I in eachindex(A, B)
    A[I] = B[I]
end
or like this:
for (IA, IB) in zip(eachindex(A), eachindex(B))
    A[IA] = B[IB]
end
or (if you're really worried about performance) like this:
RA = eachindex(A)
RB = eachindex(B)
if RA == RB
    for I in RA
        A[I] = B[I]
    end
else
    for (IA, IB) in zip(RA, RB)
        A[IA] = B[IB]
    end
end
The latter gives you the advantage of sometimes only needing to check/update a single iterator, but using two when the two arrays are not commensurate. The hard parts: handling reductions & broadcasting the way we do now (see
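As a rough sketch, the third pattern can be wrapped in a function and exercised on current Julia releases. The name `copy_elements!` is hypothetical (not from this PR), and the length check is an added assumption so that `zip` cannot silently stop early.

```julia
# Hypothetical helper illustrating the "one iterator if possible, two otherwise" pattern.
function copy_elements!(A::AbstractArray, B::AbstractArray)
    length(A) == length(B) ||
        throw(DimensionMismatch("A and B must have the same number of elements"))
    RA = eachindex(A)
    RB = eachindex(B)
    if RA == RB
        # Both arrays share one iteration space: check/update a single iterator.
        for I in RA
            A[I] = B[I]
        end
    else
        # Iteration spaces differ (e.g. linear vs. Cartesian): advance both in lockstep.
        for (IA, IB) in zip(RA, RB)
            A[IA] = B[IB]
        end
    end
    return A
end

A = zeros(3, 4)                    # plain Array: linear indices
B = view(rand(4, 5), 1:3, 1:4)     # non-contiguous view: Cartesian indices
copy_elements!(A, B)               # takes the zip branch

C = zeros(3, 4)
copy_elements!(C, A)               # same index type and shape: single-iterator branch
```

Both branches visit elements in the same column-major order, so the result is the same either way; only the amount of iterator bookkeeping differs.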
Since folks seem to be thinking about this a bit, let me point out that the 3x runtime penalty seems like it might not be inevitable. All the operations for fast division should be able to be performed in registers, which makes me think that (even though it's more computations) it should barely be measurable on the scale of, say, cache misses. If (through better codegen?) we could reduce that penalty to something much smaller, then this PR wouldn't be stuck between unpleasant alternatives. Also, I'll note it's been a long time since I benchmarked this, and I'm going by memory here.
Superseded by #15449.
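To make concrete where the divisions come from, here is a minimal sketch of converting a linear index into subscripts for a given size, written for current Julia. The name `linear_to_subscripts` is hypothetical, and the sketch uses plain `divrem` rather than the multiplicative-inverse trick from this PR. It only shows that each indexing operation costs one integer division per dimension, which is the work the fast-division types are meant to speed up.

```julia
# Hypothetical helper: map a 1-based linear index to 1-based subscripts for `dims`.
function linear_to_subscripts(dims::NTuple{N,Int}, i::Int) where {N}
    1 <= i <= prod(dims) ||
        throw(ArgumentError("linear index $i is out of range for size $dims"))
    rest = i - 1                            # zero-based offset into the array
    subs = Vector{Int}(undef, N)
    for d in 1:N
        rest, s = divrem(rest, dims[d])     # one integer division per dimension
        subs[d] = s + 1
    end
    return Tuple(subs)
end

linear_to_subscripts((5, 5, 5), 7)   # subscripts of the 7th element of a 5×5×5 array
```

A reshaped view has to do this kind of conversion on every scalar access whenever the parent's indexing does not line up with the new shape, which is why the cost shows up in tight loops.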
This one is rather more fun than my last PR. It adds the ability to reshape an arbitrary `AbstractArray` without making a copy, creating a new type of view. The logic and a demonstration of this approach were described in #9874 (comment). Specifically, reshaping does not compose with subarray-indexing, so you really need two types of views (unfortunately).

To get good performance out of this, I had to optimize the tar out of it. (Honestly, I tried not to.) Any splatting/slurping was deadly, so I had to manually unpack each element of every tuple. I also had to add `@inline` markers in lots of places. And the performance still isn't great, I think mostly because of tuples (see below). Finally, I stole @simonster's neat work on speeding up `div` (I credited the commit to him), updating it for julia 0.4. I noted that I got errors if I tried to create a multiplicative inverse for 1, so I had to introduce a whole new type just to handle this special case. If any of the smart number-focused folks around here has a better idea, I'm all ears.

One way in which this is incomplete (i.e., won't pass tests) is that it doesn't allow one to index a ReshapedArray with anything other than (1) a scalar index (linear indexing) or (2) scalar indexing with the full `N` indexes, if the array is `N`-dimensional. I could implement that for `ReshapedArrays` just like I did for SubArrays, but frankly this kind of stuff is no fun to implement time and time again. I'm hoping @mbauman delivers us a general framework (see #10458 (comment)) so that people who implement `AbstractArray`s only need to worry about those two types of indexing.

This also won't pass the tests because currently it reshapes a `Range`. Since `Range`s are "read-only", I suspect that's not what we want? They are also currently used heavily for creating arrays in combination with `reshape`.

Finally, I also took the opportunity to create a performance test suite for our array indexing. The results have revealed a few unexpected trouble spots, particularly for small arrays---the main problems seem to come from extracting tuple fields. Here's the key:
- `sumelt` refers to `for a in A` indexing; `sumeach` to `for I in eachindex(A)`; `sumfast` to `for i = 1:length(A)` if `A` has a fast linear indexing trait set (otherwise it's redundant with `sumeach`).
- `I` refers to integer, `F` to float. The latter is better-SIMDable.
- `s` means small (3-by-5), `b` means big (300-by-500).
- `ArrayLF` has fast linear indexing, `ArrayLS` does not.
- `ArrayStrides` and `ArrayStride1` are two implementations of very simple strided array types, used for comparison against other types that exploit similar operations.

As you can see, indexing with ReshapedArrays has an unfortunate penalty with respect to Arrays.