Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make eachindex more efficient for linear arrays #10704

Merged
merged 2 commits into from
Apr 2, 2015

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Mar 31, 2015

This greatly improves the ability to use eachindex in generic code. For arrays with fast linear storage, eachindex now simply returns a UnitRange to linearly index the array. For arrays with slow linear access, this still returns a CartesianRange. In typical code constructs, the resulting elements from the iterators can be used identically. (If not, I would consider it a missing functionality in CartesianIndex.)

As a result, we can simplify AbstractArray cartesian iteration by moving all the complexity into the eachindex iterator.

Cc @timholy, ref the discussion in #4774 (comment)

@@ -713,7 +713,7 @@ function skip_deleted(h::Dict, i)
end

start(t::Dict) = skip_deleted(t, 1)
done(t::Dict, i) = done(t.vals, i)
done(t::Dict, i) = i > length(t.vals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this change is necessary...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the iterator state to be a tuple of (IndexIterator, IteratorState) instead of just the next index. Dict was (incorrectly) passing a state to the Array iteration protocol that did not come from the Array's start or next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

@JeffBezanson
Copy link
Member

+1
This is definitely the way forward!

I worry about the extra complexity in AbstractArray iteration. Maybe we should still use the old, super-simple definitions for Array, in the interest of code size if not performance.

@timholy
Copy link
Member

timholy commented Mar 31, 2015

I'm completely in favor.

Have you examined performance and/or looked at the generated code to make sure there is no performance loss? Given Jeff's concern, it would be interesting to know whether all the apparent complexity ends up getting inlined out for Arrays. (That said, for people digging into base with @which trying to figure out how it all works, there's no doubt the more general code is harder to understand...so Jeff's point has validity even aside from the generated code.)

@JeffBezanson
Copy link
Member

I imagine everything can be inlined, and this can be really fast. But iterating over Arrays is so common that this could create significant extra work for type inference and the inliner, even if there is no effect on the final machine code.

@timholy
Copy link
Member

timholy commented Mar 31, 2015

True indeed. I'm in favor of the idea of reintroducing specialized versions for Arrays.

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

That makes sense. For linear fast arrays, this inlines perfectly and has no noticeable performance impact, but I agree that we should keep the specialized methods for Array to reduce the code size. I had only been looking at performance for fast linear arrays…

The bad news is that this is not inlined for Linear Slow arrays with a huge performance impact. I suppose that makes sense, though… the previous next methods that this replaces were explicitly inlined. But even with that change I'm seeing a 10x performance regression. I'll have to look closer at this.

@mbauman mbauman changed the title Make eachindex more efficient for linear arrays WIP: Make eachindex more efficient for linear arrays Apr 1, 2015
@StefanKarpinski
Copy link
Member

What are "Linear Slow arrays"?

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

I updated the documentation for eachindex here to try to make this a bit more clear:

Creates an iterable object for visiting each index of an AbstractArray A in an efficient manner. For array types that have opted into fast linear indexing (like Array), this is simply the range 1:length(A). For other array types, this returns a specialized Cartesian range to efficiently index into the array with indices specified for every dimension.

I think that Base.linearindexing should also be documented (albeit still not exported since it is typically only extended but not called). Here's a stab at what it might say:

linearindexing defines how an AbstractArray most efficiently accesses its elements. If Base.linearindexing(A) returns Base.LinearFast(), this means that linear indexing with only one index is an efficient operation. If it instead returns Base.LinearSlow() (by default), this means that the array intrinsically accesses its elements with indices specified for every dimension. Since converting a linear index to multiple indexing subscripts is typically very expensive, this allows generic code to be written in an efficient manner for all array types with trait-based dispatch.

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

Sprinkling in a few more inline annotations has helped some, but I'm still allocating 128 bytes per loop for iterating over a sparse matrix.

@timholy
Copy link
Member

timholy commented Apr 1, 2015

It's almost perfect 😄. I would consider rewording the last sentence to something like

...this provides a traits-based mechanism allowing one to write generic code in an efficient manner for all array types.

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

OK! Getting closer to parity with master here. No more allocations (I think the core issue was having nested tuples). This is now actually a bit faster for loops like for i in eachindex(A). Since this depends upon the iteration protocol of eachindex, that's a hard limit as to how fast we can iterate for a in A… but somehow master is able to iterate 5x faster for small arrays. It is currently cheating, but the perf tests I've been running wrap everything in @inbounds anyways. Edit: performance comparison - sumelt is for a in A, sumeach is for a in eachindex(A), sumfast is what this PR does. Is is Int Small, Fb is Float big.

Now to chase down those test failures...

@timholy
Copy link
Member

timholy commented Apr 1, 2015

Nice!

There were already a few oddities about our performance, and overall even on master we're a bit slow on small arrays (see the benchmarks in #10507, focusing on Array, ArrayLS, and ArrayLF).

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

Ah, great suggestion. I ran the full suite (and updated the gist). I'm on-par or better than everything except that wonky SubArray case. In some cases performance here is 10x better. So I'm considering performance fixed.

@timholy
Copy link
Member

timholy commented Apr 1, 2015

That's awesome! For any other performance oddities, also bear #9080 in mind. It seems to affect all "complex" iterators. (Unless you've solved it??)

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

julia> @time sumcart_manual(A)
elapsed time: 0.14881743 seconds (184 bytes allocated)
5.000083609636129e7

julia> @time sumcart_iter(A)
elapsed time: 0.145975981 seconds (184 bytes allocated)
5.000083609636129e7

:)

@timholy
Copy link
Member

timholy commented Apr 1, 2015

Yes, but I'm guessing you're cheating :-). With this PR, you have to switch sumcart_iter to for I in eachindex(size(A)).

@mbauman
Copy link
Member Author

mbauman commented Apr 1, 2015

Aw, shucks, I didn't notice that. Right you are, this doesn't solve it.

@timholy
Copy link
Member

timholy commented Apr 1, 2015

You had me hoping there...anyway, just wanted to make sure you knew about that issue---might help interpret any performance oddities.

This greatly improves the ability to use `eachindex` in generic code.  For arrays with fast linear storage, `eachindex` now simply returns a `UnitRange` to linearly index the array. For arrays with slow linear access, this still returns a `CartesianRange`. In typical code constructs, the resulting elements from the iterators can be used identically.

As a result, we can simplify AbstractArray cartesian iteration by moving all the complexity into the `eachindex` iterator.
@mbauman mbauman changed the title WIP: Make eachindex more efficient for linear arrays Make eachindex more efficient for linear arrays Apr 2, 2015
@mbauman
Copy link
Member Author

mbauman commented Apr 2, 2015

Alright! All tests pass locally (including the full subarray suite). I've squashed the commits so this should be good to go (pending CI and a final thumbs up).

# 0-d cartesian ranges are special-cased to iterate once and only once
start{I<:CartesianIndex{0}}(iter::CartesianRange{I}) = false
next{I<:CartesianIndex{0}}(iter::CartesianRange{I}, state) = iter.start, true
done{I<:CartesianIndex{0}}(iter::CartesianRange{I}, state) = state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's indeed much nicer than the old trick with finished.

@Jutho
Copy link
Contributor

Jutho commented Apr 2, 2015

Looks great to me!

## iteration support for arrays as ranges ##
## iteration support for arrays ##
macro _inline_meta()
Expr(:meta, :inline)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here about pushmeta! not yet being available in the bootstrap might be handy. Otherwise
there's a (small) risk we might see usages of Base.@_inline_meta popping up in package code from folks who learn by browsing through the source.

@timholy
Copy link
Member

timholy commented Apr 2, 2015

👍 Thanks for doing this!

mbauman added a commit that referenced this pull request Apr 2, 2015
Make eachindex more efficient for linear arrays
@mbauman mbauman merged commit d61c7e4 into JuliaLang:master Apr 2, 2015
@mbauman mbauman deleted the mb/eachindex branch April 2, 2015 13:30
@mbauman mbauman mentioned this pull request Mar 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants