Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Move Colon translation out of the parser #10331

Merged
merged 2 commits into from
Jun 12, 2015
Merged

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Feb 25, 2015

This is a work in progress to move the translation of [:] out of the parser. I've squashed-and-split @andreasnoack's ambitious work from #9150 to exclude the bigger change (returning views from indexing with UnitRanges) and just focus on indexing with Colon. While this previously included the code that adapted base array types to accept Colon() as an index, that portion was broken out and subsumed by #10525. This is now only the parser change.

This is still a work-in-progress as it hasn't made it through the test suite yet. Right now the blocker is in getindex(::SubArray, ::Colon), which is a pretty gnarly piece of metaprogramming. But I think it's pretty close. Would you have a chance to look into it, @timholy?

This takes a cue from SubArray and doesn't even create a range for indexing into Array types, which is really cool. In general, this approach will require all custom AbstractArray types to allow indexing with Colon. We could define a fallback getindex(::AbstractArray, ::Union(Real, AbstractArray, Colon)…) to do the lowering step, but it's then a little tricky for custom types to define their indexing methods without ambiguities.

Todo:

Fixes #9419.

@@ -67,6 +67,8 @@ immutable UnitRange{T<:Real} <: OrdinalRange{T,Int}
end
UnitRange{T<:Real}(start::T, stop::T) = UnitRange{T}(start, stop)

colon() = Colon()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this. colon is never called with 0 arguments as far as I know. The front end can just produce (top :) or (call (top Colon)) instead.

@JeffBezanson
Copy link
Member

Nice! This is something I've wanted for a while.

@JeffBezanson
Copy link
Member

ref #9419

@timholy timholy mentioned this pull request Feb 26, 2015
15 tasks
@timholy
Copy link
Member

timholy commented Feb 26, 2015

Sorry, can you give a little more detail about what you want me to look at? Currently I don't think there is a getindex(::SubArray, ::Colon)---are you saying that implementing it will be a gnarly bit of metaprogramming or something else?

My current proposal is that this shouldn't involve any metaprogramming at all: it should be

getindex(S::SubArray, ::Colon) = reshape(S, Colon())

and let a new reshape handle it. I suspect it will be cleaner and more efficient. (There is at least one nasty SubArray method I'm looking forward to deleting, if that proposal passes muster.)

@mbauman
Copy link
Member Author

mbauman commented Feb 26, 2015

Sure. I'm slowly grokking Cartesian more and more as I work my way through this, myself, too. Right now, indexing a SubArray with a Colon() bails in the middle of the complicated merge_indexes_div. I saw that and I threw my hands in the air. But you're probably right that there'd be a better place to handle Colons for SubArrays. I can keep toying with it myself, but I just felt so close that I figured I'd post it and see if folks had any pointers.

Thanks for taking a look. :)

@timholy
Copy link
Member

timholy commented Feb 26, 2015

merge_indexes and merge_indexes_div are indeed the methods I'm so, so looking forward to deleting.

@timholy
Copy link
Member

timholy commented Feb 26, 2015

Any comments you have on the reshape proposal would certainly be helpful, though. I'm worried about the nesting, but since it allows us to avoid calling div or allocating big Vector{Int} indexes, I suspect it will be a win overall.

@mbauman
Copy link
Member Author

mbauman commented Feb 26, 2015

By using recursive multiple dispatch for the index_shape with that extra dims argument, we seem to be running up against an inference boundary one dimension sooner than on master here:

julia> Base.return_types(getindex, (Array{Int,4}, UnitRange{Int}, Int))
1-element Array{Any,1}:
 Array{Int64,1}

julia> # This is well-typed and returns a Matrix on master, but not on this branch
       Base.return_types(getindex, (Array{Int,4}, UnitRange{Int}, UnitRange{Int}, Int))
1-element Array{Any,1}:
 Array{T,N}

julia> # Inference isn't precise on either my branch or master
       Base.return_types(getindex, (Array{Int,4}, UnitRange{Int}, UnitRange{Int}, UnitRange{Int}, Int))
1-element Array{Any,1}:
 Array{T,N}

@Jutho
Copy link
Contributor

Jutho commented Feb 26, 2015

Any comments you have on the reshape proposal would certainly be helpful, though. I'm worried about the nesting, but since it allows us to avoid calling div or allocating big Vector{Int} indexes, I suspect it will be a win overall.

I like the reshape proposal, but I don't understand the comment that this avoids calling div. The ind2sub call is filled with div and rem no? I guess it's probably also time to replace ind2sub with a stagedfunction that generates specialized code for every N in dims::NTuple{N,Int}?

@timholy
Copy link
Member

timholy commented Feb 26, 2015

Thanks for looking!

Regarding div, the current state does not reflect the final state. When more fully fleshed out, it will avoid calling div via #8188 (comment). Those magic numbers will be computed at the time of construction, and once computed they can be used for faster indexing. That's by far the most important optimization to make, but I'm sure you recognize many others, too.

@Jutho
Copy link
Contributor

Jutho commented Feb 26, 2015

Great, looking forward to that. But probably it's still a good idea to reimplement ind2sub as well, as probably many people are using this. E.g., with the following:
(updated to avoid one rem for the Nth subscript)

stagedfunction myind2sub{N}(dims::NTuple{N,Integer}, ind::Integer)
    subsyms = [symbol("i$k") for k=1:N]
    subex = Expr(:block,[:($(subsyms[k])=rem(ind,dims[$k])+1;ind=div(ind,dims[$k])) for k=1:N-1]...)
    quote
        ind-=1
        $(subex)
        $(subsyms[N]) = ind+1
        $(Expr(:tuple,subsyms...))
    end
end

I already obtain

julia> test(10000)
elapsed time: 0.003096647 seconds (2 MB allocated)
elapsed time: 0.000611261 seconds (0 bytes allocated)

I'll submit a PR later today.

@timholy
Copy link
Member

timholy commented Feb 26, 2015

👍

@Jutho
Copy link
Contributor

Jutho commented Feb 26, 2015

On a related note, although I apologize for hijacking the original discussion, should we also have a PermutedArray to have a lazy/data-sharing version of permutedims? This could have efficient indexing if we are willing to encode the permutation as type parameter. The type parameter can not be determined at compile time, but I guess that's the same with ReshapedArray.

@timholy
Copy link
Member

timholy commented Feb 26, 2015

There seemed to be some enthusiasm for this general idea in #6837.

Not exactly sure what you mean by "type parameter cannot be determined at compile time," can you elaborate? I presume you're thinking of putting the actual permutation as a tuple into the parameters. If you can't get julia to dispatch on the tuple, you can use a 2-stage process:

getindex{T,N,P}(A::PermutedArray{T,N,P}, indexes...) = getindexP(A, P, indexes...)
stagedfunction getindexP{TT}(A, P::TT, indexes...)
    ...
end

The TT ("tuple-type") forces specialization on the type of P, even if P is a tuple. If P is encoded as a tuple of Vals (or you use a val(P) to convert it into one), then I think you can discover the permutation at compile time.

@Jutho
Copy link
Contributor

Jutho commented Feb 26, 2015

I did indeed not explain myself properly, I meant, at the time where the PermutedArray is created; e.g. creating a B::PermutedArray{T,N,A<:AbstractArray,P} from B=permutedims(parent::AbstractArray,P::(Int...,)) will require to transfer P from the value domain to the type domain, which I guess needs to happen dynamically. But then, once B is created, calling getindex on it has the information of P in the type domain and can thus generate efficient code that just permutes the indices as required.

@timholy
Copy link
Member

timholy commented Feb 26, 2015

Got it. I agree that it's not necessary to infer the permutation at compile time. Any computationally-heavy work should be in a separate function anyway.

@Jutho Jutho mentioned this pull request Feb 26, 2015
@JeffBezanson
Copy link
Member

we seem to be running up against an inference boundary

I'm on the case! I may have a fix for this soon. I've been limiting argument lists too aggressively. I'm trying to switch this to only limit argument lists that are growing due to recursion.

@mbauman
Copy link
Member Author

mbauman commented Feb 28, 2015

Updated! Now the only blocker is #10341. I temporarily lowered the number of tested dimensions in the test-suite to see if AppVeyor or Travis will catch any other failures (all other tests pass on my machine).

The other thing I think needs a conscious decision here is if or how we want to handle a deprecation-like warning. Custom array types in packages probably aren't prepared to handle ::Colon in their indexing methods.

@mbauman
Copy link
Member Author

mbauman commented Feb 28, 2015

Does Travis use 32-bit BLAS binaries? That was handy for catching a missing method that wasn't covered in the test suite. (It's tested now).

@tkelman
Copy link
Contributor

tkelman commented Feb 28, 2015

32-bit-ints, yeah. Generally distributions don't ship ILP64 BLAS, if they do they better be really careful about marking them as conflicting with every other BLAS package.

@mbauman
Copy link
Member Author

mbauman commented Feb 28, 2015

Hm, now there's a strange OS X failure that I'm not seeing locally:

    From worker 3:       * arrayops            Worker 3 terminated.
Assertion failed: (rval->getType() == jl_pvalue_llvmt || rval->getType() == NoopType), function emit_assignment, file codegen.cpp, line 2920.

signal (6): Abort trap: 6
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)

All other platforms seem to be fine.

mbauman added a commit to JuliaArrays/AxisArrays.jl that referenced this pull request Feb 28, 2015
Requires JuliaLang/julia#10331. But I think it is also running into a MAX_TUPLE_LEN issue, which is messing up my workaround for JuliaLang/julia#10191.

Needs tests, too.
@mbauman mbauman force-pushed the mb/colon-indexing branch from 9a46031 to 3908e2b Compare March 5, 2015 20:44
@JeffBezanson
Copy link
Member

Ok #10341 is merged.

@mbauman mbauman force-pushed the mb/colon-indexing branch from 3908e2b to 48fcacf Compare March 6, 2015 14:15
@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

error during bootstrap:
UndefVarError(var=:string)

wha?

@mbauman
Copy link
Member Author

mbauman commented Jun 4, 2015

Hah, I suppose I should have tried building this first. It's just that part of the indexing machinery isn't available by the time inference wants to use it. Should be a simple fix.

@mbauman
Copy link
Member Author

mbauman commented Jun 4, 2015

5/6 tests passed (one OOMkill). I'll rip the space-time continuum and reorder the commits so we don't have an intermediate breakage. I'd prefer to keep them separate since they're discrete operations.

Then this will actually be good to go.

@mbauman mbauman force-pushed the mb/colon-indexing branch from 82b5024 to ca99175 Compare June 4, 2015 17:32
@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

reorder the commits

Good call, thanks. I think I restarted a few other OOM failures here so the actual numbers were worse, but +1 for merging.

@StefanKarpinski
Copy link
Member

@mbauman is having a hell of a week

@mbauman
Copy link
Member Author

mbauman commented Jun 11, 2015

Phew! Some week indeed. :)

Now that the dust is starting to settle from #10525, I'd like to merge this soon. This shouldn't be nearly as troublesome, but it will test #10525 more thoroughly.

@StefanKarpinski
Copy link
Member

I'm all for it.

@mlubin
Copy link
Member

mlubin commented Jun 11, 2015

Yes please

mbauman added a commit that referenced this pull request Jun 12, 2015
Move Colon translation out of the parser
@mbauman mbauman merged commit 53a222f into master Jun 12, 2015
@mbauman mbauman deleted the mb/colon-indexing branch June 12, 2015 00:29
@mbauman
Copy link
Member Author

mbauman commented Jun 12, 2015

Done!

@joehuchette
Copy link
Member

💯 awesome!

@yuyichao
Copy link
Contributor

Cool!

@tkelman
Copy link
Contributor

tkelman commented Jun 12, 2015

We should really start re-running appveyor on some of these more complicated PR's right before merging. This is freezing during second-stage inference builds on win64 https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.5619/job/xln3xt63xsxs4m1i, 100% of the time (edit: oops, sorry, apparently not 100%). If no one has a fix right away I think we'll need to revert.

tkelman added a commit that referenced this pull request Jun 12, 2015
we clearly need a better long-term strategy for this

[av skip] because there's a separate failure right now, freezing from #10331
@yuyichao
Copy link
Contributor

@tkelman I was going to re-enable the bad commit in my gc codgen PR to test this. Do you have a backtrace to see where it gets stuck?

@tkelman
Copy link
Contributor

tkelman commented Jun 12, 2015

I don't think there's any way of getting a backtrace while frozen in compiling inference. Maybe I could try attaching gdb, but the backtraces aren't always great on Windows. I might have a much better fix, taking Colon setindex! out of core inference and moving it to something later, like arraymath.jl (for lack of a better home). I'll replace and rename my "Revert" PR if that works locally.

@yuyichao
Copy link
Contributor

but the backtraces aren't always great on Windows.

I don't have any experience with backtrace on windows but that might be useful enough...

@tkelman
Copy link
Contributor

tkelman commented Jun 12, 2015

julia-debug.exe does not exhibit the freeze, but I'll see if I can get anything meaningful out of the non-debug exe in gdb. Meanwhile, waiting on CI but I think #11678 will work and be a much better fix than reverting (sorry for being so revert-happy, but the way our CI is set up does not always satisfy the Not Rocket Science Rule).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow getindex(::MyType, ::Colon) to be customized