
recursive stagedfunction and type inference #8853

Closed
Jutho opened this issue Oct 30, 2014 · 26 comments
Labels
kind:bug (Indicates an unexpected problem or unintended behavior) · needs tests (Unit tests are required for this change)

Comments

Contributor

Jutho commented Oct 30, 2014

I was exploring the possibility of using staged functions with tuple arguments in a recursive manner, e.g. to reimplement #6517 . As a first step, I tried to implement the Fibonacci function as a staged function taking an nt::NTuple{N,Int} argument, where the first entry nt[1] is the Fibonacci index/parameter and the remaining entries are only used for some trivial operations.

This is the code

stagedfunction fibrec{N}(nt::NTuple{N,Int})
    statements=[Expr(:(=),symbol("n$i"),Expr(:call,:+,Expr(:ref,:nt,i),i-1)) for i=1:N]
    args1=Any[symbol("n$i") for i=1:N]
    args1[1]=Expr(:call,:-,:n1,1)
    args2=Any[symbol("n$i") for i=1:N]
    args2[1]=Expr(:call,:-,:n1,2)
    rec1=Expr(:call,:fibrec,Expr(:tuple,args1...))
    rec2=Expr(:call,:fibrec,Expr(:tuple,args2...))

    ex=Expr(:if,:(n1<2),:(return n1),Expr(:return,Expr(:call,:+,rec1,rec2)))
    return Expr(:block,statements...,ex)
end

and for N=3 the returned expression is

    n1 = nt[1] + 0
    n2 = nt[2] + 1
    n3 = nt[3] + 2
    if n1 < 2
        return n1
    else 
        return fibrec((n1 - 1,n2,n3)) + fibrec((n1 - 2,n2,n3))
    end

which looks ok.

However, trying to run gives rise to

ERROR: stack overflow
 in _methods at ./reflection.jl:100
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:117
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:117
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:97
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in abstract_call_gf at ./inference.jl:664
 in abstract_call at ./inference.jl:831
 in abstract_eval_call at ./inference.jl:923
 in abstract_eval at ./inference.jl:948
 in abstract_interpret at ./inference.jl:1103
 in typeinf at ./inference.jl:1432
 in typeinf at /usr/local/julia/usr/lib/julia/sys.dylib
 in typeinf at ./inference.jl:1238
 in abstract_call_gf at ./inference.jl:718
 in abstract_call at ./inference.jl:831
 in abstract_eval_call at ./inference.jl:923
 in abstract_eval at ./inference.jl:948
 in abstract_eval_arg at ./inference.jl:887
 in abstract_eval_call at ./inference.jl:899
 in abstract_eval at ./inference.jl:948
 in abstract_eval_arg at ./inference.jl:887
 in typeinf at ./inference.jl:1495
 in typeinf at /usr/local/julia/usr/lib/julia/sys.dylib
 in typeinf at ./inference.jl:1238
...

and from there on it repeats the last block (starting at abstract_call_gf at ./inference.jl:718) many times.

Member

mlubin commented Oct 30, 2014

👍 Would be cool to see this working.

Contributor Author

Jutho commented Nov 4, 2014

I can confirm that the same thing occurs for a recursive staged function whose return value is not the result of the recursive call, so that the return type could in principle be inferred without going through the recursion; see the function permutedimsblock! in the following gist:
https://gist.github.com/Jutho/e10e39a91e1a473315ab

In fact, running

A=randn(10,10);
B=randn(10,10);
permutedimsnew!(B,A,[2,1])

just hangs Julia. I didn't get it to throw a stack overflow in any reasonable time and had to interrupt it with Ctrl+C.

Contributor Author

Jutho commented Dec 21, 2014

Bump.
I think it would be nice to fix this; it could lead to some great examples of using stagedfunction. I checked what is going wrong. Essentially, the recursion check in typeinf fails because is(f.ast,ast0) on line https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L1289 returns false. This happens because every inference run on a staged function calls func_for_method, which for a staged function calls jl_instantiate_staged ( https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L604 ) every time. While that call always returns the same AST, it does so in a freshly allocated vector, so is(f.ast,ast0) fails because it only checks whether the two objects have the same address.

I've checked that simply replacing is(f.ast,ast0) by f.ast==ast0 ( line https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L1289 ) fixes this problem. However, this is probably not the recommended solution; I have no idea about its possible side effects ...
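For illustration, here is a tiny stand-in for what happens (the vectors below are hypothetical, not the actual inference.jl objects): is compares addresses, so two freshly allocated but structurally identical objects test as different, while == compares contents.

astA = Any[:call, :+, :n1, 1]
astB = Any[:call, :+, :n1, 1]   # freshly allocated vector with the same contents

is(astA, astB)   # false: distinct objects at different addresses
astA == astB     # true: arrays are compared elementwise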

Member

timholy commented Dec 22, 2014

I can't comment on the acceptability of your proposed fix (though it seems correct/unavoidable), but a tiny tip: when linking to line numbers, hit 'y' on the source-code page and it will specialize the URL to the particular commit you're looking at. That way your line numbers don't go out-of-date as the codebase changes.

Contributor Author

Jutho commented Dec 22, 2014

Thanks for the tip, I'll try to remember. If I don't get any response by tonight, I will just turn this fix into a pull request and see what comes of it...

Member

timholy commented Dec 22, 2014

I guess the only problem with it might be performance. But if == (or a custom function performing this particular comparison) first checks addresses and returns early when they match, then you won't even have a performance hit.

Contributor Author

Jutho commented Dec 22, 2014

I guess (is(f.ast,ast0) || f.ast==ast0) could do?

Member

jakebolewski commented

In my tests (is(f.ast,ast0) || f.ast==ast0) is actually slightly faster than what we have currently (counterintuitively). The real fix is to cache the output of jl_instantiate_staged properly, but that is a nontrivial change.

Contributor Author

Jutho commented Jan 18, 2015

This still seems to be an issue. The fix in #9440 handles the example above, but the following still fails:

using Base.Cartesian
stagedfunction copy_strided_recursive!{N}(D,startD,stridesD::CartesianIndex{N},S,startS,stridesS::CartesianIndex{N},dims::CartesianIndex{N})
    return quote
        if prod(dims)<2048
            # copy_strided!(D,startD,stridesD,S,startS,stridesS,dims)
        else
            d = $N
            minstrides = min(stridesD,stridesS)
            valued = (dims[d]-1)*minstrides[d]
            @nif $N n->(valued<(dims[n]-1)*minstrides[n]) n->begin
                d = n
                valued = (dims[d]-1)*minstrides[d]
            end

            newdim = dims[d]>>1
            newdims = @ncall $N $dims n->(n==d ? newdim : dims[n])
            copy_strided_recursive!(D,startD,stridesD,S,startS,stridesS,newdims)

            startD += newdim*stridesD[d]
            startS += newdim*stridesS[d]
            newdim = dims[d]-newdim
            newdims = @ncall $N $dims n->(n==d ? newdim : dims[n])
            copy_strided_recursive!(D,startD,stridesD,S,startS,stridesS,newdims)
        end
        return D
    end
end

Contributor Author

Jutho commented Jan 18, 2015

Strangely enough, calling code_typed on this example returns correct, fully type-inferred code. However, actually running it still triggers an infinite recursion in type inference.

Contributor Author

Jutho commented Jan 18, 2015

Maybe somebody with better knowledge of type inference can briefly explain to me whether code_typed and actually calling a function take the same code path through type inference or not.

Jutho added the kind:bug label Jan 18, 2015
Contributor Author

Jutho commented Jan 18, 2015

This might be related to #8504 . The reason code_typed worked while actually running it did not is that the function was being called from within a function where N was an ordinary argument, so its value could not be inferred. Rewriting the function so that N is inferred correctly from a ::Val{N} argument allows running the function without problems.

Maybe somebody who understands inference.jl knows how this influences the rest of type inference and why the recursion is detected in one case but not in the other.
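A minimal sketch of that workaround, reusing fibrec from the top of this issue (the wrapper names are hypothetical):

# N is only a run-time value here, so inference cannot see the tuple length:
fib_runtime(N) = fibrec(ntuple(N, i -> i))

# Lifting N into the type domain makes it available while the staged function is expanded:
fib_static{N}(::Val{N}) = fibrec(ntuple(N, i -> i))

fib_static(Val{3}())   # N = 3 is known from the argument's type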

Contributor Author

Jutho commented Jan 25, 2015

Current status:
using the following two test functions:

function test1(N)
    A=Array(Float64,ntuple(N,n->2)::NTuple{N,Int});
    stridesA=CartesianIndex(strides(A))
    dims=CartesianIndex(size(A))
    copy_strided_recursive!(A,1,stridesA,A,1,stridesA,dims)
end
function test2{N}(::Val{N})
    A=Array(Float64,ntuple(N,n->2)::NTuple{N,Int});
    stridesA=CartesianIndex(strides(A))
    dims=CartesianIndex(size(A))
    copy_strided_recursive!(A,1,stridesA,A,1,stridesA,dims)
end

gives rise to the following behavior:

  • @code_typed test1(3) -> N cannot be inferred; stagedfunction compilation is attempted twice but fails since N is unknown, and code_typed finishes with an Any argument.
  • test1(3) -> hangs in an infinite inference recursion; the stagedfunction keeps getting compiled with the correct types, so it is the detection of the recursion that fails.
  • @code_typed test2(Val{3}()) -> on a fresh start, returns a fully inferred function after compiling the stagedfunction several times, sometimes with correct and sometimes with Any argument types (this depends on the value of N and on the precise statements inside the function). However, this only holds after a fresh start; after running something like @code_typed test1(3) first, it hangs and keeps recompiling the stagedfunction with the correct types, so again the recursion detection fails.

I tried debugging inference.jl but failed to pin down the problem (or get any wiser about what all of this code is doing), despite @ihnorton's useful HackThatBase tool.

However, caching the stagedfunction generation as suggested by @jakebolewski seems to fix the problem completely, at least with the naive implementation below. I just cached it in func_for_method by running the following (just in the REPL; it doesn't even require recompiling or using HackThatBase):

let stagedcache=Dict{Any,Any}()   # cache the instantiated staged AST keyed on (method, argument types, env)
    function Base.func_for_method(m::Method, tt, env)
        if !m.isstaged
            return m.func.code
        elseif haskey(stagedcache,(m,tt,env))
            return stagedcache[(m,tt,env)].code
        else
            f=ccall(:jl_instantiate_staged,Any,(Any,Any,Any),m,tt,env)
            stagedcache[(m,tt,env)]=f
            return f.code
        end
    end
end

fixes all of the above problems. Now, since this is probably not the best way or even place to cache the stagedfunction generation, I'll leave it up to somebody who knows the Julia internals to implement this properly, but I would like to stress that this is really pertinent. I would not consider the stagedfunction implementation complete as a 0.4 project without fixing this (and #8504 ).

Member

timholy commented Jan 25, 2015

Totally agreed that the lingering bugs in stagedfunctions desperately need fixing; there's quite a lot of cleanup of base that should become possible when that happens. (E.g., #9098, needed for #9150, etc).

However, your caching implementation doesn't look so bad to me---indeed, this might be the right way to do it. Care to submit it as a PR?

Member

timholy commented Jan 25, 2015

One small concern: stagedcache might become really, really big.

Contributor Author

Jutho commented Jan 25, 2015

That's why I wanted some input: I don't know what I need to store, which parts of the key are really necessary, where else this could be used, etc. But maybe an RFC pull request is the best way to get the discussion going.

Member

timholy commented Jan 25, 2015

I'm not sure about the env part (others will have to comment, I don't even remember what env is), but m and tt are clearly essential. I would say the best solution to the size problem---if it is a problem---would be an LRU-type design. Rather than using a dict, perhaps you could have a cyclic array with fixed capacity of 100 or so.

But it's also not entirely obvious this is a problem: after all, Julia has to cache the compiled function anyway, so maybe there isn't a real problem with having a separate cache of the AST.
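For concreteness, a rough sketch of such a size-bounded cache (FIFO eviction as a simple stand-in for a true LRU; the names and the capacity are made up, and it is not wired into inference.jl):

const STAGED_CACHE    = Dict{Any,Any}()
const STAGED_KEYS     = Any[]      # insertion order, oldest first
const STAGED_CAPACITY = 100

function cached_staged_ast(key, instantiate::Function)
    haskey(STAGED_CACHE, key) && return STAGED_CACHE[key]
    if length(STAGED_KEYS) >= STAGED_CAPACITY
        oldest = shift!(STAGED_KEYS)    # drop the oldest entry (popfirst! on current Julia)
        delete!(STAGED_CACHE, oldest)
    end
    ast = instantiate(key)
    STAGED_CACHE[key] = ast
    push!(STAGED_KEYS, key)
    return ast
end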

Contributor Author

Jutho commented Feb 12, 2015

I forgot to close this issue after PR #9921 got merged, but it now seems we may need to revert that PR because of #10178, so I'll leave it up to somebody else to decide whether this can be closed.

Member

timholy commented Feb 12, 2015

I don't think it should be reverted, because redefinition is a less important feature than the fixes this provides. Agreed it's a troublesome situation, however.

I suspect a lot of things would be fixable if we added a version number to methods.

Contributor Author

Jutho commented Feb 12, 2015

I agree, but the solution to this issue should definitely be revisited by somebody with a better knowledge/understanding of inference.jl.

Contributor Author

Jutho commented May 6, 2015

Yesterday, I ran into a case where the fix of #9921 was not sufficient and this problem popped up again; i.e. a recursive generated function giving rise to an infinite recursion in type inference.

The expression body of my generated function contained a macro that defined a few local variables, giving rise to gensym calls. Indeed, putting a @show in front of the generated expression showed that the @generated function was compiled over and over again, always yielding the same expression up to the names of the variables introduced by the macro. However, the whole point of #9921 was that the @generated function would not be compiled a second time for the same arguments, so I don't know where it goes wrong. Nevertheless, this was clearly the issue, since there were two ways to stop type inference from getting stuck (a small illustration of the gensym effect follows the list):

  1. rewrite the function such that there are no randomly generated (gensym'd) variable names
  2. add dummy arguments so that the total number of arguments exceeds 8; I guess at that point type inference gives up?
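As a minimal illustration of the gensym effect (hypothetical expressions, not my actual generated function): two expansions that are identical up to gensym'd variable names still compare as unequal expressions.

function expand()
    tmp = gensym("tmp")    # a fresh, unique symbol on every call
    quote
        $tmp = a + b
        return $tmp
    end
end

ex1 = expand()
ex2 = expand()
ex1 == ex2   # false: the expressions differ only in the gensym'd name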

On a more general note, I was wondering about the following. All of my recursive functions (generated or not) are of the form

function modify_recursive!(arg1, more args ...)
    if some condition
        modify_base!(arg1, more args ...)
    else
        modify_recursive!(arg1, more args for smaller case)
    end
    return arg1
end

I am actually wondering why type inference even bothers trying to compute the return type of the recursive call in the else block of the above definition. It is not used anywhere. So if type inference only investigated the lines where it needs to infer types in order to generate optimized code, it could even skip that call. Couldn't function calls whose result is discarded be ignored in the inference process?
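For reference, a concrete (hypothetical) instance of this pattern; the recursive calls' return values are discarded, yet inference still tries to determine their type:

function fill_recursive!(a::Vector{Float64}, lo::Int, hi::Int)
    if hi - lo < 16
        for i in lo:hi
            a[i] = 0.0          # base case: do the actual work
        end
    else
        mid = (lo + hi) >> 1
        fill_recursive!(a, lo, mid)       # result unused
        fill_recursive!(a, mid + 1, hi)   # result unused
    end
    return a
end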

Member

timholy commented May 6, 2015

Without having any immediate answers to your questions, I'll advertise that my new favorite way to understand what's going on inside inference.jl is @ihnorton's amazingly useful HackThatBase, especially in conjunction with @toivoh's Debug.

Contributor Author

Jutho commented May 6, 2015

I've used both before and they are indeed very useful. I see that you have been committing to those recently to bring them up to date with the latest master. So thanks and 👍 to all three of you!

Contributor Author

Jutho commented May 6, 2015

It seems like the stagedfunction caching fails because it compares the method, the argument types and the env. Now, consider the following test case:

@generated function f{N}(a::NTuple{N,Int})
#code
end
tt, env, m = Base._methods(f,Tuple{Tuple{Int,Int,Int}},-1)[1]
tt2, env2, m2 = Base._methods(f,Tuple{Tuple{Int,Int,Int}},-1)[1]
@show env  #->   svec(N,3)
@show env2 #-> svec(N,3)
@show m==m2, tt==tt2, env==env2 #->  true, true, false

The problem is thus with comparing SimpleVector objects. I guess this needs to be fixed, but I wouldn't know where to start. I hope @JeffBezanson doesn't mind that I involve him in this discussion?

Contributor Author

Jutho commented May 6, 2015

Well, it wasn't actually that hard; it just required adding the appropriate definition in base.jl. My apologies for the spam, @JeffBezanson , especially in light of # 8839 (deliberately broken link).
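Presumably the missing method is an elementwise comparison along these lines (a sketch, not necessarily the exact definition that was added to base.jl):

import Base: ==    # not needed when the method lives inside base.jl itself

function ==(v::Core.SimpleVector, w::Core.SimpleVector)
    length(v) == length(w) || return false
    for i = 1:length(v)
        v[i] == w[i] || return false
    end
    return true
end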

Member

vtjnash commented Mar 25, 2016

this seems to be fixed now

vtjnash closed this as completed Mar 25, 2016
tkelman added the needs tests label Mar 29, 2016