Do not cache the type before we finish constructing it. #11606

yuyichao · 2015-06-06T19:18:18Z

This fixes the issue mentioned in #11597 (comment) as well as the inconsistency in #11449

yuyichao · 2015-06-06T19:19:55Z

I don't like changes that make weird "working" code raise an error but I think that's the right thing to do....

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> abstract C{T<:Union(Void, Int)}

julia> type D{T} <: C{T} d::T end

julia> D(1.1)
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

julia> D{Float64}
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

julia> D{Float64}
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

tkelman · 2015-06-06T19:21:43Z

src/jltypes.c

@@ -2047,6 +2045,7 @@ static jl_value_t *inst_datatype(jl_datatype_t *dt, jl_svec_t *p, jl_value_t **i
        ndt->ninitialized = ntp;
    else
        ndt->ninitialized = dt->ninitialized;
+    if (cacheable) jl_cache_type_(ndt);


superficial, but I think an extra newline before this would help readability

@tkelman I go further and add one before and after as well as making it a two line statement (last one is because I hate one like control flow statement in C/C++).

Would be nice if you can abort the old CI runs...

carnaval · 2015-06-06T19:32:13Z

I think this was on purpose because if you have a recursive type you'd want only one instance created ?

yuyichao · 2015-06-06T19:34:08Z

@carnaval Good point. So is there a remove from cache or temporaryly add to cache function?

carnaval · 2015-06-06T19:36:57Z

I'm not going through the source atm, could you explain what was the underlying problem ? Where is the cache used before the initialization is over ?

yuyichao · 2015-06-06T19:39:50Z

@carnaval The issue is that the functions after the type is cached can throw an error (for the two cases above, instantiation of super and calculation of the offset) If it happens, the type will stay in the cache with an invalid state. Causing code using them to behave unexpectedly

yuyichao · 2015-06-06T19:41:05Z

And I guess removing from cache may not be enough because it can be referenced to by other types as well....

yuyichao · 2015-06-06T19:42:25Z

Also please feel free to abort CI...

yuyichao · 2015-06-06T19:46:44Z

The only solution I can think of ATM is to have a temporary global type cache when instantiating the types and only update the real one if everything succeeds.

carnaval · 2015-06-06T19:48:38Z

Even if it makes things more complicated it may be a case where we should do all the checks upfront and only instantiate when it is guaranteed not to fail. Although this doesn't help for some things like OOM exceptions along the way.

Removing from the cache is probably the only fully safe option. We could store a flag on each type saying if it is fully initialized. On error you go through the cache and remove the broken folks.

yuyichao · 2015-06-06T19:49:58Z

I've thought about that too but if I understand correctly the cache is bound to each typenames. Is there a way to iterate all typenames?

yuyichao · 2015-06-06T20:17:25Z

I'm surprised that the test passes.... Maybe we should add some tests for sth like self-refering types?

yuyichao · 2015-06-06T20:36:03Z

Interestingly, the following code doesn't cause an error on my branch.

julia> type B{T}
       b::B{T}
       B() = (b = new(); b.b = b; b)
       end

julia> B{Int}()
B{Int64}(B{Int64}(#= circular reference =#))

julia> abstract A{T}

julia> type C{T} <: A{C{T}}
       end

julia> C{Int}
C{Int64}

@carnaval Any idea why?....

yuyichao · 2015-06-07T12:30:53Z

@carnaval Actually I think this is probably not a problem. There's a jl_typestack_t* that records the types being constructed and from the comment here it seems that this is how recursive instantiation works. (This feels more correct instead of relying on the cache since there's also cachablility etc...)

The travis-CI is just the normal OOM error, the 32-bit AppVeyor is something different. I can't tell if it is related. @JeffBezanson I guess you should have the ultimate (if not the only) word here. (Only when you are not busy with other more important stuff though.)

JeffBezanson · 2015-06-08T03:01:59Z

👍 but I'd like to briefly look into the CI failure.

yuyichao · 2015-06-08T03:08:51Z

@JeffBezanson The failure changes after a rebase. Any hint where to look? e.g. how would the cache interact with other part and why was the cache in that place to begin with (it seems strange to cache sth before it is done constructing....)

tkelman · 2015-06-08T19:00:00Z

Same failure as in #10380 (comment), maybe tuple related?

yuyichao · 2015-06-08T19:20:48Z

The unpredictability of the failure make me think it might be GC related. I'm still going through the code but I've find one place that looks strange.

Is this a missing root or am I seeing things here?

Even if tuple types are always cached I feel like this is too weak an assumption to make here.

@carnaval

yuyichao · 2015-06-08T19:21:26Z

@tkelman = = is there a strip down list of those failures...................

yuyichao · 2015-06-08T19:23:50Z

From the blame afd4eb0#diff-757551c8929efe195c9bab04f9925d05R624 it could certainly be #10380 related...

tkelman · 2015-06-08T19:25:16Z

More worth looking at the commits that I think fixed those issues, 7ec501d or b993658 - you're probably interacting with one of those changes here to bring back the same failures.

yuyichao · 2015-06-08T19:27:06Z

@tkelman You are right that the caching is indeed moving to the front in b993658

JeffBezanson · 2015-06-08T20:52:39Z

@yuyichao Yes, the assumption there was that all concrete tuple types will be cached. In fact they have to be, or there will be many other problems.

yuyichao · 2015-06-08T21:13:34Z

In that case, IMHO this commit should only make a difference for functions that check the type cache which is only inst_datatype AFAICT. Since one of the caller of inst_datatype that can be called recursively (inst_tuple_w_) doesn't check the stack, should we move the stack checking in inst_datatype (after the cache check)?

P.S. what does these *_ *_w *_w_ names means?

vtjnash · 2015-06-08T21:21:27Z

ref #11320, which appears to also show the same issue (if you try to run the example twice)

Do not cache the type before we finish constructing it.

yuyichao · 2015-06-09T13:48:50Z

@JeffBezanson One of the failure seen in this PR shows up on AppVeyor for a few PR's again.....

yuyichao · 2015-06-09T14:15:39Z

So if this is called recursively, it can miss the stack in jl_apply_type_ on line 1997 in inst_datatype ?

vtjnash · 2015-06-09T14:22:04Z

i don't think checking the stack only is sufficient, since it may still be possible to cache a type that points to a partially constructed type (in this case, Vector{Z{I}}:

julia> type Z{I}
         a::Vector{Z{I}}
         b::NTuple{20000,I}
       end

julia> Z{Int}
ERROR: OverflowError()

julia> Vector{Z{Int}}
Array{Z{Int64},1}

yuyichao · 2015-06-09T14:26:59Z

So should I replace the stack with a temporary cache, make sure it is passed to all functions and only merge them to the global cache after everything is done?

yuyichao · 2015-06-09T14:27:06Z

Or is there a better way to do this?

JeffBezanson · 2015-06-09T14:32:38Z

I wouldn't worry so much about this particular failure mode. The right approach is to fix #11597 and #11320. Then it will be very rare if not impossible for type instantiation to fail at this stage.

yuyichao · 2015-06-09T14:34:19Z

I guess someone could still write a very long structure? (or a structure of enough long tuples)?

JeffBezanson · 2015-06-09T14:36:04Z

I think we'll have to allow more offset bits; uint16 is just too small.

yuyichao · 2015-06-09T14:37:34Z

I think we'll have to allow more offset bits; uint16 is just too small.

(And size.) Sure, that works too.... Just wondering if it is this short for some performance argument...

JeffBezanson · 2015-06-09T14:39:07Z

I just wanted to keep it as compact as possible. Before the new tuples it made sense.

vtjnash · 2015-06-09T14:43:53Z

here's an updated example showing where it is in the cache:

julia> abstract A{I}

julia> type Z{I}
                a::A{Z{I}}
                b::NTuple{20000,I}
              end

julia> Z{Int}
ERROR: OverflowError()

julia> A.name.cache[1].parameters[1]
Z{Int64}

yuyichao · 2015-06-09T14:52:57Z

Although even if you increase the size to 32bit, it wouls still be possible to construct invalid types with a few recursions of sth like this without running out of memory

julia> type A{T}
       a::T
       b::T
       end

JeffBezanson · 2015-06-09T15:01:15Z

Actually, it might be preferable to allow making types that are "invalid", and defer errors to run time e.g. when you try to construct an instance. For example you want to be able to ask "is (1,2) a 20000-element tuple?" The question makes sense even if you can't actually construct a 20000-element tuple.

We should probably use size_t. Or use uint16 for small structs, and size_t for bigger ones.

yuyichao · 2015-06-09T15:10:24Z

So e.g. it would be possible to construct an array of NTuple{20000, Int} but you can't push anything to it?

In this case, should querying the properties of the type return an error? If you want isa((1, 2), NTuple{20000, Int}) to be valid than I guess the answer is ~~yes~~ no. But in that case, I guess zeros() of it would happily allocate whatever size it think is appropreate? Or should we store what exactly is the error happening when instantialize the type?

Edit: .... yes -> no

JeffBezanson · 2015-06-09T15:14:46Z

Most of all, we should fix #11597 and #11320. After all, what is the essential reason for the error here? None whatsoever, unless you try to actually allocate an object too big to fit in memory.

carnaval · 2015-06-09T15:18:42Z

What if you get an OOM error while doing the instantiating of a type of reasonable size ? Agreed you're pretty much done at this point but if we're not going to even try handling those we might as well replace it with exit(1).

vtjnash · 2015-06-09T15:20:59Z

at some point, NTuple{ typemax(Int) - 1, NTuple{ typemax(Int) - 2, NTuple{ ... } } } is going to describe a massive data structure. i think we can fix that by conditionally switching from the current fixed size of offsets to a compressed computational format (trading off memory for flops).

JeffBezanson · 2015-06-09T15:21:46Z

Yes, exactly.

JeffBezanson · 2015-06-09T15:25:22Z

After fixing the offsets issue, the exit(1) approach does indeed start to make sense. However there's no need to be so unfriendly; we can print a big warning explaining exactly what happened and warning that the system might be in an unstable state. You want to give people a chance to save work.

ScottPJones · 2015-06-09T18:17:04Z

Please don't take control out of the application developer's hands. Also, printing a big message (in fact, any message) is also not a good idea if you are not at the REPL, or there is no handler for the exception you should give.
If you run out of memory, you should free up any caches that the system might be keeping and run through a full GC, and if the memory allocation still cannot succeed, you should log information that this occurred, throw an OutOfMemory exception, so that the application can figure out how to proceed, freeing up any data structures it no longer needs, committing or rolling back any transactions, etc.

reinstates and hopefully fixes the problem in #11606

reinstates and hopefully fixes the problem in JuliaLang#11606

tkelman reviewed Jun 6, 2015
View reviewed changes

yuyichao force-pushed the delay-cache-type branch from 432c6f4 to 904ff26 Compare June 6, 2015 19:25

yuyichao mentioned this pull request Jun 6, 2015

forward propagate type parameter constraints of super class #11597

Closed

yuyichao force-pushed the delay-cache-type branch from 904ff26 to b04ea0d Compare June 7, 2015 22:40

yuyichao force-pushed the delay-cache-type branch from b04ea0d to 11f5324 Compare June 8, 2015 17:33

yuyichao mentioned this pull request Jun 8, 2015

RFC: run tests with a release+asserts build and 4 workers #11614

Closed

JeffBezanson added a commit that referenced this pull request Jun 9, 2015

Merge pull request #11606 from yuyichao/delay-cache-type

057f573

Do not cache the type before we finish constructing it.

JeffBezanson merged commit 057f573 into JuliaLang:master Jun 9, 2015

yuyichao deleted the delay-cache-type branch June 9, 2015 05:54

yuyichao mentioned this pull request Jun 9, 2015

<<, >>, >>>: use intrinsics for more mixed-type shift operations. #11629

Merged

yuyichao restored the delay-cache-type branch June 9, 2015 14:29

yuyichao mentioned this pull request Jun 9, 2015

Revert "Do not cache the type before we finish constructing it." #11632

Merged

JeffBezanson added a commit that referenced this pull request Jun 16, 2015

fix order of type cache insertion and UID assignment

fecadcb

reinstates and hopefully fixes the problem in #11606

fcard pushed a commit to fcard/julia that referenced this pull request Jul 8, 2015

fix order of type cache insertion and UID assignment

0af1e8b

reinstates and hopefully fixes the problem in JuliaLang#11606

sbromberger mentioned this pull request Jul 8, 2015

Strange type parameterization issue introduced around 30 June (or perhaps 16 June) #12063

Closed

yuyichao deleted the delay-cache-type branch July 25, 2015 21:09

Do not cache the type before we finish constructing it. #11606

Do not cache the type before we finish constructing it. #11606

Conversation

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

tkelman Jun 6, 2015

Choose a reason for hiding this comment

yuyichao Jun 6, 2015

Choose a reason for hiding this comment

tkelman Jun 6, 2015

Choose a reason for hiding this comment

carnaval commented Jun 6, 2015

yuyichao commented Jun 6, 2015

carnaval commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

carnaval commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 6, 2015

yuyichao commented Jun 7, 2015

JeffBezanson commented Jun 8, 2015

yuyichao commented Jun 8, 2015

tkelman commented Jun 8, 2015

yuyichao commented Jun 8, 2015

yuyichao commented Jun 8, 2015

yuyichao commented Jun 8, 2015

tkelman commented Jun 8, 2015

yuyichao commented Jun 8, 2015

JeffBezanson commented Jun 8, 2015

yuyichao commented Jun 8, 2015

vtjnash commented Jun 8, 2015

yuyichao commented Jun 9, 2015

yuyichao commented Jun 9, 2015

vtjnash commented Jun 9, 2015

yuyichao commented Jun 9, 2015

yuyichao commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

yuyichao commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

yuyichao commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

vtjnash commented Jun 9, 2015

yuyichao commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

yuyichao commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

carnaval commented Jun 9, 2015

vtjnash commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

JeffBezanson commented Jun 9, 2015

ScottPJones commented Jun 9, 2015