
Do not cache the type before we finish constructing it. #11606

Merged 2 commits into JuliaLang:master from delay-cache-type on Jun 9, 2015

Conversation

@yuyichao (Author) commented Jun 6, 2015

This fixes the issue mentioned in #11597 (comment), as well as the inconsistency in #11449.

cc: @JeffBezanson @andrewcooke

@yuyichao (Author) commented Jun 6, 2015

I don't like changes that make weird "working" code raise an error, but I think that's the right thing to do....

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> NTuple{20000, Int}
ERROR: OverflowError()

julia> abstract C{T<:Union(Void, Int)}

julia> type D{T} <: C{T} d::T end

julia> D(1.1)
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

julia> D{Float64}
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

julia> D{Float64}
ERROR: TypeError: C: in T, expected T<:Union(Int64,Void), got Type{Float64}

@@ -2047,6 +2045,7 @@ static jl_value_t *inst_datatype(jl_datatype_t *dt, jl_svec_t *p, jl_value_t **i
ndt->ninitialized = ntp;
else
ndt->ninitialized = dt->ninitialized;
if (cacheable) jl_cache_type_(ndt);
@tkelman (Contributor) commented on the diff:

Superficial, but I think an extra newline before this would help readability.

@yuyichao (Author): @tkelman I went further and added one both before and after, as well as making it a two-line statement (the last because I hate one-line control-flow statements in C/C++).

It would be nice if you could abort the old CI runs...

@tkelman (Contributor):

done

@carnaval (Contributor) commented Jun 6, 2015

I think this was on purpose: if you have a recursive type, you'd want only one instance created?
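For context, a minimal sketch of the kind of recursive type this concerns (a hypothetical example in the 0.4-era syntax used throughout this thread): resolving the field type List{Int} while List{Int} itself is being instantiated is exactly why a partially constructed type needs to be findable somewhere.

julia> type List{T}
         head::T
         tail::List{T}   # field mentions the type currently being constructed
       end

julia> List{Int}          # the List{Int} in the field must be this same object
List{Int64}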

@yuyichao (Author) commented Jun 6, 2015

@carnaval Good point. So is there a function to remove a type from the cache, or to add it to the cache only temporarily?

@carnaval (Contributor) commented Jun 6, 2015

I'm not going through the source atm; could you explain what the underlying problem was? Where is the cache used before the initialization is over?

@yuyichao (Author) commented Jun 6, 2015

@carnaval The issue is that the functions called after the type is cached can throw an error (in the two cases above, instantiation of the supertype and calculation of the field offsets). If that happens, the type stays in the cache in an invalid state, causing code that uses it to behave unexpectedly.
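For illustration, roughly the pre-fix behavior being described here (a hypothetical session; cf. the inconsistency in #11449): the first lookup throws while computing the layout, but the half-built type is already in the cache, so later lookups quietly return it.

julia> NTuple{20000, Int}                         # first lookup: offset computation throws
ERROR: OverflowError()

julia> NTuple{20000, Int} === NTuple{20000, Int}  # later lookups return the stale entry
true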

@yuyichao (Author) commented Jun 6, 2015

And I guess removing it from the cache may not be enough, because it can be referenced by other types as well....

@yuyichao (Author) commented Jun 6, 2015

Also please feel free to abort CI...

@yuyichao (Author) commented Jun 6, 2015

The only solution I can think of ATM is to have a temporary global type cache when instantiating the types and only update the real one if everything succeeds.

@carnaval (Contributor) commented Jun 6, 2015

Even if it makes things more complicated, this may be a case where we should do all the checks upfront and only instantiate when it is guaranteed not to fail. Although that doesn't help with some things, like OOM exceptions along the way.

Removing from the cache is probably the only fully safe option. We could store a flag on each type saying whether it is fully initialized; on error, you go through the cache and remove the broken folks.
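A rough sketch of that flag-and-sweep idea (hypothetical Julia-flavored code; the real cache lives in C in jltypes.c, and every name below is made up):

# hypothetical illustration, not Julia's internals
type CacheEntry
    value
    complete::Bool
end

const CACHE = Dict{Any,CacheEntry}()   # stand-in for the per-typename caches

function instantiate(key, construct::Function, finish::Function)
    haskey(CACHE, key) && return CACHE[key].value
    entry = CacheEntry(construct(), false)
    CACHE[key] = entry            # cached early so recursive lookups can find it
    try
        finish(entry.value)       # supertype / field-offset work that may throw
        entry.complete = true
    catch err
        # on error, go through the cache and remove the broken folks
        for k in collect(keys(CACHE))
            CACHE[k].complete || delete!(CACHE, k)
        end
        rethrow(err)
    end
    return entry.value
end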

@yuyichao (Author) commented Jun 6, 2015

I've thought about that too, but if I understand correctly the cache is bound to each typename. Is there a way to iterate over all typenames?

@yuyichao (Author) commented Jun 6, 2015

I'm surprised that the tests pass.... Maybe we should add some tests for things like self-referring types?

@yuyichao (Author) commented Jun 6, 2015

Interestingly, the following code doesn't cause an error on my branch.

julia> type B{T}
       b::B{T}
       B() = (b = new(); b.b = b; b)
       end

julia> B{Int}()
B{Int64}(B{Int64}(#= circular reference =#))

julia> abstract A{T}

julia> type C{T} <: A{C{T}}
       end

julia> C{Int}
C{Int64}

@carnaval Any idea why?....

@yuyichao (Author) commented Jun 7, 2015

@carnaval Actually, I think this is probably not a problem. There's a jl_typestack_t* that records the types being constructed, and from the comment here it seems that this is how recursive instantiation works. (This feels more correct than relying on the cache, since there's also cacheability etc...)

The Travis CI failure is just the usual OOM error; the 32-bit AppVeyor one is something different, and I can't tell whether it is related. @JeffBezanson I guess you should have the ultimate (if not the only) word here. (Only when you are not busy with other more important stuff, though.)

@JeffBezanson (Member):

👍 but I'd like to briefly look into the CI failure.

@yuyichao (Author) commented Jun 8, 2015

@JeffBezanson The failure changed after a rebase. Any hint where to look? E.g., how does the cache interact with the other parts, and why was the caching at that point to begin with? (It seems strange to cache something before it is done being constructed....)

@tkelman (Contributor) commented Jun 8, 2015

Same failure as in #10380 (comment), maybe tuple related?

@yuyichao (Author) commented Jun 8, 2015

The unpredictability of the failure makes me think it might be GC-related. I'm still going through the code, but I've found one place that looks strange.

Is this a missing root, or am I seeing things here?

Even if tuple types are always cached, I feel like that is too weak an assumption to make here.

cc @carnaval

@yuyichao (Author) commented Jun 8, 2015

@tkelman Is there a stripped-down list of those failures?...

@yuyichao (Author) commented Jun 8, 2015

From the blame (afd4eb0#diff-757551c8929efe195c9bab04f9925d05R624), it could certainly be #10380-related...

@tkelman (Contributor) commented Jun 8, 2015

It's more worth looking at the commits that I think fixed those issues, 7ec501d or b993658; you're probably interacting with one of those changes here, bringing back the same failures.

@yuyichao (Author) commented Jun 8, 2015

@tkelman You are right: the caching was indeed moved to the front in b993658.

@JeffBezanson (Member):

@yuyichao Yes, the assumption there was that all concrete tuple types will be cached. In fact they have to be, or there will be many other problems.
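A quick illustration of why they have to be: type identity, and hence dispatch and subtyping, assume that instantiating the same parameters always yields the identical (===) type object, which only holds if every concrete tuple type goes through the cache.

julia> Tuple{Int,Int} === Tuple{Int,Int}
true

julia> NTuple{3,Int} === Tuple{Int,Int,Int}   # NTuple expands to the same cached type
true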

@yuyichao (Author) commented Jun 8, 2015

In that case, IMHO this commit should only make a difference for functions that check the type cache, which AFAICT is only inst_datatype. Since one of the callers of inst_datatype that can be called recursively (inst_tuple_w_) doesn't check the stack, should we move the stack check into inst_datatype (after the cache check)?

P.S. What do these *_, *_w, *_w_ names mean?

@vtjnash (Member) commented Jun 8, 2015

ref #11320, which appears to also show the same issue (if you try to run the example twice)

JeffBezanson added a commit that referenced this pull request on Jun 9, 2015: "Do not cache the type before we finish constructing it."

@JeffBezanson merged commit 057f573 into JuliaLang:master on Jun 9, 2015.

@yuyichao deleted the delay-cache-type branch on June 9, 2015.
@yuyichao (Author) commented Jun 9, 2015

@JeffBezanson One of the failures seen in this PR is showing up on AppVeyor for a few PRs again.....

@yuyichao (Author) commented Jun 9, 2015

So if this is called recursively, it can miss the stack in jl_apply_type_ on line 1997 in inst_datatype?

@vtjnash (Member) commented Jun 9, 2015

I don't think checking the stack only is sufficient, since it may still be possible to cache a type that points to a partially constructed type (in this case, Vector{Z{I}}):

julia> type Z{I}
         a::Vector{Z{I}}
         b::NTuple{20000,I}
       end

julia> Z{Int}
ERROR: OverflowError()

julia> Vector{Z{Int}}
Array{Z{Int64},1}

@yuyichao (Author) commented Jun 9, 2015

So should I replace the stack with a temporary cache, make sure it is passed to all the functions, and only merge it into the global cache after everything is done?
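A minimal sketch of that alternative (hypothetical, not the actual C implementation): build against a scratch cache and commit it only on success, so a failure leaves the global cache untouched.

# hypothetical transactional variant of the type cache
const GLOBAL_CACHE = Dict{Any,Any}()

function instantiate_all(ctors)       # ctors: key => constructor taking a cache
    scratch = copy(GLOBAL_CACHE)      # temporary cache for this instantiation
    for (key, ctor) in ctors
        scratch[key] = ctor(scratch)  # recursive lookups hit the scratch copy
    end
    merge!(GLOBAL_CACHE, scratch)     # merge back only after everything succeeded
end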

@yuyichao (Author) commented Jun 9, 2015

Or is there a better way to do this?

@JeffBezanson (Member):

I wouldn't worry so much about this particular failure mode. The right approach is to fix #11597 and #11320. Then it will be very rare if not impossible for type instantiation to fail at this stage.

@yuyichao (Author) commented Jun 9, 2015

I guess someone could still write a very long struct? (Or a struct of long enough tuples?)

@JeffBezanson (Member):

I think we'll have to allow more offset bits; uint16 is just too small.

@yuyichao (Author) commented Jun 9, 2015

Re: "I think we'll have to allow more offset bits; uint16 is just too small."

(And size.) Sure, that works too.... Just wondering whether it was kept this small for some performance reason...
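For concreteness, the arithmetic behind the overflow (my reading of the uint16 field-offset limit, assuming a 64-bit machine):

julia> 19999 * sizeof(Int)      # byte offset of the last field of NTuple{20000,Int}
159992

julia> Int(typemax(UInt16))     # largest offset a uint16 descriptor can hold
65535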

@JeffBezanson (Member):

I just wanted to keep it as compact as possible. Before the new tuples it made sense.

@vtjnash (Member) commented Jun 9, 2015

Here's an updated example showing where it sits in the cache:

julia> abstract A{I}

julia> type Z{I}
         a::A{Z{I}}
         b::NTuple{20000,I}
       end

julia> Z{Int}
ERROR: OverflowError()

julia> A.name.cache[1].parameters[1]
Z{Int64}

@yuyichao (Author) commented Jun 9, 2015

Although even if you increase the size to 32 bits, it would still be possible to construct invalid types with a few recursions of something like this, without running out of memory:

julia> type A{T}
       a::T
       b::T
       end

@JeffBezanson (Member):

Actually, it might be preferable to allow making types that are "invalid", and defer errors to run time, e.g. when you try to construct an instance. For example, you want to be able to ask "is (1,2) a 20000-element tuple?" The question makes sense even if you can't actually construct a 20000-element tuple.

We should probably use size_t. Or use uint16 for small structs and size_t for bigger ones.
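For instance, under that deferred-error behavior, the membership question would have a perfectly good answer even though no instance can exist (a hypothetical session):

julia> isa((1, 2), NTuple{20000, Int})   # would simply be false, not an error
false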

@yuyichao (Author) commented Jun 9, 2015

So, e.g., it would be possible to construct an array of NTuple{20000, Int}, but you couldn't push anything to it?

In that case, should querying the properties of the type return an error? If you want isa((1, 2), NTuple{20000, Int}) to be valid, then I guess the answer is no. But then I guess zeros() of it would happily allocate whatever size it thinks is appropriate? Or should we store exactly which error happened when instantiating the type?

@JeffBezanson (Member):

Most of all, we should fix #11597 and #11320. After all, what is the essential reason for the error here? None whatsoever, unless you try to actually allocate an object too big to fit in memory.

@carnaval (Contributor) commented Jun 9, 2015

What if you get an OOM error while instantiating a type of reasonable size? Agreed, you're pretty much done at that point, but if we're not even going to try handling those, we might as well replace it with exit(1).

@vtjnash (Member) commented Jun 9, 2015

At some point, NTuple{typemax(Int) - 1, NTuple{typemax(Int) - 2, NTuple{...}}} is going to describe a massive data structure. I think we can fix that by conditionally switching from the current fixed-size offsets to a compressed computational format (trading off memory for flops).
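A tiny sketch of what the computational form could look like (a hypothetical helper, not the actual implementation): for a homogeneous NTuple of isbits elements, the offsets are a closed-form function of the index, so no table is needed.

julia> ntuple_offset(i, T) = (i - 1) * sizeof(T)   # offsets computed, not stored
ntuple_offset (generic function with 1 method)

julia> ntuple_offset(20000, Int)                   # last field, no 20000-entry table
159992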

@JeffBezanson (Member):

Yes, exactly.

@JeffBezanson (Member):

After fixing the offsets issue, the exit(1) approach does indeed start to make sense. However, there's no need to be so unfriendly; we can print a big warning explaining exactly what happened and noting that the system might be in an unstable state. You want to give people a chance to save their work.

@ScottPJones (Contributor):

Please don't take control out of the application developer's hands. Also, printing a big message (in fact, any message) is not a good idea if you are not at the REPL, or if there is no handler for the exception you should throw.

If you run out of memory, you should free up any caches the system might be keeping and run a full GC; if the allocation still cannot succeed, you should log that it occurred and throw an OutOfMemory exception, so that the application can figure out how to proceed: freeing up any data structures it no longer needs, committing or rolling back transactions, etc.
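A sketch of the application-side pattern being described (hypothetical free_caches! callback; gc() is how a full collection is requested):

# hypothetical application-level handling of an OutOfMemoryError
function allocate_with_retry(alloc::Function, free_caches!::Function)
    try
        return alloc()
    catch err
        isa(err, OutOfMemoryError) || rethrow(err)
        free_caches!()   # drop anything the application can recompute
        gc()             # run a full collection
        return alloc()   # retry once; a second failure propagates to the caller
    end
end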

JeffBezanson added a commit that referenced this pull request on Jun 16, 2015: "reinstates and hopefully fixes the problem in #11606"

fcard pushed a commit to fcard/julia that referenced this pull request on Jul 8, 2015: "reinstates and hopefully fixes the problem in JuliaLang#11606"

@yuyichao deleted the delay-cache-type branch on July 25, 2015.