Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine the size of field descriptors using a estimated type size #11888

Closed
wants to merge 1 commit into from

Conversation

yuyichao
Copy link
Contributor

This fixes #11884 .

  1. Passing too many arguments to a function or constructing a structure/tuple of GB's size is arguably not a good idea. The primary goal of this PR is not to encourage that but I do feel like the current limit 16bit integer is a little bit too small.
  2. This does not solve the problem in issue long tuple problems #11320 (hopefully github's close issue hook does not pick this one up) so passing big tuples around / constructing them might still be an issue. On the other hand, this change should be almost orthogonal to that issue and won't make the situation much worse.
  3. The current limit is 2^31 - 1 and this is mainly because DataType.size is a Int32. According to @jiahao , this was because of not letting unsigned integer types polluting calculations. It might not be a problem now anymore but I think this limit should be good enough for now and I don't want to risk introducing subtle type stability issues.

@carnaval
Copy link
Contributor

I don't remember if you started doing it this way yesterday but it would make more sense IMO to switch between this 32 bit repr and an 8 bit repr (which will be "most" cases anyway, so we might at least take advantage of the change to scrap a little space here) based on a bit flag in the object header.

@yuyichao
Copy link
Contributor Author

@carnaval I had a look yesterday but I haven't started to fully implement it yet.

The issue is that we don't necessarily know the size of a field when we instantiate the type and I would have to give up and be conservative in such case. (Unless the GC support moving and resizing :P ). This can probably already save us a lot of memory and I just need to make sure the correct one is picked everywhere.

@carnaval
Copy link
Contributor

Yep, I remember now. So the "only" fully general solution is two passes ? I kinda like it but I'm not sure why. I'll probably reserve any opinion until tomorrow :)

@yuyichao
Copy link
Contributor Author

Yes, two pass may do it although you still need to be careful what to do with self referencing (direct or indirect) types and the type cache as well as where to store the information in the first pass.

IMHO, using two passes to solve this issue is a little bit too complicated and requires more work than if we only use it to resolve the issue of having invalid types in the type cache (which should arguably be fixed...). After all, this is invisible to user and is only for saving memory...

@carnaval
Copy link
Contributor

Self referencing is fine for sizing because it's always through a pointer. I agree that it's a bit over complex, but 4x (well, 2x for now) waste to cover edge cases bothers me.

@yuyichao
Copy link
Contributor Author

Yes, direct self referencing is fine since it can be special cased. However, I don't think indirect self referencing can be identified without actually trying to instantiate the type.

E.g. (I guess this is your "corner case")

julia> immutable A{T}
       c::T
       end

julia> immutable C{T}
       a::A{C{T}}
       end

julia> xdump(A{C{Int}})
A{C{Int64}}::DataType  <: Any
  c::C{Int64}::DataType  <: Any
    a::A{C{Int64}}::DataType  <: Any
      c::C{Int64}::DataType  <: Any
        a::A{C{Int64}}::DataType  <: Any
          c::C{Int64}::DataType  <: Any

@JeffBezanson
Copy link
Member

I rebased this. It would be really nice to get some version of this working before 0.4-final. Even a conservative approximation of when to use 16 or 32 bits would be better than the current situation. It's just too easy to write a big array literal and get an overflow.

@yuyichao
Copy link
Contributor Author

OK, I'll try to get a conservative version soon.

@JeffBezanson
Copy link
Member

Oops, I missed one merge conflict. Should be fixed now.

@yuyichao
Copy link
Contributor Author

yuyichao commented Sep 1, 2015

I added a conservative estimation of the type size. It was a little hard to find a balance between doing too much work and being too conservative. The current version should handle most of the cases except parametrized immutable types.

The current version passes compilation locally although there might still be issues. The overflow check also needs to be updated.

Let's see what the CI think and I'll also go through this again.

@yuyichao
Copy link
Contributor Author

yuyichao commented Sep 1, 2015

Surprised that it actually didn't segfault during the test...

@yuyichao yuyichao force-pushed the yyc/type_size branch 5 times, most recently from 536cd61 to 78d377b Compare September 8, 2015 13:57
@yuyichao
Copy link
Contributor Author

yuyichao commented Sep 8, 2015

@JeffBezanson Could you please have a look at the implementation of the size estimation?

I think there might be some cases where I'm too conservative but I believe (although I might be wrong). This is also not looking in the type cache/stack so it might be doing more passes than what is necessary.

@JeffBezanson JeffBezanson added this to the 0.4.1 milestone Sep 9, 2015
@tkelman
Copy link
Contributor

tkelman commented Sep 9, 2015

@JeffBezanson did you switch the 0.4.1 and 0.4.x milestones? I'd really rather have 0.4.x be nonspecific things and leave 0.4.1 for actually tracking what makes it into the first backport.

@JeffBezanson
Copy link
Member

Yes let's switch them. This particular issue is a good candidate for 0.4.1
though.
On Sep 9, 2015 1:58 PM, "Tony Kelman" notifications@github.com wrote:

@JeffBezanson https://github.com/jeffbezanson did you switch the 0.4.1
and 0.4.x milestones? I'd really rather have 0.4.x be nonspecific things
and leave 0.4.1 for actually tracking what makes it into the first backport.


Reply to this email directly or view it on GitHub
#11888 (comment).

@yuyichao
Copy link
Contributor Author

yuyichao commented Sep 9, 2015

This breaks the api/abi for jl_datatype_t so it would be nice to at least get the first half of this in 0.4.0 if we want this in any of 0.4.x.

@tkelman
Copy link
Contributor

tkelman commented Sep 9, 2015

Breaking embedding clients between 0.4.0 and 0.4.1 wouldn't be very nice of us. Let's not do that.

@yuyichao
Copy link
Contributor Author

yuyichao commented Sep 9, 2015

I reordered the commit so that the relatively safe part (and also the breaking part) should be all on the yyc/type_size-0.4.0 branch. This PR only have the size estimation commit on top of that.

tomasaschan added a commit to JuliaGeometry/Contour.jl that referenced this pull request Sep 15, 2015
tomasaschan added a commit to JuliaGeometry/Contour.jl that referenced this pull request Sep 15, 2015
@yuyichao yuyichao force-pushed the yyc/type_size branch 3 times, most recently from 767390f to b5c9999 Compare September 16, 2015 11:27
@yuyichao yuyichao changed the title Make size and offset in jl_fielddesc_t 32bit length Determine the size of field descriptors using a estimated type size Sep 17, 2015
@tkelman
Copy link
Contributor

tkelman commented May 12, 2016

@yuyichao is this still relevant? was tagged 0.4.x...

@yuyichao
Copy link
Contributor Author

This is still relevant. Maybe not particularly performance critical and needs a rewrite.

@tkelman tkelman removed this from the 0.4.x milestone May 12, 2016
@vtjnash
Copy link
Member

vtjnash commented Jul 4, 2016

superseded by #17231

@vtjnash vtjnash closed this Jul 4, 2016
@yuyichao yuyichao deleted the yyc/type_size branch July 4, 2016 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OverflowError when splatting array of arrays
5 participants