
Stack allocation of tuples/immutables containing heap references #18084

Closed
wbhart opened this issue Aug 17, 2016 · 4 comments

Comments

@wbhart

wbhart commented Aug 17, 2016

If I have a function prototype something like this:

doit(a::Array{Tuple{Int, MyType}, 1}, b::Int, c::MyType)

where MyType is an ordinary Julia GC/heap allocated type, then the following line of code within the function seems to lead to a heap allocation of a tuple (the same is the case for any immutable type).

a[1] = b, c
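
For concreteness, here is a minimal sketch of the pattern (MyType stands in for any heap-allocated mutable type; @allocated is only used to observe the allocation, not as a rigorous benchmark):

mutable struct MyType
    x::Int
end

function doit(a::Array{Tuple{Int, MyType}, 1}, b::Int, c::MyType)
    a[1] = b, c                   # this store is where the tuple seems to get heap-allocated
    return nothing
end

function measure()
    c = MyType(0)
    a = [(0, c)]
    doit(a, 1, c)                 # warm-up call so compilation is not measured
    return @allocated doit(a, 2, c)
end

println(measure())                # expected to print a small non-zero number (the boxed tuple)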

There seem to be two issues here:

  1. There seems to be no efficient way to write an immutable value directly into an array in Julia. The above shouldn't require any allocation of any kind, stack or heap.

  2. Apparently, Julia GC/heap-allocates immutables if they contain heap references. I don't understand why, since surely the GC scans the stack conservatively for pointers.

Anyway, I have been unable to find a ticket or PR that deals directly with this serious performance issue (which we hit in practice).

(Obliquely) relevant tickets/PRs seem to include [1], [2], [3] and [4], and of course the tupocalypse itself. There are also some relevant-sounding things on Jeff's compiler optimisation tracker [5], though I am not clear whether any are directly relevant.

[1] #8134
[2] #12205
[3] https://groups.google.com/forum/#!topic/julia-users/F_ncyfP2vxg
[4] https://groups.google.com/forum/#!topic/julia-users/LthfABeDN50
[5] #3440

@vtjnash
Member

vtjnash commented Aug 17, 2016

I don't understand why, since surely the GC scans the stack conservatively for pointers

Because it's a precise scanner.

Closing because this is a feature request, and a duplicate of #12205

@vtjnash vtjnash closed this as completed Aug 17, 2016
@wbhart
Author

wbhart commented Aug 17, 2016

Having a precise GC doesn't preclude scanning the stack and registers conservatively, e.g. the Mono GC (though they call it "mostly precise").

I can understand closing the ticket given that Julia just doesn't have this, and presumably #12205 will make it feasible to deal with point 2.

However, I still have two questions:

  1. How is point 1 above dealt with? Is it just an automatic outcome of fixing 2?

  2. Where does one put feature requests? Has the policy on putting these on the issue tracker changed? (I would normally think of this as a performance issue, rather than a feature request.)

Or maybe my point 1 wasn't sufficiently well explained. I am not looking for someone to write me some library code to do this. I was simply trying to open a ticket for the performance issue with the existing syntax. That's surely a Julia compiler issue, not a library feature request.

@JeffBezanson
Member

JeffBezanson commented Aug 17, 2016

This is a perfectly legitimate issue, just a duplicate of #16190, #10899, #14955, and #11714, plus the PRs already cited.

The case of a[1] = b, c isn't necessarily related to conservative stack scanning, since it's storing references into an array that's already on the heap. Arrays only support two memory layouts currently: a simple array of references, or an array of flattened immediate data. We can't yet handle a mixture of both in a single array.
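
A rough illustration of the distinction (a sketch, not the actual array implementation; isbitstype is spelled isbits on the releases that were current when this issue was filed):

struct Flat                # plain immediate data: immutable, no heap references
    a::Int
    b::Float64
end

mutable struct Boxed       # heap-allocated; other objects refer to it by pointer
    x::Int
end

@show isbitstype(Tuple{Int, Flat})    # true:  an Array of these uses the flattened inline layout
@show isbitstype(Tuple{Int, Boxed})   # false: an Array of these stores references to boxed tuples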

What does the performance impact look like in your application? Is it GC time or run time? Do you have a sense of what the cost of the allocation is?
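
For reference, one quick way to get those numbers, reusing the definitions from the sketch in the first comment (GC.gc() is spelled gc() on the releases of that era):

c = MyType(0); a = [(0, c)]
GC.gc()                    # collect first so the measurement starts from a quiet heap
@time doit(a, 1, c)        # prints elapsed time, allocations, and (when a collection runs) the % of GC time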

@wbhart
Author

wbhart commented Aug 17, 2016

It's hard to get a precise sense of the performance impact. It arose when implementing a trick from a paper called "heap chaining". It's supposed to give us a 5-10x speedup, but it made no difference whatsoever when I implemented it (carefully).

In our application, even the medium-sized benchmark went from a few hundred megabytes of total allocations to tens of gigabytes. I've implemented this trick in C before, so I have a very good feel for what should happen.

The best I can offer Julia-wise is that performance jumped by a factor of about 1.5 overall when I implemented it a different way. Unfortunately, we are still 50% slower than someone else's generic C implementation, and a factor of about 15 slower than the C implementation by the authors of the paper.

I don't think GC time is the issue, though I don't have a complete answer on that. I think we went from about 15% GC time to 0.1% when I implemented it a different way, which would suggest GC time wasn't the major component. However, I should add that subsequent computations and shutting Julia down take ages with the extra allocations in there.
