conservative stack scanning? #11714

StefanKarpinski · 2015-06-15T17:32:13Z

Instead of maintaining our own GC stack frames – which is slow and easy to mess up, likely accounting for a large portion of our tough-to-track-down core Julia bugs in the past years – we've occasionally talked about doing conservative stack scanning. That is, scanning the stack and conservatively assuming that anything that happens to look like a pointer to a valid heap-allocated Julia object actually is.

Pros:

much simpler
faster (RFC: Create local gc frame for throw #11508, Throw exceptions with arguments without creating a GC frame #11284)
makes stack allocation much easier (WIP: on-stack gc allocation #8134)
immutables with references trivially become as efficient as reference-free immutables
fewer segfaults

Cons:

might retain objects that should be freed
pointers passed into C code could still expect objects to live but not point to the head of the object
performance of stack scanning

The second con seems like it can be avoided with the same kind of rooting we do now, except only needed at the ccall entry point. The chances of the first con happening – i.e. random values in the stack pointing valid heap objects – seems pretty slim. There might be situations like when you take a pointer to an array and then use it – but in that case you probably want the array to remain rooted while someone has a pointer to it. Are there other potential cons that I'm not seeing?

Related: #10725, #10317.

cc: @carnaval, @vtjnash, @JeffBezanson

The text was updated successfully, but these errors were encountered:

yuyichao · 2015-06-15T17:38:09Z

The only thing I know about conservative vs precise stack scan is based on this and I quote.

...... Language implementations with automatic memory management often start out using exact rooting. At some point, all the careful rooting code gets to be so painful to maintain that a conservative scanner is introduced. The embedding API gets dramatically simpler, everyone cheers and breathes a sigh of relief, and work moves on (at a quicker pace!)

Sadly, conservative collection has a fatal flaw: GC things cannot move. If something moved, you would need to update all pointers to that thing — but the conservative scanner can only tell you what might be a pointer, not what definitely is a pointer. Your program might be using an integer whose value happens to look like a pointer to a GC thing that is being moved, and updating that “pointer” will corrupt the value. Or you might have a string whose ASCII or UTF8 encoded values happen to look like a pointer, and updating that pointer will rewrite the string into a devastating personal insult.

I know that we are propably even further from compacting but I'd like to know what the experts on these think...

StefanKarpinski · 2015-06-15T17:49:34Z

Our large amount of C interop makes it fairly unlikely that we'll ever be able to do moving GC.

JeffBezanson · 2015-06-15T17:57:12Z

It's not clear that we need to move objects. If we did, it would already require a lot of work, since the current rooting API assumes non-moving. That article seems to assume that moving is required for generational GC, but that isn't strictly true. You only need it for compacting or for a copying implementation of generational gc.

For us it's possible the conservative stack scan would be a bigger win than moving objects, especially given how important C interop is.

carnaval · 2015-06-15T17:58:37Z

To the pros I would add that we would be able to do many optimizations we do for bitstype on immutable types with no additional work. That's huge.

On the moving debate : we will never be able to do a fully relocating collector, because of C interop as Stefan said. However, it would be perfectly ok to do a so called "mostly copying" scheme, and we would need conservative stack scanning (to pin "C" objects) anyway.

The thing I don't agree on is that the chance of something on the stack looking falsely like a pointer is slim. The problem is that the stack is never cleaned up, so as it grows back you can end up overwriting parts (because of padding, alignment) of what was a pointer before. If you overwrite the LSB part you are on the contrary almost guaranteed to point inside a (different) julia object.

On the implementation side, the part taking care of pooled object is already "done" in my PR (which is where I encountered the partial pointer stuff btw). It's the easy one though because they are fixed size. To do this for "big" objects we need a datastructure to query if a pointer falls into a range of memory.

JeffBezanson · 2015-06-15T18:01:04Z

We can only properly evaluate false retention once we have a full implementation, and measure memory use in real workloads. Until then I'm not worried.

carnaval · 2015-06-15T18:02:32Z

As we already discussed, I'm not worried too much about the memory usage of false retention, but about the angry "my finalizer was not run after gc()" crowd. Anyway, you're right, we would need to implement big object conservative scanning first to see it in real life.

JeffBezanson · 2015-06-15T18:05:39Z

Well, you know how I feel about that. The only real solution is to use finalizers less and rely on them less.

ScottPJones · 2015-06-15T18:08:19Z

@JeffBezanson If you don't have reliable finalization, how do you handle externally managed memory reliably? Since C interop is so important, you need some way to be able to call C release code before something is GCed. Or am I missing something?

ScottPJones · 2015-06-15T18:10:31Z

@carnaval I agree that there'd be a high probability of seeing false positives of things looking like valid pointers still on the stack.

carnaval · 2015-06-15T18:12:33Z

"high probability" of seeing at least one when the programs run, sure. Now the thing which is hard to evaluate is : how long would false positive remain on the stack (after all if it is freed at the next call who cares), and with what density.

ScottPJones · 2015-06-15T18:24:06Z

Depends a lot on the application... if you have processes whose lifetime is measured in months or years, with the amount of stack fluctuating a lot, I think it might be more of an issue...

yuyichao · 2015-06-15T18:35:46Z

How reliable would it be to insert some mark on the stack to distinguish managed code and unmanaged code?

@carnaval Is the "almost moving" scheme a compacting collector that pins everything on the stack?

carnaval · 2015-06-15T18:44:40Z

You can mix conservative and precise stack frames, but the point of this is to get rid of the gcframe entirely, both in the runtime and the generated code. We can also have conservative scanning of C frames and precise scanning of julia frames using the llvm statepoint thing (more work).

The mostly copying stuff is a general technique, you just have a part of the roots you are not sure about. It can be the whole stack or only a part of it, or even some heap objects you wouldn't know the layout of (although we don't need this).

The canonical mostly copying (I know of) is http://www.cs.cornell.edu/home/fms/mcc/bartlett.html if you want to use copying as a marking segregation ("replacing" the marked bit / mark queue). Copying has other uses : compaction, generation segregation, ...

yuyichao · 2015-06-15T18:51:50Z

@carnaval Thanks for the explaination (and the pointer). I didn't know that the rooting can be fine grain classified like this before.

My feeling before was that compacting and the growing ability in LLVM to identify GC roots are the two things that are incompatible with conservative stack scan. Good to know that both of them are still possible. (Although they are still as hard to implement as they currently are now...)

StefanKarpinski · 2015-06-15T19:11:50Z

It's important that if we choose to do conservative stack scanning now we get all of the benefits immediately and can do more work to get back partial ability to do moving GC – which we're not even doing right now.

yuyichao · 2015-06-15T19:17:42Z

I totally agree and I find it very promising the first time @carnaval mentioned it to me.

It's just that when I want to find more detailed study about it, the internet is flooded with moving collector and there's not even many places mentioning non-moving generational collector.....

StefanKarpinski · 2015-06-15T19:26:59Z

The thing I don't agree on is that the chance of something on the stack looking falsely like a pointer is slim. The problem is that the stack is never cleaned up, so as it grows back you can end up overwriting parts (because of padding, alignment) of what was a pointer before. If you overwrite the LSB part you are on the contrary almost guaranteed to point inside a (different) julia object.

I'm trying to get a better understanding of this concern. Is the issue that you might have a pointer stored in a register or stack frame and then reuse the low byte or something like that and end up with the same register holding something that still looks like a pointer but points at something else entirely?

I should mention that at some point I had talked with @carnaval about wanting the ability to have interior pointers into an object pin the whole object (particularly useful with arrays). That would obviously make it much more likely that some accidentally pointer-like value would pin an object since you don't have to point exactly at the head of the object – pointing anywhere inside of it pins it. But I think the benefits of conservative GC outweigh that, so I'd be willing to forgo interior pointers for conservative stack scanning.

yuyichao · 2015-06-15T19:31:52Z

In that case, are we sure that the compiler optimization will not optimize a store of the original pointer out if we do not explicitly root it?

StefanKarpinski · 2015-06-15T19:40:52Z

Right, that's another concern – can we force LLVM to store pointers in a recognizable form on the stack?

yuyichao · 2015-06-15T19:44:55Z

But then this doesn't feel so much different from the current approach. I would expect a naive implementation to have the similar performance hit with the current rooting. It is probably a different story with LLVM gc support but I guess that's not even on the table at this point....

And we would also need to do the same with our c/c++ code.

JeffBezanson · 2015-06-15T19:46:11Z

You have to conservatively scan registers too.

simonster · 2015-06-15T19:55:28Z

jsc (WebKit's JS engine) actually uses a Bartlett-style collector with LLVM, so it can be done. Regarding false retention, their blog post says (under "Integrating LLVM with garbage collection"):

Retention of some dead objects is of course possible, and we can tolerate it so long as it corresponds to only a small amount of space overhead. In practice, we’ve found that so long as we use additional stack sanitization tricks – such as ensuring that our compilers use compacted stack frames and occasionally zeroing regions of the stack that are known to be unused – the amount of dead retention is negligible and not worth worrying about.

But Julia workloads are probably a bit different from JS workloads.

rbehrends · 2017-05-17T13:15:38Z

Of course, I suspect you need solutions that work now, not solutions that we can provide in a future release, so it's understandable if that's not quite sufficient in the short term. The GAP project is really exciting and I personally happy to do what we can to make sure it's a successful application of Julia.

Thank you, though obviously I also understand that your priority has to be the long-term health of Julia.

I don't actually need a solution right now (we're on what is essentially a 12-year project) and GAP is just one part of the whole thing where interoperability is a concern (which is why I'd like to focus on more general issues of interoperability), but we're at a stage where we're developing a roadmap so we need to figure out our options and their respective viability.

It feels like the best we can do it provide hooks into the marking phase of the GC and allow external integrations to do whatever marking they can by telling our GC where all the julia objects it knows about are. How it does that doesn't really have to be our business.

This is something that I had actually thought about. A minimal set of hooks that permitted this would be the following (I think):

An ability for foreign code to supply its own roots (which would be nice to have even with precise stack scanning).
Alternative ways to declare safepoints (this has little to do with conservative stack scanning as such and more with the needs of foreign code, which may be unable to inject safepoints at arbitrary locations). If there's already such a way that I've overlooked, feel free to correct me.
The aforementioned marking of foreign objects.
Registering the allocation and deallocation of bigval_t instances.
Hooking into the marking of tasks.

The first three would be generally good ideas to have for better C interop, I think – and I'm planning on writing up a separate feature request for those –, but the last two would expose internals of Julia and would at a minimum have to come with a big "can be deprecated at any time" warning label (any such external library would have to depend on some implementation details of the Julia GC, anyway).

yuyichao · 2017-05-17T13:21:33Z

Alternative ways to declare safepoints

What more alternative do you need? There's jl_gc_safepoint and I don't see why we need a different kind. (Or I guess if you mean alternative relative to allocation then jl_gc_safepoint is the one)

A minimal set of hooks

You also need to patch your code to issue write barrier.

rbehrends · 2017-05-17T13:50:54Z

What more alternative do you need? There's jl_gc_safepoint and I don't see why we need a different kind.

The current safepoint mechanism (again, correct me if I'm wrong) only allows for one point in time to be marked as a safepoint, not an interval. Like JIT-compiled code in the JVM, it will do a forced load from a specific page that triggers a segfault and the signal handler will then proceed to report to the GC thread that it is no longer working on any GCed memory. However, this does not account for cases such as long-running computations in third-party libraries or I/O where you cannot inject regular calls to jc_gc_safepoint(). For this, you want a mechanism where you say "we're not doing anything that interferes with GC between point A and point B".

You also need to patch your code to issue write barrier.

Obviously. I was talking about what was needed on the Julia side of things, and jl_gc_wb() and jl_gc_wb_back() are already in julia.h.

yuyichao · 2017-05-17T13:53:45Z

For this, you want a mechanism where you say "we're not doing anything that interferes with GC between point A and point B".

That's gc state transition and we do have that with jl_gc_safe_enter jl_gc_safe_leave

and jl_gc_wb() and jl_gc_wb_back() are already in julia.h.

What I mean is that you cannot do this without patching the C code. Not even with conservative stack scanning.

rbehrends · 2017-05-17T13:57:32Z

That's gc state transition and we do have that with jl_gc_safe_enter jl_gc_safe_leave

Ah, thanks for the correction.

What I mean is that you cannot do this without patching the C code. Not even with conservative stack scanning.

Yes, I am well aware of that.

StefanKarpinski · 2017-05-17T14:00:21Z

That's exactly what my mentioning of write barrier is based on. It has been tried for gmp and it doesn't work because you need to get GMP to trigger the write barrier.

It's unclear who you're replying to here.

yuyichao · 2017-05-17T14:02:36Z

It's unclear who you're replying to here.

The comment right above it. In particular, the first sentence which is what the comment is suggesting AFAICT

Another option for C interop wrt to memory allocation is using something like LD_PRELOAD to replace libc's malloc and free with Julia-aware versions.

yuyichao · 2017-05-17T14:06:58Z

Yes, I am well aware of that.

OK, so this why I don't think that's a good reason for doing conservative stack scan. FWIW, there's usually many more heap mutations than local variables (obviously there's more stack mutations but the JL_GC_PUSH<n> macros only needs to be called on variables creation, and not on each mutation) so adding the wb is the hard part. It's also much easier to find missing roots using either static analysis or stress testing. We've done both with good results. WB errors are much harder to catch.

Other requests all more or less make sense,

An ability for foreign code to supply its own roots (which would be nice to have even with precise stack scanning).

This is what a simulated ref counting interface is used for. A way to add callbacks to supply root is fine too but I think a ref counting API is easier to use.

Alternative ways to declare safepoints (this has little to do with conservative stack scanning as such and more with the needs of foreign code, which may be unable to inject safepoints at arbitrary locations). If there's already such a way that I've overlooked, feel free to correct me.

So we already do. Incidentally, this is also why I don't want to have an API for the next one right now, since this is the part that still needs a lot of tweaking.

The aforementioned marking of foreign objects.

This actually hooks much deeper into the GC than any other ones. It's in principle doable but certainly not via a stable API anytime soon.

Registering the allocation and deallocation of bigval_t instances.

Actually not sure what this one means.

Hooking into the marking of tasks.

I assume this is either (1) or conservative scanning. So this would be no or covered....

wbhart · 2017-05-17T14:11:39Z

For what it is worth, GAP already has issues with GMP (a dependency). Their solution at present is to reallocate GMP integers before each call to the mpz interface so that no reallocation occurs within calls to GMP. It's probably technically feasible to do a similar thing with MPFR, but it is not a general solution for all C libraries, obviously.

rbehrends · 2017-05-17T14:44:31Z

OK, so this why I don't think that's a good reason for doing conservative stack scan. FWIW, there's usually many more heap mutations than local variables (obviously there's more stack mutations but the JL_GC_PUSH<n> macros only needs to be called on variables creation, and not on each mutation) so adding the wb is the hard part.

I have to disagree here, based on my own experience with working with code that uses manual insertion of write barriers. Importantly, the proper use of write barriers is amenable to bespoke static verification even in C code (bespoke, because you may have adjust your tools for the specifics of how a particular piece of C code represents data), which incurs no runtime penalty other than the write barrier itself. This is different from explicitly exposing local variables.

This is what a simulated ref counting interface is used for. A way to add callbacks to supply root is fine too but I think a ref counting API is easier to use.

Both approaches have their pros and cons, obviously, but a callback has the advantage that the roots can be computed lazily, i.e. if they get updated more frequently than collections occur. In any event, I think this may be a discussion for a different place, once I get to write up a feature request for it.

yuyichao · 2017-05-17T16:50:21Z

I have to disagree here, based on my own experience with working with code that uses manual insertion of write barriers. Importantly, the proper use of write barriers is amenable to bespoke static verification even in C code (bespoke, because you may have adjust your tools for the specifics of how a particular piece of C code represents data), which incurs no runtime penalty other than the write barrier itself. This is different from explicitly exposing local variables.

Runtime penalty is not what I'm talking about. I'm only talking about

(b) not having to rewrite a large existing codebase; in particular, (b) often makes conservative stack scanning the only realistic option for retrofitting GCed allocation to such a codebase.

which is the only correctness issue that doesn't already have a clear solution that doesn't require conservative scanning and I'm just saying that conservative scanning can't do that either so that's not a good reason to do it.

For performance, I'll repeat that we should not do conservative scanning unless we've tried all other solutions.

rbehrends · 2017-05-17T18:02:29Z

Runtime penalty is not what I'm talking about. I'm only talking about

(b) not having to rewrite a large existing codebase; in particular, (b) often makes conservative stack scanning the only realistic option for retrofitting GCed allocation to such a codebase.

which is the only correctness issue that doesn't already have a clear solution that doesn't require conservative scanning and I'm just saying that conservative scanning can't do that either so that's not a good reason to do it.

Very well. There are three distinct issues here:

Performance of foreign code that interacts with the GC.
Verification of GC behavior for foreign code.
Ease of adaptation of foreign code for the GC.

As for performance, conservative stack scanning avoids separate GC frames entirely.

Regarding verification, the correctness of write barriers is largely a local property (and can in the absence of tools be also ascertained manually through code review). In order to ensure that a sufficient subset of local variables is put on the GC stack along all possible program paths, you need global analysis, probably combined with interprocedural dataflow analysis. This would have to work with multiple source files and function pointers.

I believe that ease of adaptation is also better (by which I don't mean it's trivial). Recall that we only need a write barrier when we're writing a pointer to a heap object. Assuming we can supply GC roots, we do not need a write barrier for assigning to a global or thread-local variable; this already reduces the amount of code that we have to deal with. Also, the write barrier only has to be performed before the next safepoint, which gives us a fair amount of leeway.

For local pointer variables we have to do more; recall that I'm not just talking about calling C code from Julia, where Julia references only occur as arguments at defined entry points of a module, but C -> Julia -> C -> Julia call chains and where foreign objects or global variables can contain Julia references. This means I cannot just guard the entry points to a C module, but – assuming that I don't want to use JL_GC_PUSH*() for every single local pointer variable – will have to possibly worry about every pointer read from a global variable, thread-local variable, or from the heap (this is necessary because even if you trace them as roots, the contents of those locations can change before the next GC, at which point the only live reference is on the stack). This, in essence, has the complexity of a read barrier, on top of which we still have to deal with the actual write barrier.

For performance, I'll repeat that we should not do conservative scanning unless we've tried all other solutions.

I don't disagree insofar as Julia code itself is concerned. But I am talking about the performance of foreign code here, which is a separate issue.

yuyichao · 2017-05-17T18:22:52Z

Regarding verification, the correctness of write barriers is largely a local property (and can in the absence of tools be also ascertained manually through code review). In order to ensure that a sufficient subset of local variables is put on the GC stack along all possible program paths, you need global analysis, probably combined with interprocedural dataflow analysis. This would have to work with multiple source files and function pointers.

Local variable rooting is also pretty local since they have defined behavior at function call site. write barrier is also not as local since there are cases we remove write barrier due to known old/young objects.

Also, the write barrier only has to be performed before the next safepoint, which gives us a fair amount of leeway.

This is not going to be the case.

assuming that I don't want to use JL_GC_PUSH*() for every single local pointer variable

From local information, you can do only a subset of it.

StefanKarpinski · 2017-05-17T18:45:04Z

When it comes to GMP and MPFR, we've considered having these types implemented as native Julia objects instead and only calling out to the libraries for complex operations. And of course, there's a ton of performance to be gained by immediate representation of smaller values – which presumably is what you're using the C-union between pointer and immediate values with tagging for. We should really work towards having efficient Julia-native support for this and only call out to GMP/MPFR when necessary. That would eliminate a lot of these problems in a much cleaner and more efficient way.

rbehrends · 2017-05-18T10:15:53Z

Local variable rooting is also pretty local since they have defined behavior at function call site.

Unfortunately, that is often insufficient. Without looking at callers, any argument to a function can be the sole live reference to an object, which means that if you're using purely local analysis, at a minimum all pointer arguments need to end up in GC frames. A simple example is:

f(pop(&global_stack));

Generally, we can eliminate JL_GC_PUSH*() if we can prove that another variable keeps the object alive or if a garbage collection will not occur within the scope of the local variable. In the general case, purely local analysis is insufficient for that.

write barrier is also not as local since there are cases we remove write barrier due to known old/young objects.

Write barriers are also relatively rare, as they are only need for pointer writes to the heap (for example, GAP, which has its own generational GC with manually inserted write barriers, has about 700 write barriers in some 160 KLOC worth of C code). It is also less of a constraint for the optimizer, as a write barrier can be just another read+write to the same part of the heap that was already updated (often the same cache line), whereas GC frames interfere with register allocation and live variable analysis. And if there's a performance problem along a hot path with write barriers, that can still be manually tuned.

I also note again that precise stack scanning doesn't help with write barriers, which is an orthogonal issue that arises on top of finding GC roots stored in local variables. Even if we adopted precise stack scanning for foreign code, the insertion of write barriers would still be necessary.

yuyichao · 2017-05-18T14:39:43Z

f(pop(&global_stack));

This is invalid.

Even if we adopted precise stack scanning for foreign code, the insertion of write barriers would still be necessary.

Right. And even if you implement conservative scanning, the insertion of write barriers would still be necessary too so

not having to rewrite a large existing codebase

Isn't true.

rbehrends · 2017-05-18T15:05:06Z

Hmm. I believe that we may be simply having some miscommunication here.

not having to rewrite a large existing codebase

Isn't true.

By rewrite I literally meant rewrite, as in wholesale alterations; I don't mean that no alterations are necessary, just that they are kept at a manageable level. Not needing any changes at all would be unrealistic: even dropping in a fully conservative GC (such as the Boehm GC) cannot generally be done entirely without changes; you may have to worry about zeroing a free list to prevent leaks, for example.

The rationale behind this is the software engineering cost involved, which need not be zero, just reasonable. Of particular interest is the case where one controls the upstream version, too, and Julia integration is essentially a configuration flag.

I can see how this could be read differently, but that's not what I was getting at.

yuyichao · 2017-05-18T15:13:35Z

By rewrite I literally meant rewrite, as in wholesale alterations

OK.

Still. I think #11714 (comment) is a better way forward since it is way too early to expose any part of the julia GC to a public API.

StefanKarpinski · 2017-05-18T20:19:38Z

@rbehrends: my understanding from what you said previously was that for at least some of your dependencies, literally no changes could be made – down to the shared binary library level. If that's not the case, that certainly opens up many more options.

DemiMarie · 2017-05-18T21:09:30Z

On Wed, May 17, 2017 at 11:45:19AM -0700, Stefan Karpinski wrote: When it comes to GMP and MPFR, we've considered having these types implemented as native Julia objects instead and only calling out to the libraries for complex operations. And of course, there's a ton of performance to be gained by immediate representation of smaller values – which presumably is what you're using the C-union between pointer and immediate values with tagging for. We should really work towards having efficient Julia-native support for this and only call out to GMP/MPFR when necessary. That would eliminate a lot of these problems in a much cleaner and more efficient way.

That is what GHC does in its integer-gmp library

rbehrends · 2017-05-18T21:14:53Z

@StefanKarpinski It's not that we're dealing with a single situation. We will likely have to deal with both dependencies where we indeed can't change a thing (GMP has been mentioned) and where we have to work around this limitation, but we'll also have code that's custom-written for interacting with Julia where we can do whatever we want; for yet other dependencies we may have different stakeholders with different (and possibly conflicting) goals involved.

But that's why I wanted to keep focused on the technical aspects of FFIs, with a looks towards having some flexibility in how to deal with interoperability. Having more or more flexible options generally means that one is less constrained by backwards compatibility concerns, software engineering constraints, or (in the worst case) project politics.

yuyichao · 2017-05-18T21:53:35Z

But that's why I wanted to keep focused on the technical aspects of FFIs, with a looks towards having some flexibility in how to deal with interoperability. Having more or more flexible options generally means that one is less constrained by backwards compatibility concerns, software engineering constraints, or (in the worst case) project politics.

Having to provide more flexible options also means that julia itself will have to be constrained by all of the above. As a recent example, I couldn't have done #21590 without breaking public API or having to emulate it if any part of the GC marking internals are exposed.

For interfacing with C, I believe the best option is to use them like C.
There are C code that are writen for multiple languages/users (GMP for example) and there are others that can rely on / link to julia runtime. For the foreseeable future I believe the vast majority of the libraries will not link to libjulia.

The C libraries that are already writen are usually writen with C memory management in mind and are meant to be used that way so I believe we'll get the largest benefit if we can provide better support for that and that's exactly #11714 (comment)

For C libraries that can be modified and adapted to julia should do that by incrementally porting the C code to julia. Works are being done to make that easier (e.g. #21912, which will make it much easier to declare and manipulate nested C structures). Even for libraries that can't link with julia, if it has reasonable low level API we should be able to port away from it's high level APIs and implement some of those in julia. This is effectively what we want to do with BigInt

Both of the two ways does not rely on deep changes to the language and can benefit all users instead of one particular use case.

On the other hand, if we try to efficiently support the memory management of one particular C library, it'll put a lot of constraint on what we can do/change in julia. Note that this is talking about literally the lowest level implementation in julia since the GC is the only part of the runtime that doesn't call another other major parts (obviously other than true external dependencies). There's also usually no standard ways to do all kinds of advanced tricks (e.g. tagged pointers) in C so having direct support of it would not be useful for FFI in general unless we decide to also make use of it in julia itself for other reasons.

We are quite different from many other scripting languages like python in that you should write glue code in julia and not C. This is possible due to the JIT and is also why pypy prefer cffi rather than cpyext. If there's something missing that makes it harder to wrap C code efficiently, the preferred way to fix it is to support that in julia and not exposing more C apis (i.e. the two ways mentioned above).

wbhart · 2017-05-18T22:18:43Z

@yuyichao I think our project has quite different requirements than interfacing with a C library. Gap is written in C, but it is itself a garbage collected (interpreted) programming language with additional dependencies. A large amount of effort has already been expended by @rbehrends to write bespoke tooling to insert write barriers in the Gap codebase (@rbehrends is the principal architect of HPC Gap).

You've indicated that it may be too early to formalise any Julia GC API. But my gut feeling is that we may still be better offering to contribute to the design and implementation of an experimental/unsupported interface in a branch, rather than abandon entirely the approach @rbehrends has been steering us towards.

@StefanKarpinski Our project is a collaboration between existing projects, including those of us who are already implementing in Julia. We don't have control over the repository of Gap or its dependencies. Some of their key developers are excited about Gap<->Julia interop and visited us for a week to hack on some potential ways to do that.

yuyichao · 2017-05-18T22:20:09Z

it is itself a garbage collected (interpreted) programming language with additional dependencies

So it cannot be used in C?

wbhart · 2017-05-18T22:24:38Z

@yuyichao That is one of the things we investigated with the Gap developers when they visited recently. It turns out that there is a project called libgap, which will make it possible to call into Gap from C, essentially. We prototyped calling this from Julia (without GC considerations for now), and calling Julia from Gap (again, in a simple way, without GC considerations). Most of the functionality in Gap is actually implemented in the Gap language, not the C kernel of Gap. This is one of the things that makes Julia so attractive for them. Gap is not JIT compiled, but interpreted, in a very dynamic way. Julia would be fantastic as a Gap library language for serious developers.

wbhart · 2017-05-18T22:37:31Z

I should probably add that Gap is only one of a number of languages that we will make use of in our collaboration. Another is Singular, where @rbehrends has again invested much expertise and time into concurrency for us. There is also Polymake, which is basically a hacked version of Perl 5. So Gap should be seen as an example of something more general, not a singularity.

We do also have a number of large C libraries that we are also using, so we of course care about good C interop too. So the suggestions you and @StefanKarpinski are making with regard to C are also of interest to us.

Keno · 2017-05-18T22:41:04Z

This might be a bit out there, but have you considered writing a Gap interpreter in julia?

wbhart · 2017-05-18T22:46:46Z

@Keno Yes, in fact a Gap developer actually started working on such a thing, but unfortunately abandoned it.

At present we haven't allocated funding to do such a thing for Gap, but we have allocated funding to do such a thing for Singular, which is a much simpler language. The real problem with Gap is the exceptionally dynamic nature of its type system. It doesn't map 1-1 with Julia. The much more interesting problem, for them and us, appears to be using Julia as a library language.

Having said that, there is great interest in rewriting a mathematical language, called Cap, which is currently implemented in Gap, in Julia. That will likely go ahead, from what I heard most recently.

Sorry to get right off topic here.

rbehrends · 2017-05-19T08:39:14Z

This might be a bit out there, but have you considered writing a Gap interpreter in julia?

This and related approaches (such as JIT-compiling GAP code or having a statically typed and compiled extension of the GAP language) have been considered in the past, but so far, this has gone nowhere.

Challenges would be:

It would be a massive amount of work, especially considering that fairly precise backwards compatibility has to be maintained.
The way GAP represents objects cannot currently be cleanly emulated in Julia (or most other high-level languages, for that matter). Not without significant performance overhead, at least.
There are dozens of existing GAP packages that rely on GAP's C API.

I'll also note that if it were just about GAP <-> Julia interoperability, we could simply ship a patched Julia with GAP (enabled to support conservative stack scanning). But the Oscar platform is intended to be more modular and extensible than that, so we'd like to get interop right.

(Separately, there's still the current issue of performance related to GC frames within Julia proper, which is an independent concern, but my understanding is now that these issues will be addressed through LLVM-supported precise stack scanning.)

StefanKarpinski added the needs decision A decision on this change is needed label Jun 15, 2015

StefanKarpinski changed the title ~~conservative stack scanning~~ conservative stack scanning? Jun 15, 2015

StefanKarpinski added performance Must go faster complex Complex numbers multithreading Base.Threads and related functionality and removed complex Complex numbers labels Jun 15, 2015

JeffBezanson added GC Garbage collector and removed multithreading Base.Threads and related functionality labels Jun 15, 2015

yuyichao mentioned this issue Jun 7, 2017

Export jl_get_function #22190

Closed

conservative stack scanning? #11714

conservative stack scanning? #11714

Comments

StefanKarpinski commented Jun 15, 2015

Pros:

Cons:

yuyichao commented Jun 15, 2015

StefanKarpinski commented Jun 15, 2015

JeffBezanson commented Jun 15, 2015

carnaval commented Jun 15, 2015

JeffBezanson commented Jun 15, 2015

carnaval commented Jun 15, 2015

JeffBezanson commented Jun 15, 2015

ScottPJones commented Jun 15, 2015

ScottPJones commented Jun 15, 2015

carnaval commented Jun 15, 2015

ScottPJones commented Jun 15, 2015

yuyichao commented Jun 15, 2015

carnaval commented Jun 15, 2015

yuyichao commented Jun 15, 2015

StefanKarpinski commented Jun 15, 2015

yuyichao commented Jun 15, 2015

StefanKarpinski commented Jun 15, 2015

yuyichao commented Jun 15, 2015

StefanKarpinski commented Jun 15, 2015

yuyichao commented Jun 15, 2015

JeffBezanson commented Jun 15, 2015

simonster commented Jun 15, 2015

rbehrends commented May 17, 2017

yuyichao commented May 17, 2017

rbehrends commented May 17, 2017

yuyichao commented May 17, 2017

rbehrends commented May 17, 2017

StefanKarpinski commented May 17, 2017

yuyichao commented May 17, 2017

yuyichao commented May 17, 2017 • edited Loading

wbhart commented May 17, 2017

rbehrends commented May 17, 2017

yuyichao commented May 17, 2017

rbehrends commented May 17, 2017

yuyichao commented May 17, 2017

StefanKarpinski commented May 17, 2017

rbehrends commented May 18, 2017

yuyichao commented May 18, 2017

rbehrends commented May 18, 2017

yuyichao commented May 18, 2017

StefanKarpinski commented May 18, 2017

DemiMarie commented May 18, 2017 via email • edited by StefanKarpinski Loading

rbehrends commented May 18, 2017

yuyichao commented May 18, 2017 • edited Loading

wbhart commented May 18, 2017

yuyichao commented May 18, 2017

wbhart commented May 18, 2017

wbhart commented May 18, 2017

Keno commented May 18, 2017

wbhart commented May 18, 2017

rbehrends commented May 19, 2017 • edited Loading

yuyichao commented May 17, 2017 •

edited

Loading

DemiMarie commented May 18, 2017 via email •

edited by StefanKarpinski

Loading

yuyichao commented May 18, 2017 •

edited

Loading

rbehrends commented May 19, 2017 •

edited

Loading