Add garbage collector to std::gc #11399

Closed
wants to merge 21 commits into from

Conversation

huonw
Member

@huonw huonw commented Jan 8, 2014

Not ready for merging.

Summary

Tracing non-generational task-local GC, with stop-the-task non-incremental collections. The GC stores nothing inline in any values, and so doesn't need a header and a change of representation of generic types (like @ does).

Includes two new modules std::{libvec, uniq} for examples of library defined versions of ~[] and ~ respectively, which use the rooting API defined in std::gc to properly hold references to GC pointers (unlike ~, @, ~[] and @[]). (These modules are mostly demonstrations, not necessarily designed for landing.)

Note: when looking over this code, keep in mind I have pretty much no idea what I'm doing, so if something seems stupid, unconventional or silly, it almost certainly is.

Details

This adds a #[managed] annotation and an intrinsic reachable_new_managed::<T>() -> bool (cf. the old owns_managed intrinsic, renamed to owns_at_managed in this PR). The intrinsic is designed to check whether a type contains any #[managed] annotated types, so that the library types can avoid touching the GC if they aren't storing GC'd pointers.

The GC is conservative on the stack: when checking for GC'd references on the stack, it scans every word, and any bit pattern that matches the start address of a GC'd allocation (or of one of the other pointer types) is treated as a valid reference to that region.
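
A minimal sketch of that conservative scan, written in present-day Rust rather than the 2014 dialect used in this PR. `AllocTable` and `conservative_scan` are hypothetical names chosen for illustration, not the PR's actual types; the "stack" is modelled as a plain slice of words.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the GC's table of registered allocations:
/// only the start addresses are tracked here.
#[derive(Default)]
pub struct AllocTable {
    starts: BTreeSet<usize>,
}

impl AllocTable {
    /// Record the start address of a GC'd allocation.
    pub fn register(&mut self, start: usize) {
        self.starts.insert(start);
    }

    /// Conservatively scan a region given as a slice of machine words.
    /// Any word whose bit pattern equals a registered start address is
    /// reported as a possible reference. Integers that merely look like
    /// such an address are reported too, which is what makes the scan
    /// conservative rather than precise.
    pub fn conservative_scan(&self, words: &[usize]) -> Vec<usize> {
        words
            .iter()
            .copied()
            .filter(|w| self.starts.contains(w))
            .collect()
    }
}
```

Note that an interior pointer such as `start + 8` is not matched by this sketch, which mirrors the "doesn't support interior pointers" limitation listed under Problems.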

It does support finalisers, which are just run on the memory when a pointer is determined to be unused; so Gc<T> uses this to run the destructor of T (if it has one).
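
As a rough illustration of "a finaliser that runs the destructor of T (if it has one)", here is a sketch in present-day Rust. The names `Finaliser`, `run_destructor` and `finaliser_for` are assumptions made for this example; `std::mem::needs_drop` and `std::ptr::drop_in_place` are real standard-library functions, but the PR itself predates them and may do this differently.

```rust
/// Type-erased finaliser: a function run on the raw memory of a dead object.
pub type Finaliser = unsafe fn(*mut u8);

/// Monomorphised finaliser that runs `T`'s destructor in place.
///
/// Safety: `ptr` must point to a valid `T` that is never used again.
pub unsafe fn run_destructor<T>(ptr: *mut u8) {
    unsafe { std::ptr::drop_in_place(ptr as *mut T) }
}

/// Pick the finaliser for `T`, or `None` when dropping `T` is a no-op,
/// so the collector has nothing to run for such allocations.
pub fn finaliser_for<T>() -> Option<Finaliser> {
    if std::mem::needs_drop::<T>() {
        Some(run_destructor::<T> as Finaliser)
    } else {
        None
    }
}
```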

The rooting API mentioned above is simply a function register_root_changes<T>(removals: &[*T], additions: &[(*T, uint, TraceFunc)]) which lets us indicate that certain regions are no longer possibly rooting Gcs, and add regions (*T, uint, TraceFunc) == (start pointer, metadata, function to start tracing) that possibly are rooting Gcs. The trick with this API that stops us having to scan everything is being generic: it knows the type T a pointer contains and, via the intrinsic, knows statically whether T can contain #[managed] types. If T can't contain managed pointers, there's no need to register it with the GC. So all one needs to do to make things GC safe is (unconditionally) pass a pointer of the appropriate type to the relevant memory regions, and the std::gc library will automatically figure out (by calling the intrinsic) if it actually needs to register those regions. In particular, this means that programs that never use any GC'd types can have the GC code removed, because nothing will call it. (register_root_changes will be inlined and reduce to a no-op for non-managed types.)
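
The "generic, so it can statically skip non-managed types" idea can be sketched in present-day Rust as follows. Stable Rust has no reachable_new_managed intrinsic, so this sketch substitutes a hypothetical `MaybeManaged` trait with an associated constant; `RootSet`, the toy `Gc` wrapper and the omission of the TraceFunc component are likewise simplifications made for the example.

```rust
/// Hypothetical replacement for the `reachable_new_managed::<T>()` intrinsic:
/// each type states at compile time whether it can contain GC'd pointers.
pub trait MaybeManaged {
    const CONTAINS_MANAGED: bool;
}

impl MaybeManaged for i32 {
    const CONTAINS_MANAGED: bool = false;
}

/// Toy GC pointer type; it always counts as managed.
pub struct Gc<T>(pub *const T);

impl<T> MaybeManaged for Gc<T> {
    const CONTAINS_MANAGED: bool = true;
}

/// Registered regions as (start address, metadata) pairs.
#[derive(Default)]
pub struct RootSet {
    pub roots: Vec<(usize, usize)>,
}

impl RootSet {
    /// Callers invoke this unconditionally. For types that cannot hold
    /// managed pointers the constant is `false`, the body is dead code
    /// after monomorphisation, and the call reduces to a no-op.
    pub fn register_root_changes<T: MaybeManaged>(
        &mut self,
        removals: &[*const T],
        additions: &[(*const T, usize)],
    ) {
        if !T::CONTAINS_MANAGED {
            return;
        }
        for r in removals {
            let addr = *r as usize;
            self.roots.retain(|&(start, _)| start != addr);
        }
        for &(ptr, meta) in additions {
            self.roots.push((ptr as usize, meta));
        }
    }
}
```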

The metadata mentioned above is arbitrary (not examined by the GC) and can be set with update_metadata; it is essentially just designed to allow storing the length of vectors for tracing.

There are a few commits which act as an example for this API: I add the library vector type from strcat's rust-core, and then make the appropriate adjustments to make it GC safe (5 calls to register_root_changes), and also add an equivalent to ~, std::uniq::Uniq<T> which is similarly GC-safe.

Unfortunately, to support tracing, these types (Uniq and Vec) require a Trace bound on their contents. This is a real limitation, as they should be allowed to store non-traceable types if those types don't contain any Gc<T> pointers.

Problems

  • Done: lang items to register/deregister ~ allocations as appropriate --- On that note... The largest problem with this PR is that ~, @, ~[] and @[] do not act as roots. That is, having a pile of Gc pointers such that the only reference to them is a ~[Gc<int>] will cause them to be considered unreachable and be garbage collected. As such, I've marked any method that could demonstrate a symptom of this problem (the various .borrows) as unsafe. It's probably possible to do something with lang-items to get them to work... but, personally, having them as library types seems simpler (look at how simple it was to add GC support to the two new modules: adding support to Rc would be as easy as it was to add to Uniq too).
  • similarly, global variables are not considered (seems like this shouldn't be supported by default: the user can call register_root_changes themselves to register one if they must)
  • Done: scanned conservatively for now --- task-local storage is not considered
  • The GC is slow (about 4x slower than straight malloc for a microbenchmark of just doing a lot of allocations, down from earlier measurements of 20x, 6.5x and 5x) and memory hungry. Reasons:
    • Done: allocations less than 1 MB are (fairly basically) cached and reused giving a 50% performance boost --- every allocation is a call to malloc, and every unreachable pointer is actually freed (I'm working on caching unused allocations now)
    • std::trie is slow, and I'm using it wrong, and it's possibly not the best data structure.
      • Done: caching --- the GC actually spends most of its time in trie methods like insert and remove; if allocations were cached and not removed from the trie this would help a lot
      • it breaks a uint into chunks of 4 bits from the most significant bit, and most malloc'd pointers agree on their first 30 bits, so we spend a lot of time just traversing that with no ability to distinguish between keys (very hard to fix, since the only way to get the correct order is to traverse in this manner; requires a path-compressing trie)
      • it could just do with some general optimisation (possibly with some unsafe)
    • it has to do a Local::borrow(None::<Task>) twice on every allocation (to retrieve and return the GC, see below) (although perf indicates the vast majority of the time is spent in collection)
  • Done: some restricted & careful unsafe code --- It has to actually remove the task-local GC from the task struct during allocation and collection (so that we can unborrow the Task because that would be far worse than what I'm about to describe), which means that any finalisers that need to call into the GC (like those that need to unregister roots) will crash, in particular, a type like Gc<Vec<Gc<T>>> will fail because the Vec destructor calls register_root_changes.
  • fail!-ing finalisers aren't considered at all, and also cause failure. (Both of these can hit the double-unwinding and cause an abort.)
  • finalisers are memory unsafe w.r.t. cyclic objects, but I'm not 100% sure this is a problem: we already restrict destructors and require #[unsafe_destructor] so it's the user's "fault" if they crash due to this
  • it requires more instrumentation & statistics
  • the GC could be generational, but isn't currently. There is a pseudo-API designed to (in theory) support this, defined on Gc (all are Gc<T> -> &T):
    • borrow for only Freeze types, that does not have any write barriers
    • borrow_write_barrier, the general borrow that does have a write barrier; although in theory the write barrier could be elided when reachable_new_managed::<T> is false (since any writes couldn't add/change references to Gc pointers)
    • borrow_no_write_barrier; same as .borrow, but implemented for all T and unsafe, designed for when someone is definitely sure they're going to be reading only, or are not going to be changing any Gc references.
  • not enough in-source tests (easy enough to fix)
  • The added modules std::{uniq, libvec} have no documentation or tests (since they're just designed to exhibit the rooting API)
  • The tracing API means Uniq and Vec are less flexible than desirable (which flows downstream to any generic users of them) because they require a Trace bound to be able to register a handle to run when discovered by a conservative scan. Possible solutions:
    • having separate non-trace constructors (and non-trace .push and .pop for Vec!) but this would then require similar contortions downstream;
    • some sort of explicit TraceOrNonManaged bound (which would also require downstream generics to have that bound)
    • have the compiler enforce that Gc can only go into Trace types (a little like transmute requires types of the same size)
    • have the compiler generate trace glue like drop glue, etc., and then retrieve this for tracing (would require a call_trace_glue intrinsic, but then we'd have to have some way to get the appropriate types and information into it).
  • doesn't support interior pointers
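
To make the trie bullet above concrete, here is a small sketch in present-day Rust of splitting a key into 4-bit chunks from the most significant end and counting how many leading chunks two keys share. It assumes 64-bit keys (the PR's uint may well have been 32 bits at the time), and `nibbles` and `shared_levels` are illustrative names, not functions from std::trie.

```rust
/// Split a 64-bit key into sixteen 4-bit chunks, most significant first,
/// which is the order a nibble-indexed trie would walk them in.
pub fn nibbles(key: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, slot) in out.iter_mut().enumerate() {
        let shift = 60 - 4 * i as u32;
        *slot = ((key >> shift) & 0xF) as u8;
    }
    out
}

/// Number of leading chunks two keys share, i.e. how many trie levels are
/// traversed before the keys can be told apart.
pub fn shared_levels(a: u64, b: u64) -> usize {
    nibbles(a)
        .iter()
        .zip(nibbles(b).iter())
        .take_while(|(x, y)| x == y)
        .count()
}
```

Two heap addresses that differ only in their low bits share almost every level, which is the traversal cost described above and the reason a path-compressing trie is suggested.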

Because of this list (mainly the memory-unsafety problem with ~ etc not acting as roots), I've marked std::gc::Gc as #[experimental].

@huonw
Member Author

huonw commented Jan 8, 2014

cc @pnkfelix, @pcwalton, @nikomatsakis (anyone else)

@huonw
Member Author

huonw commented Jan 8, 2014

I've filled in more details.

@bill-myers
Contributor

How about special-casing RefCell and using the RefCell borrow flags to implement write barriers? (that is, add a new GcWriteBarrier state that causes borrow to append the RefCell address to a task-local modification log)

This way, they should work automagically with no API change, and allow safe code to borrow with no write barriers (since it's done at the proper place, i.e. the RefCell that can potentially give &muts and not the Gc which doesn't).

[Cell can only contain POD types, so the GC can ignore them]

Of course, the GC needs to be made fully accurate and aware of types beforehand, but that's essential anyway.
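
A rough sketch of this borrow-flag idea in present-day Rust, for readers who want something concrete. `BarrierCell`, `arm_barrier`, `with_mut` and the thread-local `MOD_LOG` are all hypothetical names. Unlike the proposal, this version wraps a standard `RefCell` and keeps the barrier flag in a separate `Cell`, so the two checks are not fused into one as they would be inside a modified RefCell.

```rust
use std::cell::{Cell, RefCell};

thread_local! {
    /// Hypothetical task-local modification log of cell addresses.
    static MOD_LOG: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

#[derive(Clone, Copy, PartialEq)]
enum Flag {
    Unused,
    GcWriteBarrier,
}

pub struct BarrierCell<T> {
    flag: Cell<Flag>,
    value: RefCell<T>,
}

impl<T> BarrierCell<T> {
    pub fn new(value: T) -> Self {
        BarrierCell {
            flag: Cell::new(Flag::Unused),
            value: RefCell::new(value),
        }
    }

    /// Called by the (hypothetical) collector once it has traced this cell.
    pub fn arm_barrier(&self) {
        self.flag.set(Flag::GcWriteBarrier);
    }

    /// Mutable access. The first mutable borrow after the barrier was armed
    /// appends this cell's address to the log and disarms the barrier.
    pub fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        if self.flag.get() == Flag::GcWriteBarrier {
            let addr = self as *const Self as usize;
            MOD_LOG.with(|log| log.borrow_mut().push(addr));
            self.flag.set(Flag::Unused);
        }
        f(&mut *self.value.borrow_mut())
    }
}

/// Number of cells logged on the current thread so far.
pub fn logged_writes() -> usize {
    MOD_LOG.with(|log| log.borrow().len())
}
```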

@huonw
Member Author

huonw commented Jan 8, 2014

That may work, but it requires chaining RefCell and Gc together, while the former is definitely useful outside of Gc... would it make RefCell slower in the non-Gc case?

Also, RefCell isn't necessarily the only type with interior mutability; people can define their own outside of libstd. In any case, Gc as implemented here does satisfy Pod; we may wish to not have this, but satisfying Pod is certainly convenient.

@pcwalton
Contributor

pcwalton commented Jan 8, 2014

Excellent work!

For the trie, here's the gold standard: http://www.hpl.hp.com/personal/Hans_Boehm/gc/tree.html

@glaebhoerl
Contributor

Cool!

This seems like a good place to float an idea I've been thinking about: what about doing tracing with a trait?

I've never written a GC before, only read a few papers. Most GCs seem to do tracing entirely based on dynamic information: a header attached to each heap object, and/or runtime calls to register/deregister traceable areas of memory (as here). Instead of that, what if we had a trait like:

trait Trace {
    fn trace(&self, ...);
}

The impl for most types would call trace() on each of its constituents in turn (same method, different impl). If there are none, it would do nothing. For Gc<T> itself, it would mark it as alive, before continuing to trace() the interior. When types are known statically, all of this would be normal function calls which could be inlined and optimized down to efficient code. Where there's an unknown type (a trait object or closure), it would store a Trace vtable (this being the "header") and make the call dynamically (after which it would be back to static calls).
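
One way this could look in present-day Rust is sketched below. It is only an illustration of the idea: `Marker` stands in for whatever the GC passes to trace(), and the toy `Gc` here is an id plus an inline value rather than a real collected pointer.

```rust
use std::collections::HashSet;

/// Hypothetical marking state handed to `trace`.
#[derive(Default)]
pub struct Marker {
    pub live: HashSet<usize>,
}

pub trait Trace {
    fn trace(&self, m: &mut Marker);
}

/// Toy GC handle: an id plus its contents (not a real GC pointer).
pub struct Gc<T> {
    pub id: usize,
    pub value: T,
}

impl<T: Trace> Trace for Gc<T> {
    fn trace(&self, m: &mut Marker) {
        // Mark as alive, and only descend the first time we see this id.
        if m.live.insert(self.id) {
            self.value.trace(m);
        }
    }
}

/// A type with no constituents does nothing.
impl Trace for i32 {
    fn trace(&self, _: &mut Marker) {}
}

/// A container calls `trace` on each constituent in turn (static dispatch).
impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, m: &mut Marker) {
        for item in self {
            item.trace(m);
        }
    }
}

/// Statically unknown type: the `dyn Trace` vtable plays the role of the
/// "header", and the call is made dynamically.
impl Trace for Box<dyn Trace> {
    fn trace(&self, m: &mut Marker) {
        (**self).trace(m);
    }
}
```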

If necessary, trace() could be passed some parameters or a closure by the GC, which it could be expected to react to resp. call back.

For the built-in types except *T a Trace impl would pre-exist. For most user-defined types, it could be derived either manually (deriving(Trace)) or automatically (like Drop is), unless the type contains another type which doesn't implement Trace. In this case Trace could be defined manually. For example, types which contain pointers into foreign data structures could use Trace to define how to trace through those data structures for pointers back into Rust, which is kinda cool.

An invariant that would have to be maintained in the type system is that data which transitively contains Gc<T> could not be stored inside any type which doesn't implement Trace. (I was thinking the "does type contain GC data" part could be done with something like #10879.) Here an interesting question is whether closures and trait objects would default to implying a Trace bound (like with Drop), or if it would have to specified manually. (This would determine whether or not they can close over managed data.)

@huonw
Member Author

huonw commented Jan 8, 2014

That would certainly be precise.

However, it's not immediately clear to me how it interacts with Rust's stack, since AIUI, being precise on the stack (to work out which .trace() to call on what piece of memory) is hard work; does LLVM have support that can assist us?

(Also, would that preclude performance optimisations, so that applications using the GC in only one task get slowed globally (which this GC doesn't do) because LLVM can't reorder/SROA things on the stack?)

@bill-myers
Contributor

That may work, but it requires chaining RefCell and Gc together, while the former is definitely useful outside of Gc...

Not exactly, although the behavior would obviously be triggered only for RefCells that the garbage collector sees

would it make RefCell slower in the non-Gc case?

No, because it already needs to check the flags to make sure it's not already borrowed, so the check for a write barrier and for the value already being borrowed can be done together at no extra performance cost.

Also, RefCell isn't necessarily the only type with interior mutability; people can define their own outside of libstd.

This would have to be banned by adding a FreezeExceptNonFreezeTypesWhichSupportWriteBarriersOrCannotHoldGc kind bound to Gc or something like that.

Is there any real use case for this?

In any case, Gc as implemented here does satisfy Pod; we may wish to not have this, but satisfying Pod is certainly convenient.

It would either need to stop being Pod, or Cell needs to also have a NonManaged bound.

Overall, the issue is that write barrier functionality must be where you can get &muts, since otherwise you cannot safely request to borrow without write barriers (and in fact your code makes such functionality unsafe), which seems unacceptable.

So the other alternative is to add a Freeze bound to Gc and add a GcMut, but that's less flexible, and it appears the plan is to move away from that model towards using Cell and RefCell inside "immutable" types.

@glaebhoerl
Contributor

However, it's not immediately clear to me how it interacts with Rust's stack, since AIUI, being precise on the stack (to work out which .trace() to call on what piece of memory) is hard work; does LLVM have support that can assist us?

This occurred to me as well. I plead total ignorance. How was the planned "conservative on the stack, precise on the heap" GC going to do this? With headers on all heap objects?

(Also, would that preclude performance optimisations, so that applications using the GC in only one task get slowed globally (which this GC doesn't do) because LLVM can't reorder/SROA things on the stack?)

Can SROA take something off the stack entirely and put it in a register? Would this GC handle that?

EDIT: Would it be unreasonable to have the compiler explicitly generate code to call register_root(&stack_val as &Trace) for each stack variable that can contain managed data (and presumably unregister as well at end of scope)? Maybe not the most performant thing in the world, but seems like it could work, be precise, and not have a negative effect on code that's not using GC.
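
The register/unregister pairing suggested in this EDIT could be modelled in present-day Rust with a scope guard, sketched below. `RootTable` and `RootGuard` are hypothetical names; in the actual suggestion the compiler would emit the calls itself, and the root would carry a `&Trace` rather than a bare address.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical root table shared between the guards and the collector.
#[derive(Default)]
pub struct RootTable {
    pub roots: Vec<usize>,
}

/// Registers a stack slot as a root on creation and unregisters it on drop,
/// mirroring compiler-inserted calls at the start and end of a scope.
pub struct RootGuard {
    table: Rc<RefCell<RootTable>>,
    addr: usize,
}

impl RootGuard {
    pub fn new<T>(table: &Rc<RefCell<RootTable>>, slot: &T) -> RootGuard {
        let addr = slot as *const T as usize;
        table.borrow_mut().roots.push(addr);
        RootGuard {
            table: Rc::clone(table),
            addr,
        }
    }
}

impl Drop for RootGuard {
    fn drop(&mut self) {
        let addr = self.addr;
        self.table.borrow_mut().roots.retain(|&a| a != addr);
    }
}
```

Because the guard takes the slot's address, the value has to live in memory rather than purely in a register, which is the optimisation cost raised earlier in the thread.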

@thestinger
Contributor

On Wed, Jan 8, 2014 at 2:38 PM, Gábor Lehel notifications@github.com wrote:

Can SROA take something off the stack entirely and put it in a register?

Sure, the mem2reg pass does exactly that. SROA is an amalgamation of many more things than just replacing aggregates with scalars and likely does it too. It's free to destroy any debug information when optimizations are enabled. GCC abandoned any hope of preserving this information at -O1 and switched to exposing an -Og flag instead along with -fvar-tracking-assignments. LLVM isn't there yet, and doesn't have a solution for debugging in the presence of optimizations.

LLVM is even free to remove calls to malloc, realloc and free and replace them with memory on the stack or in registers. It currently only does this for dead stores, but the intent of the C standard is to permit escape analysis and it should land soon.

@huonw
Member Author

huonw commented Jan 9, 2014

This occurred to me as well. I plead total ignorance. How was the planned "conservative on the stack, precise on the heap" GC going to do this? With headers on all heap objects?

My current plan was storing the information in the table in the GC. (I.e. expand the scan field, which is a crude form of "precision".)

This would have to be banned by adding a FreezeExceptNonFreezeTypesWhichSupportWriteBarriersOrCannotHoldGc kind bound to Gc or something like that.

Is this a serious suggestion?

In any case, I'm not too concerned by the (as yet hypothetical) write barriers for now:

  • Many types are Freeze, and so no write barriers at all
  • The intrinsic will allow eliding the write barriers in many instances
  • Gc is marked #[experimental], so people will get compiler warnings when they use it, hence making changes like that (if necessary) is possible

@huonw
Member Author

huonw commented Jan 9, 2014

@glaebhoerl I've been thinking about precision/tracing, and I get the feeling that your suggestion does actually work (I'd been thinking kinda-similar thoughts, but it was late and I got distracted by your mention of stack precision):

trait Trace {
    fn trace(&self, gc_info: &mut GcInfo);
}
impl<T: Trace> Trace for Gc<T> {
    fn trace(&self, gc_info: &mut GcInfo) {
         if !gc_info.mark_reachable(self) { // mark_reachable returns true if already traced
             self.borrow().trace(gc_info);
         }
    }
}

impl<T: Trace> Trace for RefCell<T> {
    fn trace(&self, gc_info: &mut GcInfo) {
        self.value.trace(gc_info);
    }
}

impl Trace for int { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for *T { fn trace(&self, _: &mut GcInfo) {} }
// and similarly for the other basic types (I don't think we can/should
// impose any particular tracing semantics on `*`?)

// these are registered as roots for (precise) scanning separately, or have
// been registered to be included in any conservative scans (and have 
// proper impls here). (The latter is probably better; see below.)
impl<T> Trace for Uniq<T> { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for Vec<T> { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for Rc<T> { fn trace(&self, _: &mut GcInfo) {} }

// etc.

Then we could have a deriving mode that takes

#[deriving(Trace)]
enum Foo {
    X(Gc<int>, Vec<Gc<int>>),
    Y(int, int),
    Z(int, Gc<Vec<int>>)
}

and generates (after inlining & removing the no-op methods)

impl Trace for Foo {
    fn trace(&self, gc_info: &mut GcInfo) {
        match *self {
            X(a, _) => { a.trace(gc_info) }
            Y(_, _) => {}
            Z(_, b) => { b.trace(gc_info) }
        }
    }
}

And then <T> Gc<T> becomes <T: Trace> Gc<T>, and we keep a way to call the relevant trace of each known pointer. Unfortunately, this means that trait objects don't work very well in Gc: every trait used would need to inherit from Trace. And it also means that any generic data-structures that wish to be precisely traced need a Trace bound, which is rather unfortunate. :(

We could make Trace super special and automatically derived for all types, unless there is a manual implementation; or have an intrinsic like get_tracing_function<T>() -> Option<fn(...)>, which will give None/Some based on whether there is such a method, so we can default to a conservative scan if necessary (this last one seems more feasible than the first). [Possibly with a lint for using non-traceable types in the GC.]

Re Uniq, and Vec and so on: having them registered to be traced by a conservative scan seems like a very good idea; however, the current scanner only recognises pointers that point exactly at the start of its registered allocations, not interior pointers. In practice this probably isn't too much of a limitation, since it would be very weird for something to lose its pointer to the start of an allocation.

Also Trace seems rather similar to Encodable, but Encodable is in extra and we are not; and Encodable is a bit of a weird bound to have. (And would be very weird to add as special to the compiler for the intrinsic mentioned above.)

In any case, this seems significantly more feasible than I first thought; I'll experiment.

(Also, I'm not sure how this API would work if we were to try to support other tracers, rather than just the one in libstd, I guess GcInfo could be a trait...)

@glaebhoerl
Contributor

impl<T> Trace for *T { fn trace(&self, _: &mut GcInfo) {} }

I was thinking *T would deliberately not have an impl, to force types which contain it, which are probably up to some funny business, to write their own, and not derive. In any case, having an impl that does nothing seems like the wrong idea, making it easy to automatically derive an impl that doesn't trace things which should be traced.

Why can't you impl<T: Trace> Trace for Uniq<T>/Vec<T>/Rc<T> normally to trace the contained values?

Unfortunately, this means that trait objects don't work very well in Gc: every trait used would need to inherit from Trace.

Ah right. In my head I was using my earlier-proposed idea to have Trait1+Trait2 be itself a trait, so you could do e.g. ~(ToStr+Trace)...

And it also means that any generic data-structures that wish to be precisely traced need a Trace bound, which is rather unfortunate. :(

What do you mean here exactly? This seems logical to me (e.g. impl<T: Trace> Trace for List<T>), or were you thinking of something different?

We could make Trace super special and automatically derived for all types, unless there is a manual implementation

If we do this I think an impl Trace should be automatically derived for a type iff all of its members also impl Trace (if it has type parameters, then with a T: Trace bound on them). I'm not sure if there's a use case for overriding this manually with your own impl? If the type contains non-Traceable members (such as *T), then you would be forced to write your own. You could also turn the auto-deriving off with an attribute (either per-type or crate-wide).

@huonw
Member Author

huonw commented Jan 9, 2014

Why can't you impl<T: Trace> Trace for Uniq<T>/Vec<T>/Rc<T> normally to trace the contained values?

It would be possible (see the "Re Uniq, and Vec and so on" paragraph in my previous comment). In fact, just thinking about it now, it is necessary. My current implementation is actually incorrect for something like

struct X {
    x: Gc<Uniq<RefCell<X>>>
}

The Uniq is always regarded as reachable, even for an otherwise-unreachable X value that is pointing at itself, and so that value would then be stuck in a reference cycle and leak.

What do you mean here exactly? This seems logical to me (e.g. impl<T: Trace> Trace for List<T>), or were you thinking of something different?

Something different. Smart pointers and generic data structures that can contain GC'd values will need to be able to register themselves with the garbage collector, passing in their tracing info (e.g. a function pointer to something wrapping the .trace method) so that the GC can run it when it sees a relevant pointer when doing its conservative scan. Something like:

fn run_tracing<T: Trace>(x: *(), gc_info: &mut GcInfo) {
    unsafe { (*(x as *T)).trace(gc_info) }
}

impl<T> Uniq<T> {
    fn new(x: T) -> Uniq<T> {
        // pseudo-Rust
        let ptr = malloc(size);
        *ptr = x;

        // get the appropriately monomorphised version of `run_tracing`
        register_with_gc(ptr, run_tracing::<T>);

        Uniq { ptr: ptr }
    }
}

The problem is the run_tracing::<T> line: it requires a Trace bound, and so we'd need impl<T: Trace> Uniq<T>... and then anything using Uniq would need one too, etc etc.

My goal above was to be fancier/work-around the type system: e.g. a pointer to a function equivalent to run_tracing in the tydesc, possibly two lang items like

#[lang="trace_traceable"]
fn trace_traceable<T: Trace>(start: *(), end: *(), gc_info: &mut GcInfo) {
    unsafe { (*(start as *T)).trace(gc_info) }
}

#[lang="trace_nontraceable"]
fn trace_nontraceable<T>(start: *(), end: *(), gc_info: &mut GcInfo) {
    gc_info.conservative_scan(start, end)
}

where the compiler uses the first where possible (i.e. in the tydescs of types with Trace impls) and the second to make up the slack (i.e. in the tydescs of types without Trace impls). In any case, this tracing stuff seems pretty similar to Drop and the drop glue.
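
A sketch of this precise-or-conservative pairing in present-day Rust follows. There is no compiler support here, so the caller picks the function by hand where the proposal would have the tydesc supply it. `GcInfo`, `TraceFn` and `Leaf` are stand-ins invented for the example, the signature uses a start pointer plus a byte length instead of start and end pointers, and the conservative path simply records every non-zero word as a candidate.

```rust
/// Hypothetical marking state: values judged reachable, plus a counter.
#[derive(Default)]
pub struct GcInfo {
    pub marked: Vec<usize>,
    pub conservative_scans: usize,
}

pub trait Trace {
    fn trace(&self, gc_info: &mut GcInfo);
}

/// Type-erased tracing entry point stored next to each registered region.
pub type TraceFn = unsafe fn(*const u8, usize, &mut GcInfo);

/// Precise path: reinterpret the region as a `T` and call its `trace`.
///
/// Safety: `start` must point to a valid, live `T`.
pub unsafe fn trace_traceable<T: Trace>(start: *const u8, _len: usize, gc_info: &mut GcInfo) {
    let value = unsafe { &*(start as *const T) };
    value.trace(gc_info);
}

/// Fallback path: treat the region as opaque words and scan conservatively.
///
/// Safety: `start` must be word-aligned and `len` bytes must be readable.
pub unsafe fn trace_nontraceable(start: *const u8, len: usize, gc_info: &mut GcInfo) {
    let count = len / std::mem::size_of::<usize>();
    let words = unsafe { std::slice::from_raw_parts(start as *const usize, count) };
    gc_info.conservative_scans += 1;
    gc_info.marked.extend(words.iter().copied().filter(|&w| w != 0));
}

/// Example traceable type: marks the single value it holds.
pub struct Leaf(pub usize);

impl Trace for Leaf {
    fn trace(&self, gc_info: &mut GcInfo) {
        gc_info.marked.push(self.0);
    }
}
```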


Another problem I just thought of (which is completely obvious in hindsight): when we are scanning the stack conservatively and see a pointer to a Vec<T> registered with the GC, we don't know how many elements are in the vector and so don't know how to scan precisely. Possible solutions:

  1. scan the memory conservatively (optimisation: wait until we have nothing else to scan before doing any of the vectors found via this, on the off chance that there is another reference somewhere that does have access to true tracing info)
  2. store the length in the allocation; either for all types (loses some of the advantages of the nice {length, capacity, data*} vector representation) or just for managed types (one of the problems with the current @ is it forces a change of representation like this)
  3. store the length in a table in the GC (makes every .push and .pop really expensive)

I'm leaning toward 1 as the solution that has the least impact: it only affects vectors of managed things stored directly on the stack (or in other such vectors); as soon as you're behind, e.g., a smart pointer, you can trace precisely (so Rc<~[Foo]> would be precise).

@bill-myers
Contributor

Ultimately, the only solution is to have a fully precise GC, including the stack.

There already seems to be support for that in LLVM through the shadow stack plugin (albeit with some performance degradation due to explicit bookkeeping and the need to keep roots on the stack and not in registers), and there is work on something that can work with no performance degradation at https://groups.google.com/forum/#!topic/llvm-dev/5yHl6JMFWqs

And in fact the latter appears to be already experimentally available in LLVM 3.4 according to http://www.llvm.org/docs/StackMaps.html#stackmap-section

Anyway, as long as trait objects without Send or NonManaged bounds are not used, the Rust compiler has perfect knowledge of whether any value holds Gc or not, so since trait objects are supposed to be rare (especially those without such bounds), even a precise GC with suboptimal performance should not really impact non-GC code much.

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

Ultimately, the only solution is to have a fully precise GC, including the stack.

the llvm-dev thread rooted here may also be relevant/of interest: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066782.html

Note in particular the discussion of Bartlett mostly-copying gc

(Update: ah, indeed, the google groups link that bill-myers provided is to a proposal that came after (and in response to) the thread I linked above.)

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

How was the planned "conservative on the stack, precise on the heap" GC going to do this? With headers on all heap objects?

Essentially, yes, that was my plan (that, in addition to some way to map from interior pointers to the start of an object).

@thestinger
Contributor

@pnkfelix: What about heap objects without managed values? Does rooting borrowed pointers require adding overhead to all heap allocations?

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

@thestinger my plan was that ~-allocated objects that could contain references to managed-refs or borrowed-refs would need a header.

(I'll need to double check about ~-allocated objects whose references are solely to other ~-allocated objects.)

@thestinger
Contributor

@pnkfelix: So won't this require an extra branch in every unique trait object that's not 'static and Send?

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

So won't this require an extra branch in every unique trait object that's not 'static'?

I don't understand your question. There will be extra branches in some places, but I don't see what trait objects would have to do with it ... I was planning on storing all such header information at a negative offset from the referenced address, so that the object layout would look the same from the view point of client code accessing the state of a ~-allocated object.
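
For illustration, a header at a negative offset could be laid out as in the following present-day Rust sketch. The `Header` contents (just an extent) and the function names are assumptions for the example; `std::alloc::Layout::extend` and `pad_to_align` are real standard-library APIs, but nothing here reflects how rustc's ~ allocations were actually implemented.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical GC header: just the extent of the payload in bytes.
#[repr(C)]
pub struct Header {
    pub extent: usize,
}

/// Layout of "header followed by payload" and the payload's offset in it.
fn layout_for<T>() -> (Layout, usize) {
    let (layout, offset) = Layout::new::<Header>()
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    (layout.pad_to_align(), offset)
}

/// Allocate a `T` preceded by a header and return a pointer to the `T`.
/// Client code sees an ordinary `*mut T`; the header sits before it.
pub fn alloc_with_header<T>(value: T) -> *mut T {
    let (layout, offset) = layout_for::<T>();
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null(), "allocation failed");
        (base as *mut Header).write(Header {
            extent: std::mem::size_of::<T>(),
        });
        let payload = base.add(offset) as *mut T;
        payload.write(value);
        payload
    }
}

/// Find the header stored at a negative offset from `payload`.
///
/// Safety: `payload` must come from `alloc_with_header::<T>`.
pub unsafe fn header_of<T>(payload: *const T) -> *const Header {
    let (_, offset) = layout_for::<T>();
    unsafe { (payload as *const u8).sub(offset) as *const Header }
}

/// Drop the payload and free the whole block.
///
/// Safety: `payload` must come from `alloc_with_header::<T>` and must not
/// be used afterwards.
pub unsafe fn free_with_header<T>(payload: *mut T) {
    let (layout, offset) = layout_for::<T>();
    unsafe {
        std::ptr::drop_in_place(payload);
        dealloc((payload as *mut u8).sub(offset), layout);
    }
}
```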

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

(or maybe @thestinger is talking about the trait objects that store a Trace vtable ... in which case that's not what I have been actively thinking about; I haven't even really read the discussion here on that topic carefully yet.)

@huonw
Member Author

huonw commented Jan 9, 2014

my plan was that ~-allocated objects that could contain references to managed-refs or borrowed-refs would need a header.

Do borrowed refs need a header? Doesn't the borrow freeze the original value in-place and hence the GC will trace it through that value?

@pnkfelix
Member

pnkfelix commented Jan 9, 2014

Do borrowed refs need a header?

The references themselves should not need any header. I think the sentence I wrote was unclear.

The point I was trying to make was just that: Under the scheme I was envisaging (a variant of Bartlett), a ~-allocated object of type ~T would need a header, unless rustc can statically prove that any instance of ~T would not need to be traced for the GC.

What sort of header would the ~T need? At minimum, just the extent of the object (and you would conservatively scan the ~T in the same way that you would the stack). Of course, we can provide a more precise header (and may want to do so).

@pnkfelix
Member

(a clarification: by "need a header", I am really thinking "will need meta-data somewhere tracking extent, and/or type, etc"; a side-table might be a better choice, or even an outright necessity, for some cases. Either way, whether it's a side-table or a header at a negative-offset, this is all stuff just for the Gc, not for the object to use.)
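
A side-table of extents, which also gives the interior-pointer-to-object-start mapping mentioned earlier, might look roughly like this in present-day Rust. `SideTable` and `find_containing` are hypothetical names, and an ordered map is only one possible choice of structure.

```rust
use std::collections::BTreeMap;

/// Hypothetical side table: allocation start address -> extent in bytes.
#[derive(Default)]
pub struct SideTable {
    extents: BTreeMap<usize, usize>,
}

impl SideTable {
    pub fn insert(&mut self, start: usize, len: usize) {
        self.extents.insert(start, len);
    }

    pub fn remove(&mut self, start: usize) {
        self.extents.remove(&start);
    }

    /// Map any address (including an interior pointer) to the allocation
    /// containing it, returned as `(start, len)`.
    pub fn find_containing(&self, addr: usize) -> Option<(usize, usize)> {
        // The only candidate is the allocation with the greatest start
        // address that is not above `addr`.
        let (&start, &len) = self.extents.range(..=addr).next_back()?;
        if addr - start < len {
            Some((start, len))
        } else {
            None
        }
    }
}
```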

@huonw
Member Author

huonw commented Jan 12, 2014

I've pushed two commits that turn this into a tracing garbage collector. (Conservative on the stack, precise on the heap.) Unfortunately, it has some drawbacks, like forcing Uniq<T> and Vec<T> to have T: Trace. Quoting from the adjusted PR text:

Unfortunately, to support tracing, these types (Uniq and Vec) require a Trace bound on their contents. This is a real limitation, as they should be allowed to store non-traceable types if those types don't contain any Gc<T> pointers.

Problems

[...]

  • The tracing API means Uniq and Vec are less flexible than desirable (which flows downstream to any generic users of them) because they require a Trace bound to be able to register a handle to run when discovered by a conservative scan. Possible solutions:
    • having separate non-trace constructors (and non-trace .push and .pop for Vec!), but this would then require similar contortions downstream;
    • some sort of explicit TraceOrNonManaged bound (which would also require downstream generics to have that bound);
    • having the compiler enforce that Gc can only go into Trace types (a little like transmute requires types of the same size);
    • having the compiler generate trace glue like drop glue, etc., and then retrieve this for tracing (would require a call_trace_glue intrinsic, but then we'd have to have some way to get the appropriate types and information into it).
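To make the flexibility problem concrete, here is a minimal sketch in present-day Rust (not the 2014 syntax of this PR). `Tracer`, `Gc`, `TracedVec` and `collect_pair` are simplified, hypothetical stand-ins, not the types in this branch. The point is only that the `Trace` bound on the container forces a no-op impl onto plain types and a `T: Trace` bound onto every generic caller:

```rust
/// Hypothetical tracer: just counts the GC pointers it is shown.
#[derive(Default)]
pub struct Tracer {
    pub visited: usize,
}

pub trait Trace {
    fn trace(&self, tracer: &mut Tracer);
}

/// Stand-in for a GC pointer; a real one would point into the GC heap.
pub struct Gc<T>(pub T);

impl<T: Trace> Trace for Gc<T> {
    fn trace(&self, tracer: &mut Tracer) {
        tracer.visited += 1;
        self.0.trace(tracer);
    }
}

/// A type with no GC pointers still needs an impl, even though it does nothing.
impl Trace for i32 {
    fn trace(&self, _tracer: &mut Tracer) {}
}

/// Library vector: the bound is needed so the contents can be traced...
pub struct TracedVec<T: Trace> {
    items: Vec<T>,
}

impl<T: Trace> TracedVec<T> {
    pub fn new() -> Self {
        TracedVec { items: Vec::new() }
    }

    pub fn push(&mut self, item: T) {
        self.items.push(item);
    }
}

impl<T: Trace> Trace for TracedVec<T> {
    fn trace(&self, tracer: &mut Tracer) {
        for item in &self.items {
            item.trace(tracer);
        }
    }
}

/// ...and the same bound leaks into every generic user downstream.
pub fn collect_pair<T: Trace>(a: T, b: T) -> TracedVec<T> {
    let mut v = TracedVec::new();
    v.push(a);
    v.push(b);
    v
}
```

Dropping the bound from `collect_pair` fails to compile, which is the downstream contortion the list above is trying to avoid.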

@nikomatsakis
Contributor

On Wed, Jan 08, 2014 at 07:32:41AM -0800, Huon Wilson wrote:

That may work, but it requires chaining RefCell and Gc together,
while the former is definitely useful outside of Gc... would it make
RefCell slower in the non-Gc case?

When we last discussed this, we did plan to have Cell and
RefCell do a quick check to implement write barriers. This is one of
those examples which demonstrate why GC is best thought of as more of
a "privileged library" than "just another library". (However, I
haven't finished reading all the comments on this thread, and I'd be
happy if we are able to avoid the need to modify Cell/RefCell --
perhaps by just not needing write barriers altogether, though that
implies a less sophisticated GC algorithm in turn.)

@glaebhoerl
Contributor

@huonw I've been thinking about this paradox:

  1. Any type that is known not to contain managed data can have a trivial Trace impl, with trace() being a no-op.
  2. Assume the type system prohibits storing managed data in a non-Trace type.
  3. Because of 2., we can assume that any non-Trace type does not contain managed data. Therefore, following 1., we can give that type a trivial no-op impl of Trace.
  4. We conclude that every type may have an impl of Trace. But if a type impls Trace, then the type system will let you store managed data in it, which invalidates our assumptions in 2.

Can you pinpoint where the contradiction is? I get the feeling that I'm using circular logic in there somewhere. Maybe there should be two separate Trace-ish traits, in particular that the one present in 2., written/derived for types by the user, and the one induced by 1., used by the GC, should be separate things. But this is only a cloudy vision, and I can't see the particulars of how it would work.

(Is this the same thing as the TraceOrNonManaged possibility you were floating? If we can assume that TraceOrNonManaged holds for every type (and following the above we should be able to), then we could make it a special implicitly-present-on-every-type-parameter bound like Drop is.)

The key to unlocking this might also be in the particulars of how 2. happens, which isn't clear to me either.

(I also had an idea w.r.t a primitive method for precisely scanning the stack that doesn't rely on LLVM (but does require compiler support) that I edited into the end of one of my previous comments, did you catch it?)

@huonw
Member Author

huonw commented Jan 12, 2014

@glaebhoerl I think I've actually been thinking a similar thing to you (and have finally worked out that I'm on approximately the same page as you).

If M represents "contains managed data", T represents "implements Trace", and Tn represents "that Trace impl is a noop", then your "proof" is

1. !M => T & Tn
2. !T => !M
3. 1 & 2 => (!T => T & Tn)

Even symbolically like that, it's not exactly obvious to me where that breaks down.

(Is this the same thing as the TraceOrNonManaged possibility you were floating? If we can assume that TraceOrNonManaged holds for every type (and following the above we should be able to), then we could make it a special implicitly-present-on-every-type-parameter bound like Drop is.)

I think the two traits you mention are basically equivalent to Drop + the drop glue generated by the compiler: i.e. types that contain Drop types aren't automatically Drop but do have destructors that run the destructor of their contents (the drop glue).

(I also had an idea w.r.t a primitive method for precisely scanning the stack that doesn't rely on LLVM (but does require compiler support) that I edited into the end of one of my previous comments, did you catch it?)

I didn't catch it (but I have now). I'd guess that something like that is the simplest way to do it, assuming that stops LLVM from optimising out the references.

(As @nikomatsakis implies, the Mozilla employees (who have more experience in the GC space than I do anyway, and more experience hacking on the compiler) have been thinking about this on and off for a while, so I assume they may have solutions for many of these problems.)


@nikomatsakis

However, I haven't finished reading all the comments on this thread, and I'd be happy if we are able to avoid the need to modify Cell/RefCell -- perhaps by just not needing write barriers altogether, though that implies a less sophisticated GC algorithm in turn

The API currently exposed here is designed to have a write barrier on Gc<T> for all T, with a non-write-barrier method for Freeze types. This is presumably slightly/significantly slower if someone is repeatedly borrowing a RefCell<Gc<T>> immutably (a case where write barriers on RefCell itself could skip the barrier) or calling .get on a Cell<Gc<T>>, but it doesn't require privileging them.

(Although, if they were privileged, std::gc could expose the API for this publicly so that other non-libstd types that need write barriers could do the same thing.)

@huonw
Member Author

huonw commented Jan 13, 2014

@pnkfelix and I had some discussion on IRC about the fact that the current collection policy (i.e. deciding when to run a collection; at the time of writing, it is just after some fixed number of allocations) is very subpar, causing quadratic behaviour; the number should adjust for heap size.

(Noting this here for posterity, and so I don't forget to fix it.)
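For reference, the quadratic behaviour comes from running a collection every k allocations: with n live objects that is n/k collections, each tracing up to n objects. A minimal sketch of a heap-proportional policy follows, in present-day Rust. `Policy`, its constants and `collections_for` are hypothetical and are not what this PR implements:

```rust
/// Hypothetical policy: collect once the bytes allocated since the last
/// collection exceed a budget proportional to the surviving heap.
pub struct Policy {
    /// Bytes allocated since the last collection.
    allocated_since_gc: usize,
    /// Allocation budget before the next collection.
    threshold: usize,
}

impl Policy {
    const MIN_THRESHOLD: usize = 1024;
    const GROWTH_FACTOR: usize = 2;

    pub fn new() -> Self {
        Policy { allocated_since_gc: 0, threshold: Self::MIN_THRESHOLD }
    }

    /// Record an allocation; returns true if a collection should run now.
    pub fn on_alloc(&mut self, bytes: usize) -> bool {
        self.allocated_since_gc += bytes;
        self.allocated_since_gc >= self.threshold
    }

    /// Called after a collection with the number of bytes that survived.
    pub fn on_collected(&mut self, live_bytes: usize) {
        self.allocated_since_gc = 0;
        self.threshold = (live_bytes * Self::GROWTH_FACTOR).max(Self::MIN_THRESHOLD);
    }
}

/// Count collections while allocating `n` one-byte objects that all survive.
pub fn collections_for(n: usize) -> usize {
    let mut policy = Policy::new();
    let mut live = 0;
    let mut collections = 0;
    for _ in 0..n {
        live += 1;
        if policy.on_alloc(1) {
            collections += 1;
            policy.on_collected(live);
        }
    }
    collections
}
```

Because the budget grows with the live heap, the collections are spaced geometrically and the total tracing work stays linear in the amount allocated. The growth factor of 2 is an arbitrary choice for the sketch.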

@glaebhoerl
Contributor

@huonw

I'm still uneasy about *

Me too, but can you think of a better plan? The one I described is the only one I managed to arrive at which actually holds together. I don't see any promising leads for a potentially better one, but that doesn't mean there isn't one.

W.r.t. a copying/moving collector (it's good to see we've been thinking about the same things again!), the thought I keep returning to is that the garbage collector would have to be made to "obey the borrow checker", i.e. to not move things around if it would violate the contracts of any outstanding loans. I have very little idea about how (or whether) this could happen, though. (There are potentially two parts to it. One is moving an object in the heap and leaving an indirection to it in its former location. This might be handled by changing the representation of a Gc box (to an enum of indirection or value, etc.), and potentially storing bit flags like Cell to know if there are any loans. The harder-seeming part is cleaning up the indirection and rewriting references to point to the new object... for this you would have to know that there aren't any loans of the Gc references themselves (as opposed to their contents).)

@pnkfelix

my assumption was that the core of the collector code, where it knows about object representation, where the mark bits are represented, whether to copy or not, how to scan the roots, etc would all be part of the rust runtime

Right... my thought was that the parts which relate to the properties and representation of types would all be handled by the given type's trace glue (i.e. instead of storing data that "this type is an enum, it has a discriminant of size S, and members of type either A, B, and C or X, Y, and Z" for the GC to interpret, the trace() glue would just do the match and invoke trace() in turn on the appropriate members statically, exactly the same as any derived instance of e.g. Eq or drop glue might do). How or whether this could be made to work together with more sophisticated GC ideas like generational and/or incremental collection, card marking, copying/moving, and so forth is an open question in my mind. So if you see any major conflicts, do share. :)
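A small sketch of what such glue could look like for an enum, in present-day Rust. `Tracer`, `GcRef` and `Shape` are made-up illustration types, and the impl is written by hand where the compiler or a derive would generate it:

```rust
/// Hypothetical tracer that records the ids of the GC objects it reaches.
#[derive(Default)]
pub struct Tracer {
    pub marked: Vec<usize>,
}

pub trait Trace {
    fn trace(&self, tracer: &mut Tracer);
}

/// Stand-in for a GC pointer, identified by a made-up object id.
pub struct GcRef(pub usize);

impl Trace for GcRef {
    fn trace(&self, tracer: &mut Tracer) {
        tracer.marked.push(self.0);
    }
}

impl Trace for u32 {
    fn trace(&self, _tracer: &mut Tracer) {}
}

pub enum Shape {
    Leaf(u32),
    Pair(GcRef, GcRef),
    Tagged { tag: u32, child: GcRef },
}

// What generated glue might expand to: the match on the discriminant is
// ordinary code, so the GC needs no runtime description of the layout.
impl Trace for Shape {
    fn trace(&self, tracer: &mut Tracer) {
        match self {
            Shape::Leaf(n) => n.trace(tracer),
            Shape::Pair(a, b) => {
                a.trace(tracer);
                b.trace(tracer);
            }
            Shape::Tagged { tag, child } => {
                tag.trace(tracer);
                child.trace(tracer);
            }
        }
    }
}
```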

@huonw
Member Author

huonw commented Jan 25, 2014

the thought I keep returning to is that the garbage collector would have to be made to "obey the borrow checker", i.e. to not move things around if it would violate the contracts of any outstanding loans. I have very little idea about how (or whether) this could happen, though.

At the very least, by tracing through & and &mut and marking their contents as "borrowed" (this would, unfortunately, require tracing every reference, since any reference could point inside a Gc pointer; although the cost will only happen in tasks where the GC is actually active... but, depending on how we handle things like vectors, it may make Vec<&T> slightly slower). (Also possible by having an RAII wrapper around the .borrow methods of Gc that registers and then deregisters a pointer, although this would mean every borrow is quite expensive... maybe ok, especially if we need to root borrows anyway.)
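A minimal sketch of the RAII variant, in present-day Rust. `BorrowRegistry` and `BorrowGuard` are hypothetical names; in a real design the registry would presumably live in task-local GC state and the guard would be returned by `Gc::borrow`:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::ops::Deref;

/// Hypothetical registry of addresses with outstanding borrows.
/// A moving collector would consult it before relocating an object.
#[derive(Default)]
pub struct BorrowRegistry {
    counts: RefCell<HashMap<usize, usize>>,
}

impl BorrowRegistry {
    pub fn is_borrowed(&self, addr: usize) -> bool {
        self.counts.borrow().contains_key(&addr)
    }

    fn register(&self, addr: usize) {
        *self.counts.borrow_mut().entry(addr).or_insert(0) += 1;
    }

    fn unregister(&self, addr: usize) {
        let mut counts = self.counts.borrow_mut();
        let remaining = match counts.get_mut(&addr) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return,
        };
        if remaining == 0 {
            counts.remove(&addr);
        }
    }
}

/// Registers the borrowed address on creation and deregisters it on drop.
pub struct BorrowGuard<'a, T> {
    value: &'a T,
    registry: &'a BorrowRegistry,
}

impl<'a, T> BorrowGuard<'a, T> {
    pub fn new(value: &'a T, registry: &'a BorrowRegistry) -> Self {
        registry.register(value as *const T as usize);
        BorrowGuard { value, registry }
    }
}

impl<'a, T> Deref for BorrowGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value
    }
}

impl<'a, T> Drop for BorrowGuard<'a, T> {
    fn drop(&mut self) {
        self.registry.unregister(self.value as *const T as usize);
    }
}
```

Every borrow pays for two map operations here, which is the expense mentioned above.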

Another option is to actually rewrite & and &mut as we move things, but I imagine this may mean we have problems with optimisations.

for this you would have to know that there aren't any loans of the Gc references themselves

I assume you're talking about &Gc<T> here? We can still rewrite the pointer there, but we may have to make Gc non-Freeze for it to be valid. (Although we may have to do this anyway, since even something like let x: Gc<int>; will be considered deeply immutable (with possible optimisation consequences) without marking it as not Freeze... unfortunately that is slightly lossy, in that only the pointer is changing, the contents are always identical.)

Me too, but can you think of a better plan? The one I described is the only one I managed to arrive at which actually holds together. I don't see any promising leads for a potentially better one, but that doesn't mean there isn't one.

The only other one I can think of that I think has any hope of working is the one that is similar to transmute. In code:

extern "rust-intrinsic" {
    /// Trace the value for the GC, calling a `Trace` impl and[/or]
    /// the auto-generated tracing glue.
    fn trace<T>(value: T, tracer: &mut Tracer);
}

fn foo(t: &mut Tracer) {
    // ok
    trace(0i, t);
    trace(Gc::new(Gc::new(1)), t);

    // error: trace called on a raw pointer that contains a managed type
    trace(0 as *Gc<int>, t);
}

But, as I said above, this will have similar errors to transmute, where generic instantiations of functions from other crates can cause errors (similar to C++, too).

@glaebhoerl
Contributor

Also possible by having an RAII wrapper around the .borrow methods of Gc that registers & then deregisters a pointer, although this would mean every borrow is quite expensive... maybe ok, especially if we need to root borrows anyway.

This is kind of what I was thinking.

I don't suppose there's any way that the compiler could relay the static information that it has about borrows/loans to the garbage collector, and that this information could be sufficient? (I have a hard time wrapping my head around this, but my suspicion is no.)

Another option is to actually rewrite & and &mut as we move things, but I imagine this may mean we have problems with optimisations.

Right. Any time the garbage collector disregards the borrow checker, I think you need to have a very strong reason why it's safe for it to do that, when it's not safe for other code to do so.

I assume you're talking about &Gc<T> here?

Yes, or more likely a borrow of some larger structure with Gc<T> inside it.

All this talk of rewriting pointers and references where it's not expected has me wondering whether we couldn't, or even shouldn't, adopt SpiderMonkey's solution involving double indirection. After all, they're facing very similar constraints: a managed heap hosted in a non-managed environment. Their rooting API also seems to be very similar to the suggestion I had for how to do precise tracing of the stack. So essentially in this scheme our Gc<T> would be like their Handle<T>, and variables on the stack referencing potentially-managed data would be like their Rooted<T>, except done automatically by the compiler. (Though, AFAICT, they have a clear boundary between what's inside the GC heap (JavaScript) and what's on the outside pointing in, which means that it's only the latter which need a double indirection, whereas I'm not sure if we could make a similar distinction...)
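A toy illustration of why double indirection makes moving cheap, in present-day Rust. This is not SpiderMonkey's API; `Heap`, `Handle` and `move_everything` are hypothetical names. Clients hold an index into a table and only the table is rewritten when objects move:

```rust
/// Hypothetical heap with double indirection: handles index `table`,
/// and `table` holds each object's current position in `objects`.
pub struct Heap<T> {
    objects: Vec<T>,
    table: Vec<usize>,
}

/// What client code holds instead of a direct pointer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Handle(usize);

impl<T> Heap<T> {
    pub fn new() -> Self {
        Heap { objects: Vec::new(), table: Vec::new() }
    }

    pub fn alloc(&mut self, value: T) -> Handle {
        self.objects.push(value);
        self.table.push(self.objects.len() - 1);
        Handle(self.table.len() - 1)
    }

    pub fn get(&self, h: Handle) -> &T {
        &self.objects[self.table[h.0]]
    }

    /// Current storage position of the object behind `h`.
    pub fn position(&self, h: Handle) -> usize {
        self.table[h.0]
    }

    /// Simulate a moving collection: reverse the storage order and fix up
    /// only the table. No handle held by client code needs rewriting.
    pub fn move_everything(&mut self) {
        self.objects.reverse();
        let last = self.objects.len().saturating_sub(1);
        for pos in self.table.iter_mut() {
            *pos = last - *pos;
        }
    }
}
```

The price is the extra load on every access through `get`, plus the open question above of which references can be handles at all.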

The only other one I can think of that I think has any hope of working is the one that is similar to transmute

Just to see if I'm understanding you correctly: is the idea that the transmute-like error would be generated if a type contains a raw pointer, which the automatically generated trace glue would attempt to trace? And that to resolve it you'd have to write your own Trace impl, which first casts it to a borrowed pointer or something before tracing it? (And IINM this would also depend on the manual Trace impl being used instead of, not in addition to, the generated trace glue?)

In the context of my earlier plan, i.e. not the afore-discussed transmute-like one, what about this solution: a lint which warns (or errors) if a type contains a raw pointer potentially referencing managed data, UNLESS any of the following are true:

  • the type has a #[no_trace] attribute (in which case NoManaged will be enforced by the compiler and there's nothing to worry about), or
  • the type has a manual Trace impl (whether it can or should also check that this impl looks like it actually attempts to trace the raw pointer is another question), or
  • the type has an #[unsafe_managed] attribute or something, which indicates to the compiler that "I have my own solution to this and I know what I'm doing".

For a type like Rc<T> which stores a "single thing" and doesn't rely on an external invariant to keep its data alive, if you don't want to allow managed data in it, adding #[no_trace] has no real drawback, it just enforces that you put NoManaged in the places where it's required for correctness in either case. It's only if you want one part of the type to be traced and another part NoManaged, or if you're doing a borrowed-pointer-like thing, that you need something more sophisticated.

This would still be best-effort (of course, I don't think you can make any guarantees around raw pointers), but it seems less error-prone than the prior plan.

@pnkfelix
Member

@glaebhoerl @huonw regarding this issue: "W.r.t. a copying/moving collector (it's good to see we've been thinking about the same things again!), the thought I keep returning to is that the garbage collector would have to be made to "obey the borrow checker", i.e. to not move things around if it would violate the contracts of any outstanding loans."

In the design I have been envisaging, this would fall semi-naturally out of a Bartlett style mostly-copying GC, with a few caveats. Note that currently in @T, the T is not allowed to have &-ptr fields, which helps a little here.

  • The reason it falls out almost totally naturally is that one expects most of the &T ptrs (be they pointers to gc-allocated objects or to other memory) to be on the stack, so in a Bartlett-style Gc, the objects referenced via &T's on the stack will be pinned in place already.
  • I'm currently planning to conservatively scan ~-allocated objects that are recursively reachable from the stack; so that's another source of potential &T ptrs. But that appears to just be a generalization of how the &T ptrs on the stack will be handled. (Note the "recursively" here, i.e. I think it needs to be able to deal with a ~(Q, ~(R, ~(S, &T))) that is on the stack.)
  • The only place where I imagine we would deal with re-writing &T references would be if they are in ~-allocated objects that are only directly referenced by objects that we are tracing precisely (i.e., gc-allocated objects). Any borrowed references (to a T within gc-allocated object) that we scan conservatively would cause the referenced gc-allocated object to get pinned.
    • Actually, I need to double check my thinking here. I was thinking that the borrow-checker disallows @&T but that it allows @~&T (and likewise @(Q, @(R, ~(S, &T)))) but that might be simply incorrect on my part. In which case this complication may not arise at all. I need to check about this first.

I do not know if this answers your question about how to integrate with the borrow checker, but it was enough to convince me that we could consider doing a moving collector that even allowed for (some) &T references to moving objects.

(The reason that I want to ensure that we handle updating the &T references on ~-allocated objects that are reachable solely via gc-allocated objects, is because if you do not do this, then either (1.) the borrow-checker needs to emit code to pin and unpin gc-allocated objects, which I do not think is feasible, or (2.) you need to do a full non-moving tracing GC to find all of the reachable &T references before you can do any moves of any objects at all.)

@pnkfelix
Member

I didn't address native *T ptrs at all in my previous post. I've been working under the assumption that the client code, i.e. the library utilizing *T, would be responsible for pinning any gc-objects that have an outstanding *T pointer, and hopefully unpinning them eventually. (Some collectors make pinned objects uncollectable, but in my mind pinning (disallowing moves) and rooting (disallowing reclamation) can be orthogonal notions; i.e. we could avoid imposing the burden of knowing when to unpin the object, and instead just never move it until it dies.)
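The orthogonality can be spelled out as a small decision function. This is a sketch in present-day Rust with hypothetical names (`ObjState`, `Action`, `decide`), not a description of any existing collector:

```rust
/// Hypothetical per-object state seen by the collector at the end of marking.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjState {
    /// Moves are disallowed (e.g. an outstanding *T pointer exists).
    pub pinned: bool,
    /// Reclamation is disallowed regardless of reachability.
    pub rooted: bool,
    /// The object was reached while tracing.
    pub reachable: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Reclaim,
    KeepInPlace,
    MayMove,
}

/// Pinning only affects whether a live object may move; it never keeps
/// an object alive on its own.
pub fn decide(s: ObjState) -> Action {
    let live = s.rooted || s.reachable;
    if !live {
        Action::Reclaim
    } else if s.pinned {
        Action::KeepInPlace
    } else {
        Action::MayMove
    }
}
```

Under this rule a pinned object that is neither rooted nor reachable is simply reclaimed, so nobody has to remember to unpin it.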

@huonw
Member Author

huonw commented Jan 26, 2014

I don't suppose there's any way that the compiler could relay the static information that it has about borrows/loans to the garbage collector, and that this information could be sufficient?

At the very least, we have to convey the actual value of the pointers that are borrowed so the GC knows which they are. (I imagine some sort of static information may be possible with precision everywhere... not really sure.)

is the idea that the transmute-like error would be generated if a type contains a raw pointer, which the automatically generated trace glue would attempt to trace? And that to resolve it you'd have to write your own Trace impl, which first casts it to a borrowed pointer or something before tracing it? (And IINM this would also depend on the manual Trace impl being used instead of, not in addition to, the generated trace glue?)

Yes, that's the idea, but I think we could still run the generated trace glue with a manual Trace impl: basically *T would have a noop trace glue, but it would be an error to actually attempt to call it directly. That is, you need to have a type with a Trace impl "between" a call to the trace intrinsic and any raw pointers, i.e.:

struct NoTraceImpl<T>(*T);
struct HasATraceImpl<T>(T);
impl<T> Trace for HasATraceImpl<T> { ... }

// errors:
trace(raw_ptr, ...);
trace(NoTraceImpl(raw_ptr), ...);
trace(~NoTraceImpl(raw_ptr), ...); // `~` calls `trace` on its interior

// ok
trace(HasATraceImpl(NoTraceImpl(raw_ptr)), ...);
trace(~HasATraceImpl(NoTraceImpl(raw_ptr)), ...);
trace(HasATraceImpl(~NoTraceImpl(raw_ptr)), ...);

To be clear, in my mind, all this work with raw pointers is just designed to make it harder to do the wrong thing, it's not particularly fundamental to the semantics of the GC, it's just something that users should avoid making mistakes on, and it'd be nice if rustc caught the common ones.

To this end, a lint makes sense, but I think it could be rather difficult to do properly, since you may have the tracer for a raw pointer on a different type, i.e.:

struct Inner {
    ptr: *X
}

pub struct Outer {
    priv inner: Complicated<Wrappers<Inner>>
}

impl Trace for Outer { ... }

Of course, peculiar wrappers like this may just be where we say "you're doing weird things, and the compiler can only be so smart; you're on your own, be careful".

Actually, I need to double check my thinking here. I was thinking that the borrow-checker disallows @&T but that it allows @~&T (and likewise @(Q, @(R, ~(S, &T)))) but that might be simply incorrect on my part. In which case this complication may not arise at all. I need to check about this first.

~&T is not 'static, so yes, @~&T (all the sigils!) is disallowed.

I'm currently planning to conservatively scan ~-allocated objects that are recursively reachable from the stack; so that's another source of potential &T ptrs

(FWIW, the code in this PR currently does this.)

Does this include conservatively scanning ~[] vectors? (I guess with the library vector in std::vec_ng, it will be (somewhat) easier to precisely scan, since we don't have to build that extra infrastructure into the compiler. If we do precisely scan them, then we do have the option (requirement?) of rewriting any references inside Vec<&T>.)

Also, on &T, what about &[T]? (either as an interior slice into a Gc<~[T]> or just as a &[Gc<T>])

@glaebhoerl
Contributor

basically *T would have a noop trace glue, but it would be an error to actually attempt to call it directly. That is, you need to have a type with a Trace impl "between" a call to the trace intrinsic and any raw pointers

I might be misunderstanding something: wouldn't the compiler generate trace glue for NoTraceImpl and immediately raise the error?

To be clear, in my mind, all this work with raw pointers is just designed to make it harder to do the wrong thing

We're on the same page.

a lint makes sense, but I think it could be rather difficult to do properly, since you may have the tracer for a raw pointer on a different type

In this example, the compiler would complain about Inner.

(That is, my thinking was that it wouldn't try to "look inside" anything, and would raise the error at the type which actually contains the *T. Maybe there could be yet another attribute which says "defer all yer complaining to users of this type" (i.e. that this type is itself "like *T"), if there's demand for it.)

What I'm not so clear about is how something like struct Id<T>(T); struct Hmm<T> { m: Id<*T> } might be handled. It's obviously preposterous to complain about Id. But to complain about Hmm, it seems like you would have to look at the definition of Id. (Given that it's a generic type, I suppose the compiler would see its definition at some point, but I'm not sure if it's an appropriate point?)

At the very least, we have to convey the actual value of the pointers that are borrowed so the GC knows which they are. (I imagine some sort of static information may be possible with precision everywhere... not really sure.)

The compiler's information about borrows is per function, since of course function bodies are typechecked separately. So to again start from the least sophisticated solution that could possibly work, what the compiler could do is insert calls to register_borrow(&var, Imm | Mut | ...) and unregister_borrow(&var) or similar at the beginning and end of the statically known lifetime of each borrow of a variable holding potentially-managed data. If you also require RefCell to call these explicitly, then... you might even have full knowledge of active borrows. So maybe my suspicion was misplaced and this could actually work?

What's not clear to me yet is the case where the function takes out a loan on some larger structure, but the interior of that structure (which is thereby also borrowed) is also reachable through other means. First of all, is this possible? (Does the borrow checker allow it? Does the compiler have some notion of active vs. inactive pointers which the GC doesn't (yet)?) Second of all, if it's possible, could the GC detect it in a reasonable way?

And of course the other big question is what the GC would actually do with all this information about active borrows that we'd be giving it.

@pnkfelix:

  • If completely precise GC is possible, are there reasons why partly-precise, partly-conservative GC might still be preferable? What are they? What I'm driving toward with all this Trace-ing and borrow checker stuff is essentially to see whether completely precise GC would be possible (and without headers everywhere!). (Again, my impression being that GCs are for some reason usually designed with the assumption that they need to work in a completely "dynamic" fashion; while we have a lot of static information, and we might try using it.)
  • Knowing that, as @huonw notes, borrowed pointers inside managed data are completely impossible, would this also completely avoid the need to rewrite any borrowed pointers anywhere in your scheme?
  • It makes me a little bit uneasy that your plan gives special consideration to ~ pointers, when it's just one smart pointer type among many. Is it meant to stand as a proxy for "whatever kind of smart pointer"? Or would there actually be a gap in support between ~ and other smart pointers? What if ~ is itself removed from the language and implemented as a library instead?

@huonw
Member Author

huonw commented Jan 26, 2014

I might be misunderstanding something: wouldn't the compiler generate trace glue for NoTraceImpl and immediately raise the error?

No, that's the "magic". There are two concepts under that plan: the actual trace glue, and the calling of the trace glue, (normally) done via some intrinsic (I called it trace above, but I'll call it trace_checked here). For raw pointers, the actual trace glue is a noop (and is perfectly fine to create[1]), but calling trace_checked directly on one is illegal.

Semantically, creating the trace glue for types without a Trace implementation would be the same as just calling trace_checked on every field (so that "unprotected" raw pointers inside structs would be an error). However, calling trace_checked on a type with a Trace implementation would still generate the trace glue for the fields, but the generation would not be considered to be a call to trace_checked.

As an example, say we have Uniq<T> { priv ptr: *mut T } as a library implementation of Uniq, we'd have

impl<T> Trace for Uniq<T> {
    fn trace(&self, t: &mut Tracer) {
        unsafe { trace_checked(&*self.ptr, t) }
    }
}

i.e. just tracing its contents.

Then the trace glue for Uniq<T> would look something like:

// explicit Trace impl:
call trace_checked::<T> // contents
// autogenerated:
call trace_glue::<*mut T> // noop

For Uniq<*T> the trace glue would be

// explicit Trace impl:
call trace_checked::<*T> // error: calling trace_checked on *T
// autogenerated:
call trace_glue::<*mut *T> // noop

As a different example, say we have Foo<T, U> { x: Uniq<T>, y: U } (without an explicit Trace impl), then the trace glue for Foo<int, *int> would be

// explicit Trace impl: <no impl>
// autogenerated:
call trace_checked::<Uniq<int>> // code above
call trace_checked::<*int> // error: calling trace_checked on *T

The difference is the call of trace_checked or the direct call of trace_glue in the autogenerated section, based on whether there is a Trace impl of the current type or not. Writing it out like this, I'm relatively sure it could work, but (a) the error messages would be as bad as the ones we get from transmute, and (b) I'm not sure how well it fits into the compiler/language. Especially since the only types that have a different behaviour for trace_glue vs. trace_checked are *T and *mut T.

[1]: I don't know how this would interact with trait objects. I guess the trace glue in a trait object would be created as if it were a call to trace_checked, rather than trace_glue.

In this example, the compiler would complain about Inner.

My point is that this is a case where it shouldn't complain about Inner, because Outer is managing the tracing. I guess #[allow(untraced_raw_pointer)] struct Inner { ... } would be ok.

What's not clear to me yet is the case where the function takes out a loan on some larger structure, but the interior of that structure (which is thereby also borrowed) is also reachable through other means. First of all, is this possible? (Does the borrow checker allow it? Does the compiler have some notion of active vs. inactive pointers which the GC doesn't (yet)?) Second of all, if it's possible, could the GC detect it in a reasonable way?

Maybe something like Gc<Rc<int>>: you can have an Rc<int> outside the Gc that points to the same int. However, I'm not quite sure what you mean by this.

Also, re SpiderMonkey-style double-indirection, I'd thought about it briefly, but didn't really pursue it at all, since we actually have control of the compiler & language and the ability to put in the appropriate hooks to make that unnecessary, in theory. (I'd assumed that avoiding two layers of pointers is a Good Thing, without considering the benefits of such an approach in depth.)

@pnkfelix
Member

@glaebhoerl

  1. Regarding "GCs are for some reason usually designed with the assumption that they need to work in a completely "dynamic" fashion; while we have a lot of static information, and we might try using it"
    • I will just mention again the papers by Appel and also Goldberg that I referenced in an earlier comment. Those explore GC in the context of ML, where you have much the same collection of static information that we do, at least from the view point of a GC. (I suspect Goldberg's tack diverges from your own in his handling of polymorphic types, but still, if you want to spend time on the thought experiment, it may be good to see what others have done here.)
  2. Regarding: "If completely precise GC is possible, are there reasons why partly-precise, partly-conservative GC might still be preferable?"

This is a game of trade-offs. The word "possible" is very broad. I do not doubt that it is "possible" to implement a completely precise GC for Rust. But that does not imply that it would be the right engineering decision for the short term (and perhaps not even for the long term).

Largely I keep pushing for a Mostly-Copying Collector due to ease of implementation: to get a tracing GC into Rust ASAP, I do not want to spend more time than necessary wrestling with LLVM. I am under the impression that LLVM's API for precise stack scanning may be in flux in the near term, so that's a distraction, and I do not want to spend time finding out what the performance overheads are of its current API.

I'm not saying that I would immediately veto a fully-precise collector. I just think it would be a mistake to build full-precision in as an up-front requirement for the GC.

Anyway, there are other reasons that a Mostly-Copying GC could be preferable. I continue to stress: this is all about trade-offs:

  • It may simplify interfacing with native code. (But then again, I already said that I want to see whether we can require explicit pinning and unpinning from the client code in such cases.)
  • The metadata to support precise stack scanning are another source of potential bloat. I do not have my copy of the GC handbook with me at the moment so I do not have access to the relevant statistics here.
  • The code to scan the stack conservatively is not just simpler to implement, but it can also be faster than trying to do a precise scan, depending on the technique that is used.

I think my mindset coincides with Filip Pizlo's on this matter; see the LLVMdev emails I linked in another comment above. In particular this one: "I do not mean to imply that accurate GC shouldn't be considered at all. It's an interesting compiler feature."

I'll repeat a few links that I found again while double-checking my work writing this comment:

@glaebhoerl
Contributor

@pnkfelix Thanks for the pointers, I've finally borrowed some time to start following them. Below are my notes. I did not give them very thorough readings, but only attempted to understand the most important aspects of each.

Bartlett's 1989 paper explains mostly-copying GC and how to extend it to a generational collector. The key idea is that besides the area of memory they reside in, whether a page is in old-space or new-space can also be tracked by a field stored in the page. The stack is scanned conservatively, and pages pointed into are "moved" to new-space by updating their space fields. In this way the address of objects which may-or-may-not be referenced from the stack is unchanged. After this, objects referenced from these pages are physically copied into new-space. (This requires those objects to be self-identifying, e.g. to have headers.)
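To make the page-promotion step concrete, here is a minimal sketch in present-day Rust (not the 2014 dialect used elsewhere in this thread). The `Heap` type, `PAGE_SIZE`, and the method names are assumptions invented for illustration; the sketch only models the conservative stack scan and the flip of the per-page space field, not the subsequent copying of referents.

```rust
use std::collections::BTreeSet;

/// Size of one heap page in bytes (an arbitrary choice for this sketch).
const PAGE_SIZE: usize = 4096;

/// Which space a page currently belongs to.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Space {
    Old,
    New,
}

/// A toy heap: page `i` covers addresses
/// `base + i * PAGE_SIZE .. base + (i + 1) * PAGE_SIZE`.
struct Heap {
    base: usize,
    spaces: Vec<Space>,
}

impl Heap {
    fn new(base: usize, pages: usize) -> Heap {
        Heap { base, spaces: vec![Space::Old; pages] }
    }

    /// Index of the page containing `addr`, if `addr` lies in the heap.
    fn page_of(&self, addr: usize) -> Option<usize> {
        let end = self.base + self.spaces.len() * PAGE_SIZE;
        if addr >= self.base && addr < end {
            Some((addr - self.base) / PAGE_SIZE)
        } else {
            None
        }
    }

    /// Conservative root scan: every stack word that happens to point into
    /// the heap promotes its page to new-space by flipping the space field.
    /// Nothing is copied, so the addresses of these objects stay valid.
    fn promote_from_stack(&mut self, stack_words: &[usize]) -> BTreeSet<usize> {
        let mut pinned = BTreeSet::new();
        for &word in stack_words {
            if let Some(page) = self.page_of(word) {
                self.spaces[page] = Space::New;
                pinned.insert(page);
            }
        }
        pinned
    }
}
```

Because the scan is conservative, an integer on the stack that merely looks like a heap address also promotes a page; that costs some retained memory but never moves anything the stack might reference.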

Appel's paper is about GC in Pascal and ML without headers or tags on objects. This is accomplished by the compiler generating tables containing type information (layout etc) for the garbage collector, which traverses these tables in parallel with the traced objects to keep track of types. The type of roots is determined by associating this information with the functions containing them, which is looked up by the GC based on the return-address pointer, and in the case of polymorphic functions, walking up the stack until the function where the type variables are instantiated is found. It also has a section on how to do breadth-first copying collection with this scheme.

Goldberg 91 seems closer to what I was thinking about: here garbage collection routines are generated by the compiler for each type. He takes it a step further, and borrowing Appel's idea of associating GC information with functions, also generates GC routines per function to trace its stack variables (and in fact a different routine for each point in the function where different variables are live). There is also analysis to avoid doing this when a function doesn't allocate and therefore cannot trigger collection. Tracing polymorphic values (here all polymorphism is with boxing) is accomplished by parameterizing the tracing routine for each polymorphic function and type over the tracing routines of their type variables, which during GC are passed in by the tracing routine for the containing type/stack frame. There is also a section on extending this to parallel programs which I skipped.
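The idea of parameterizing a polymorphic type's tracing routine over the routines of its type variables maps fairly directly onto trait impls. The following is only a sketch in present-day Rust: the `Trace` trait, the `GcRef` stand-in, and `reachable` are hypothetical names, and it covers per-type routines only, not Goldberg's per-function stack routines.

```rust
/// Hypothetical tracing interface: `trace` reports every managed
/// reference reachable from `self` to the visitor.
trait Trace {
    fn trace(&self, visit: &mut dyn FnMut(usize));
}

/// Stand-in for a managed pointer; the `usize` is a fake object id.
struct GcRef(usize);

impl Trace for GcRef {
    fn trace(&self, visit: &mut dyn FnMut(usize)) {
        visit(self.0);
    }
}

/// Plain data holds no managed references, so its routine is a no-op.
impl Trace for i32 {
    fn trace(&self, _visit: &mut dyn FnMut(usize)) {}
}

/// The routine for a polymorphic container is parameterized over the
/// routine of its type variable: `Vec<T>` just calls `T::trace`.
impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, visit: &mut dyn FnMut(usize)) {
        for item in self {
            item.trace(visit);
        }
    }
}

impl<A: Trace, B: Trace> Trace for (A, B) {
    fn trace(&self, visit: &mut dyn FnMut(usize)) {
        self.0.trace(visit);
        self.1.trace(visit);
    }
}

/// Collect every managed reference reachable from `root`.
fn reachable<T: Trace>(root: &T) -> Vec<usize> {
    let mut found = Vec::new();
    root.trace(&mut |id: usize| found.push(id));
    found
}
```

With monomorphization the compiler resolves the inner routine statically, so nothing has to be passed in at collection time; Goldberg's setting (boxed polymorphism) passes the routines as runtime arguments instead.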

Goldberg 92 is about how to extend tagless GC to copying incremental GC. Read about halfway, seems quite interesting, might be worth returning to later. (Also the pages are in reverse order, which is kinda weird.)

Goldberg/Gloger 92 realize that tracing vptrs must also be stored for closures. For polymorphic closures there seems to be a problem, because there is no way to go back to the point where the closure was created to find out what types it is closing over. They then realize that by parametricity, if a closure is polymorphic over some types, then its code cannot follow pointers to those types, therefore doing this is actually unnecessary: any values whose type the GC cannot establish can only be garbage.

Huang et al 04 appears to be about using JIT technology to help the Java GC optimize locality based on the dynamic execution of the program. I'm not sure which part of this is relevant here?

Cheadle et al 04 is about using code specialization to reduce the heap usage of their incremental GHC/Haskell garbage collector. As Haskell is lazily evaluated, heap objects in GHC contain a pointer to "entry code" which, if the object is an unevaluated thunk, points to code which evaluates it, stores the result, and then overwrites the entry code pointer with a different one which merely returns the stored result. Objects are accessed by the mutator "entering" them in this way. As this is already a read barrier, they take advantage of the same mechanism for incremental GC to scavenge the object (copy its referents from old-space to new-space) when the mutator "enters" it. In their earlier version, this necessitated storing an additional pointer to the original non-scavenging entry code in each heap object, so that after scavenging the normal entry code can be used. In this paper they remove the need for this additional pointer by instead generating specialized scavenging and non-scavenging versions of each function, and making the entry code pointer point to the appropriate one. Except for perhaps their measurements I'm not sure how relevant all of this is to us. They do observe that closures are just trait objects:

The most important feature of the STG-machine for our purposes is that a closure has some control over its own operational behaviour, via its entry code pointer. We remark that this type of representation of heap objects is quite typical in object-oriented systems, except that the header word of an object typically points to a method table rather than to a single, distinguished method (our “entry code”).

I haven't looked at the LLVM links yet, I'll have to get back to those later.

@glaebhoerl
Contributor

@pnkfelix most of your concerns w.r.t. precise tracing appear to be about tracing the stack. I'll re-suggest my earlier suggestion here: the compiler could generate code in each function to register and deregister those stack variables as roots (in the form of their address plus the trace glue vptr corresponding to their type) which may possibly contain references to managed data. While this may not be optimal for mutator performance when using GC, it also avoids relying on LLVM and imposes no cost on code which is known not to require GC. I also had a similar idea for identifying (and presumably pinning?) objects which have been borrowed. In both cases, there is also the advantage that besides the compiler generating these calls automatically for safe code, they would also be available for unsafe code to invoke manually where appropriate.
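Roughly what I have in mind, as a sketch in present-day Rust with the compiler-inserted calls replaced by a hand-written RAII guard. `ROOTS`, `RootEntry`, `Rooted`, and `trace_roots` are all made-up names, and the sketch assumes roots are registered and dropped in strict stack (LIFO) order.

```rust
use std::cell::RefCell;

/// One registered root: the address of a stack variable plus the tracing
/// routine ("trace glue") for its type.
struct RootEntry {
    addr: *const (),
    trace: unsafe fn(*const (), &mut Vec<usize>),
}

thread_local! {
    /// Per-thread shadow stack of roots (hypothetical runtime state).
    static ROOTS: RefCell<Vec<RootEntry>> = RefCell::new(Vec::new());
}

/// Stand-in for a managed pointer; the `usize` is a fake object id.
struct GcRef(usize);

unsafe fn trace_gc_ref(addr: *const (), out: &mut Vec<usize>) {
    // Safety: the caller guarantees `addr` points at a live `GcRef`.
    out.push(unsafe { (*(addr as *const GcRef)).0 });
}

/// RAII guard standing in for compiler-inserted register/deregister calls.
struct Rooted<'a> {
    _value: &'a GcRef,
}

impl<'a> Rooted<'a> {
    fn new(value: &'a GcRef) -> Rooted<'a> {
        ROOTS.with(|r| {
            r.borrow_mut().push(RootEntry {
                addr: value as *const GcRef as *const (),
                trace: trace_gc_ref,
            });
        });
        Rooted { _value: value }
    }
}

impl<'a> Drop for Rooted<'a> {
    fn drop(&mut self) {
        // Assumes LIFO order: the most recent registration is ours.
        ROOTS.with(|r| {
            r.borrow_mut().pop();
        });
    }
}

/// What a collector would do at the start of a collection: walk the
/// shadow stack and call each root's tracing routine.
fn trace_roots() -> Vec<usize> {
    let mut out = Vec::new();
    ROOTS.with(|r| {
        for entry in r.borrow().iter() {
            // Safety: every entry refers to a value borrowed by a live guard.
            unsafe { (entry.trace)(entry.addr, &mut out) };
        }
    });
    out
}
```

Code that never creates a `Rooted` never touches `ROOTS`, which is the "no cost for non-GC code" property; the price is a push and a pop per rooted variable in code that does use GC.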

The reason I was wondering what advantages semi-precise GC has (in other words, what the other side of the tradeoffs is) is that one of our main objectives, which is not putting a burden on code which doesn't use GC, seems (to me) to be easier to achieve with precise collection. (Which is not so surprising if you consider that precise collection involves knowing what not to trace.)

@huonw Gah, I still need to process your last comment as well. :) I'll get around to it.

@pnkfelix
Member

@glaebhoerl you said "most of your concerns w.r.t. precise tracing appear to be about tracing the stack":
well, yes, when comparing fully-precise tracing versus mostly-copying style tracing, the only difference is how the stack is handled ...

(update: I was being a bit flip in the previous comment. I do realize you've been talking about trying to do a heavily type-driven tracing gc, which would imply that you really do need precise knowledge of the types of one's registers and stack slots in order to properly drive the tracing itself, at least if one wants to do it without any headers, and that would be impossible in a conservative stack scanning setting. I just am not yet convinced that this is a realistic option for us. Maybe I am too pessimistic.)

There is plenty of precedent for having the compiler emit code to automatically maintain a root set (or a shadow stack of roots, register/deregister, etc.). I think the LLVM-dev email thread I linked to earlier has some pointers to related work here. As you said, it's not a technique that's known to be terribly performant.

I'll have to go back through your comments on this thread, as I feel like I must have misunderstood something in the line of discussion here.

@bill-myers
Contributor

You can have optimal performance by generating tables that tell you, for every instruction that can potentially trigger GC (i.e. all function calls), which registers or locations on the stack contain GC roots, and what their type is.

When GC is triggered you use unwind tables to find the IPs in all functions on the call stack, lookup the GC tables for all of them, and trace them.

You can probably do that now on LLVM with http://www.llvm.org/docs/StackMaps.html
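A rough sketch of the table lookup, in present-day Rust. This does not use LLVM's actual stack map format; `StackMap`, `RootInfo`, and `Location` are hypothetical types, and the return addresses are assumed to have been collected by an unwinder already.

```rust
use std::collections::HashMap;

/// Where a root lives at a given call site.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Location {
    /// Offset in bytes from the frame's base pointer.
    StackSlot(isize),
    /// A machine register, identified by a made-up number.
    Register(u8),
}

/// One live root: where it is and an id for its type's tracing routine.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RootInfo {
    location: Location,
    type_id: u32,
}

/// Compiler-generated table: return address -> roots live at that call.
struct StackMap {
    sites: HashMap<usize, Vec<RootInfo>>,
}

impl StackMap {
    /// Given the return addresses found by unwinding, list the roots of
    /// every frame. Frames with no table entry contribute nothing.
    fn roots_for(&self, return_addrs: &[usize]) -> Vec<(usize, RootInfo)> {
        let mut roots = Vec::new();
        for (frame, ip) in return_addrs.iter().enumerate() {
            if let Some(infos) = self.sites.get(ip) {
                roots.extend(infos.iter().map(|info| (frame, *info)));
            }
        }
        roots
    }
}
```

The mutator pays nothing at run time for this scheme; all the cost is in table size and in the lookup during collection.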

I think that ultimately this is the only option for Rust that would be acceptable for "production": conservative scanning is conceptually really ugly, cannot scan datatypes that hide pointers but have a custom Trace implementation, can incorrectly keep dead objects alive forever, adds the overhead of having to annotate all allocations with type information and adds the overhead of a data structure that allows looking up whether a value is a pointer to an object.

@pnkfelix
Member

@bill-myers you've said basically the same thing in your earlier comment.

Out of curiosity:

  • Do you know offhand if Filip Pizlo has revised his opinion of the necessity/utility of precise stack scanning?
  • Do you think that Rust is just fundamentally different from the languages where Bartlett-style mostly-copying gc has been deployed successfully? I suppose reasonable people can differ about what "acceptable for production" means.

In my opinion we need precise heap scanning but can live with conservative stack scanning for 1.0. I agree that the other drawbacks with conservative scanning you pointed out are present, at least partially (I may quibble about details), but I disagree about their impact. I do not think mostly-copying is conceptually ugly, and I stand by the points about the trade-offs here that I made earlier.

Looking over the LLVM stack map API, it says outright in the motivation section that the functionality is currently experimental. The only linked client of the API is FTL JIT, which is also experimental and disabled by default. So I stand especially by this comment I made earlier:

I do not want to spend more time than necessary wrestling with LLVM. I am under the impression that LLVM's API for precise stack scanning may be in flux in the near term, so that's a distraction, and I do not want to spend time finding out what the performance overheads are of its current API.

@pnkfelix
Member

I realized something while reflecting on this dialogue.

I think many of the participants here are focused on using precise stack scanning to enable type-driven heap tracing from the outset in the roots+stack. Notably, people haven't been discussing whether these tracing procedures are using a mark-sweep style GC or a copying GC, which makes sense, because that is an orthogonal decision once you assume that you are going to have 100% precise tracing on the roots and stack.

Meanwhile, my focus has been on how to get a GC into Rust that makes use of techniques such as copying collection. I do not want to build in a mark-sweep GC as the only GC we deploy in 1.0, because I worry that end-users will then build the assumption that objects do not move into their own code, and we'll never get a relocating collector into the runtime once libraries deployed with that assumption become popular. (And Rust may not need a relocating collector; but since it might, I would prefer to start with one and then see whether it fails to pay for itself.)

My recent realization is that these two ends: type-driven GC and a mostly-copying style GC, may not be at odds with one another. Assuming that we have precise type information for at least one GC reference stored on the stack (or in a ~[T] solely reachable from the stack), via a stack map or what-have-you: then that may be enough info to drive the GC in the manner that @glaebhoerl wants, while still allowing us to use a Bartlett style system that still pins all the objects immediately reachable from the stack (which I conservatively assume to be borrowed and have outstanding borrowed references) but allows relocation of the other heap-allocated objects. There would be no need, I think, to worry about borrowed-references to moving objects, which has been my primary motivation for focusing on mostly-copying GC.

I admit: the above is just conjecture, I haven't thought it through completely. It may not address all of the drawbacks that @bill-myers pointed out.

I guess my point is, I may have been mistakenly conflating "precise stack scanning" with "fully-moving GC", and the latter I have been treating as "too risky for 1.0." But I would be happy to adopt type-driven tracing in combination with mostly-copying GC.

@glaebhoerl
Contributor

@huonw I think I finally understand the trace_checked idea. It does seem a bit involved and magical, but also relatively thorough and complete (apart from checking the validity of actual Trace impls, it seems like it catches every case?). Do you see a solution in the lint-based (or lint-like) approach to the struct Foo<T> { x: Uniq<*T> } case that's better than just conservatively triggering the warning any time *T is present as a type argument? But maybe that's good enough... perhaps it could piggyback off variance and, a bit less conservatively, warn only if it's used in covariant or invariant positions.

My point is that is a case when it shouldn't complain about Inner, because Outer is managing the tracing. I guess #[allow(untraced_raw_pointer)] struct Inner { ... } would be ok.

In total you'd need three attributes, with the hard part being figuring out ergonomic bikeshed colors:

  • one which makes the type have no-op trace glue and enforces NoManaged (no_trace?)
  • one for types containing *T which are "like &T", in that they look unsafe without a Trace impl but aren't (unsafe_raw_managed?)
  • and one for types containing *T which are "like *T", in other words are unsafe without a Trace impl, and should be treated the same as *T itself in any containing contexts (checked_unsafe_raw_managed??)

The latter two are rather different, but capturing that distinction in their names is another matter.

Maybe something like Gc<Rc<int>>: you can have an Rc<int> outside the Gc that points to the same int. However, I'm not quite sure what you mean by this.

A good example might actually be the reverse: consider Rc<Gc<int>>. Under what circumstances is the GC allowed to (a) rewrite the contents (pointee) of the Gc box with an indirection, (b) rewrite the Gc pointer itself? The former case seems easier: it should be allowed iff the int isn't borrowed (so "borrowing is pinning"). For (b), I was thinking at first that the same rule would apply (rewritable unless borrowed), but upon reflection I think this doesn't make sense. Why should it be forbidden to rewrite the Gc<int> pointer in &Gc<int>, but allowed in e.g. Gc<Gc<int>>, and our example which is Rc<Gc<int>>? (RefCell doesn't mind.) If this is the right way of thinking about it, then Gc would indeed be kinda non-Freeze as you say. (I'm starting to have questions about the meaning and legitimacy of Freeze...)

The restrictions on (a) seem like they could be satisfied with a RefCell-like dynamically tracked borrows with RAII scheme. Are there other, perhaps more efficient ways? For instance, using either my earlier idea or conservative stack scanning, how could it handle cases like Gc<~int> where someone takes out a borrow on the int? Just as when it's on the stack, ~int should not be relocated while its interior is borrowed. But how will the GC know this, when they're at unrelated addresses? (With the RefCell-like mechanism, the programmer must first borrow the full contents of the Gc box i.e. ~int to borrow a part of it, i.e. the int, so this doesn't pose a problem.)

@pnkfelix:

I think my attitude is basically that it would be awfully nice to have the infrastructure for fully precise, type-based, Rustic tracing right from the start, in a way that minimizes (ideally all the way to 0) the impact on non-GC code, while the performance of the GC itself and GC-using code can be improved later as long as the semantics and programmer-facing parts of it are there. Hence why the compiler-inserted register_root calls idea seems appealing to me: not necessarily fast, but appears relatively easy to implement (relative to having to rely on LLVM support which doesn't exist, anyways), and it could be replaced with precise stack maps or Goldberg-style generated stack tracing code or whatever else at any later point without changing the observable behavior.

I worry that end-users will then build the assumption that objects do not move into their own code

Given that Rust's semantics guarantee memory safety, this should only be an issue for unsafe code. Is that what you meant?

Also questions:

  • How would conservative stack scanning handle the Gc<~int> + &int case from above?
  • In an earlier comment you wrote "I'm currently planning to conservatively scan ~-allocated objects that are recursively reachable from the stack": how would this work exactly? (And [how] would it generalize to library smart pointer types?)

(I don't really mind what method is used, as long as it's compatible with the mentioned goals, so if "borrowing is pinning" is implemented by conservatively scanning the stack that's fine by me. I'm just trying to understand how it would work.)

There would be no need, I think, to worry about borrowed-references to moving objects, which has been my primary motivation for focusing on mostly-copying GC.

Could you expand on this point? I'm not sure I grok it.

I may have been mistakenly conflating "precise stack scanning" with "fully-moving GC", and the latter I have been treating as "too risky for 1.0."

Is this because of having to rely on uncertain-at-best LLVM support, or some other reason?

@bill-myers do you have any thoughts about the generated stack tracing code approach from Goldberg's 1991 paper? (This is kind of like treating a stack frame as a big struct and generating code to trace that type (not unlike how stack-closures-as-trait-objects can be interpreted), with the additional complication that different variables are live at different points. Of course that complication exists with stack maps as well.)

@huonw
Member Author

huonw commented Jan 31, 2014

I'll come back to it, but:

(I'm starting to have questions about the meaning and legitimacy of Freeze...)

It's possible that Freeze will disappear (#11781), but it may be still useful for TBAA-style optimisations. In any case, I certainly don't think that it's a requirement for Gc<T> to be Freeze.

@emberian
Member

emberian commented Feb 2, 2014

Closing to clear up the queue.

@emberian emberian closed this Feb 2, 2014
@glaebhoerl
Contributor

(I don't think this should impede us in continuing the discussion?)

@huonw I noticed that of the "three attributes" from above, the second can be adequately expressed by writing an explicit no-op Trace impl. So then there's only two, which makes naming a bit easier.

@emberian
Member

emberian commented Feb 3, 2014

(No, it certainly doesn't)


@huonw
Member Author

huonw commented Feb 6, 2014

Do you see a solution in the lint-based (or lint-like) approach to the struct Foo<T> { x: Uniq<*T> } case that's better than just conservatively triggering the warning any time *T is present as a type argument? But maybe that's good enough... perhaps it could piggyback off variance and, a bit less conservatively, warn only if it's used in covariant or invariant positions.

Lints can examine attributes etc. on things, so it would be possible to have a #[allows_unsafe_gc] annotation on types that can always handle *T... but other than that: I guess it would just be conservatively triggered. I'm not really sure of the consequences.

Under what circumstances is the GC allowed to (a) rewrite the contents (pointee) of the Gc box with an indirection

I'm probably misunderstanding (I'm interpreting that as "change the value that the Gc<T> points to")... but, never? The GC should only be modifying the Gc pointers, not the contents (unless the contents are other Gc pointers...).

(b) rewrite the Gc pointer itself?

As long as we can guarantee we rewrite all references (and only rewrite actual references) as we move things, it seems reasonable for this to always be possible... although this means everything can change under the feet of a function (since the values can change due to some far removed Gc::new() call), so if we allow rewriting & would this require reloading them off the stack for every interaction?

(If so, it would have to apply to all &s, and would require compile-time disabling of the GC (in the whole program) to avoid... maybe it seems less reasonable.)

For instance, using either my earlier idea or conservative stack scanning, how could it handle cases like Gc<~int> where someone takes out a borrow on the int? Just as when it's on the stack, ~int should not be relocated while its interior is borrowed

I think the ~int can actually be relocated when there is only a &int, since the int it (and the reference) points to never moves. I think the subtlety here (which is what you are saying, AIUI) is making sure that a &int into a Gc<~int> keeps it reachable, so that the Gc pointer isn't collected (possibly finalising the ~int and leaving the &int dangling). I.e. we need some sort of rooting/borrowing device like RefCell, as you say.

@glaebhoerl
Contributor

Under what circumstances is the GC allowed to (a) rewrite the contents (pointee) of the Gc box with an indirection

I'm probably misunderstanding (I'm interpreting that as "change the value that the Gc<T> points to")... but, never?

Some GCs, I think generational collectors, do a thing where moves happen in two stages: in minor collections moved objects are rewritten with forwarding pointers (indirections) to their new location, and in major collections these are cleaned up by actually making all references point to the new location (after which the indirection becomes garbage). A simplistic formulation in Rust might be something like:

pub struct Gc<T> { priv contents: *mut GcContents<T> }
enum GcContents<T> { Object(T), Indirection(Gc<T>) }

Clearly the contents can't be overwritten with an indirection as long as the object itself is borrowed.
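For anyone who wants to play with the indirection idea, here is a toy model in present-day Rust where a "pointer" is just an index into a vector of cells. `Slot`, `Heap`, `relocate`, and `resolve` are hypothetical names for this sketch; it models forwarding only and does not perform the later cleanup pass that rewrites references.

```rust
/// A heap cell is either a live object or an indirection left behind
/// when the object was moved.
enum Slot<T> {
    Object(T),
    Indirection(usize),
}

/// Toy heap in which a "pointer" is an index into `cells`.
struct Heap<T> {
    cells: Vec<Slot<T>>,
}

impl<T> Heap<T> {
    fn alloc(&mut self, value: T) -> usize {
        self.cells.push(Slot::Object(value));
        self.cells.len() - 1
    }

    /// Move the contents of cell `from` to a fresh cell and overwrite the
    /// old cell with an indirection. Returns the new index.
    fn relocate(&mut self, from: usize) -> usize {
        let to = self.cells.len();
        let old = std::mem::replace(&mut self.cells[from], Slot::Indirection(to));
        self.cells.push(old);
        to
    }

    /// Follow indirections until a real object is reached.
    fn resolve(&self, mut ptr: usize) -> usize {
        while let Slot::Indirection(next) = self.cells[ptr] {
            ptr = next;
        }
        ptr
    }

    fn get(&self, ptr: usize) -> Option<&T> {
        match &self.cells[self.resolve(ptr)] {
            Slot::Object(value) => Some(value),
            Slot::Indirection(_) => None,
        }
    }
}
```

Note that `relocate` takes `&mut self`, so in this toy the borrow checker already refuses to relocate anything while a reference returned by `get` is alive, which is the same constraint stated above.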

if we allow rewriting & would this require reloading them off the stack for every interaction?

I don't have specific arguments for this position yet (though perhaps you have just stated one), but my very strong feeling is that borrowed references should not be rewritten. It should work in the opposite direction: an & reference should root and pin the thing it is referring to. The GC should not have special license to meddle with things outside its domain, rather it should be subject to the same invariants as other code.

I think the ~int can actually be relocated when there is only a &int, since the int it (and the reference) points to never moves.

In this particular case yes, this is true. But in the case of (double, int) borrowing &int, it just as obviously isn't. Maybe in that case, the GC can figure out that it's the same object because they're adjacent in memory. But in theory both types have the same kind of ownership semantics. How does the GC know? I'm very nervous about this extending to any kind of general case, because it feels like the GC would need to have knowledge of the semantics and internal invariants of every particular type for it to be safe. Much more reassuring to say that borrowing the interior of an object in a Gc box should behave as if the whole object were borrowed first (maybe the only way to do this is to make it be literally what happens, or maybe not; I don't know).

@pnkfelix
Member

pnkfelix commented Feb 6, 2014

@glaebhoerl First off, I'm inclined to agree that a & reference should pin its referent. I have gone back-and-forth internally on this matter, but at this point I think this suggestion is likely to be the most workable solution.

I think this property (that & references should pin their referents) is the essence of my argument for why we should adopt a mostly-copying strategy that pins the references reachable from the stack (or reachable from & references that are in ~ objects recursively reachable from the stack). I admit that there remain two strategies for pinning here: 1. Pin based on immediate-reachability from the stack (or the ~heap), as described in the previous sentence, or 2. do the pinning when the borrow occurs and then unpin when the borrow expires, which I believe is what you are thinking of.

I do not yet know how to handle doing the unpin exactly when the borrow expires, which is one reason strategy 2 worries me, although maybe we can do something that just keeps things pinned for longer than strictly necessary, I am not yet sure.


Second, just an aside / FYI / heads-up: I think you are conflating generational collectors with replicating incremental collectors. A relocating generational collector will move a subset of the objects (namely the live ones in the nursery) during a minor collection, but in my experience it is also responsible for updating all references to the moved objects before yielding control back to the mutator. (This is one purpose of a remembered set: to provide the collector with a narrowed subset of the fields that need to be updated when the objects in the nursery are moved, so that hopefully one will do less work than a scan of the entire heap in order to maintain this invariant.)
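To illustrate the remembered-set point, here is a deliberately tiny model in present-day Rust. Everything in it (`GenHeap`, one reference field per object, indices standing in for addresses) is an assumption made for the sketch, not a description of any real collector.

```rust
use std::collections::BTreeSet;

/// Toy generational heap. Objects are indices; each object has one
/// optional reference field. Indices below `nursery_start` are old.
struct GenHeap {
    fields: Vec<Option<usize>>,
    nursery_start: usize,
    /// Old objects whose field points into the nursery.
    remembered: BTreeSet<usize>,
}

impl GenHeap {
    fn is_young(&self, obj: usize) -> bool {
        obj >= self.nursery_start
    }

    /// Write barrier: every reference store goes through here so that
    /// old-to-young pointers are recorded.
    fn write_field(&mut self, obj: usize, target: Option<usize>) {
        self.fields[obj] = target;
        let points_young = target.map_or(false, |t| self.is_young(t));
        if !self.is_young(obj) && points_young {
            self.remembered.insert(obj);
        } else {
            self.remembered.remove(&obj);
        }
    }

    /// After a minor collection moved young object `from` to `to`, only
    /// the remembered old objects are scanned and fixed up here.
    /// Returns how many fields were updated.
    fn fix_up_after_move(&mut self, from: usize, to: usize) -> usize {
        let mut updated = 0;
        for &obj in &self.remembered {
            if self.fields[obj] == Some(from) {
                self.fields[obj] = Some(to);
                updated += 1;
            }
        }
        updated
    }
}
```

References held by young objects are not handled by `fix_up_after_move`; in a real collector those are updated while the nursery itself is traced, which this sketch leaves out.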

In an incremental copying (i.e. relocating) collector, objects can be copied and then there may be two copies in existence while the mutator is allowed to run. Supporting a system where an object can be forwarded by the GC and then control returns to the mutator before all of the outstanding references have been updated in this strategy is often accomplished via a read-barrier (which is a non-starter for us IMO), although I think alternative schemes that still rely solely on write-barriers have been devised such as the one from Cheng and Blelloch (which I think is a variant of Nettles and O'Toole; and here's another contemporary paper by Blelloch and Cheng, I cannot recall offhand which one is most relevant here; in any case the write-barriers in these cases are far more expensive than the ones one typically sees in a normal generational collector.)

(my apologies for not providing links that are not behind a paywall. If other people take the time to track down variant links to free pre-prints of those papers, please feel free to add those links to the end of this comment, but please leave the acm links above alone to aid with finding cross-references.)

@glaebhoerl
Contributor

@pnkfelix I was thinking in particular of GHC's collector, which does do the thing where indirections are cleaned up by the GC, but indeed, my hazarded guess that it has to do with generational collection was wrong: in fact the indirections are created when thunks are evaluated. (I had a surprising amount of trouble finding a good reference for this which does more than just mention it in passing, but see e.g. the "Graph Reduction: Thunks & Updates" section here.)

So mea culpa, but it doesn't end up making much difference: if the object is moved by rewriting all references, the old location will become garbage, so we still can't allow it while borrowed references exist (as @huonw noted). (Contrast to a borrow not of the managed object, but of a Gc<T> which refers to it: in that case we will still rewrite the Gc<T> with impunity, and accordingly we must declare Gc<T> non-Freeze.)

I think this property (that & references should pin their referents) is the essence of my argument for why we should adopt a mostly-copying strategy that pins the references reachable from the stack (or reachable from & references that are in ~ objects recursively reachable from the stack). I admit that there remain two strategies for pinning here: 1. Pin based on immediate-reachability from the stack (or the ~heap), as described in the previous sentence

My thoughts here are still the same: I can see how this would straightforwardly handle most cases, but we need to handle all of them, and it's not obvious to me what the story for the remainder is. In particular I'm still concerned about two things:

  1. How would you detect & references stored inside arbitrary (library-defined) non-~ smart pointer types?
  2. How would you detect & references to the interior of a managed object?

or 2. do the pinning when the borrow occurs and then unpin when the borrow expires, which I believe is what you are thinking of.

Something like that. In the tradition of starting with a simple solution which is obviously correct, consider:

pub struct Gc<T> { priv contents: *GcContents<T> }
struct GcContents<T> { pins: Cell<uint>, object: T }
pub struct PinRef<'s, T> { priv contents: &'s GcContents<T> }
pub fn borrow<'s, T>(gc: &'s Gc<T>) -> PinRef<'s, T> {
    unsafe { 
        (*gc.contents).pins.set((*gc.contents).pins.get() + 1);
        PinRef { contents: &*gc.contents }
     }
}
pub fn get<'s, 'x, T>(pr: &'s PinRef<'x, T>) -> &'s T {
    &pr.contents.object
}
impl<'s, T> Drop for PinRef<'s, T> {
    fn drop(&mut self) {
        self.contents.pins.set(self.contents.pins.get() - 1)
    }
}

(I now regret using the word "obviously": it's not obvious to me that I got all of that right.) But in any case, the basic idea that just as with RefCell, borrows are delineated by RAII increment and decrement of a counter, and thereby the GC can tell whether the object is pinned by checking pins > 0 does seem to be obviously correct, and points 1. and 2. from above don't present any additional difficulty. The question is whether it's possible to make a solution that's more efficient than this without compromising its correctness.
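For reference, the same counting scheme can be written in present-day Rust roughly as follows. This is a sketch under simplifying assumptions: `Gc<T>` here just owns a `Box` instead of pointing into a collected heap, and `is_pinned` is a made-up helper standing in for the check the collector would perform.

```rust
use std::cell::Cell;

/// Heap box for a managed object, with a count of outstanding pins.
struct GcContents<T> {
    pins: Cell<usize>,
    object: T,
}

/// Stand-in for `Gc<T>`: here it simply owns its contents in a `Box`,
/// whereas a real collector would own the allocation.
pub struct Gc<T> {
    contents: Box<GcContents<T>>,
}

/// A borrow of the managed object that keeps it pinned while alive.
pub struct PinRef<'s, T> {
    contents: &'s GcContents<T>,
}

impl<T> Gc<T> {
    pub fn new(object: T) -> Gc<T> {
        Gc { contents: Box::new(GcContents { pins: Cell::new(0), object }) }
    }

    /// Pin the object and hand out a guard.
    pub fn borrow(&self) -> PinRef<'_, T> {
        self.contents.pins.set(self.contents.pins.get() + 1);
        PinRef { contents: &*self.contents }
    }

    /// What the collector would check before relocating the object.
    pub fn is_pinned(&self) -> bool {
        self.contents.pins.get() > 0
    }
}

impl<'s, T> PinRef<'s, T> {
    pub fn get(&self) -> &T {
        &self.contents.object
    }
}

impl<'s, T> Drop for PinRef<'s, T> {
    fn drop(&mut self) {
        // Unpin when the guard goes out of scope.
        self.contents.pins.set(self.contents.pins.get() - 1);
    }
}
```

Since `get` ties the returned `&T` to the lifetime of the `PinRef`, no reference to the interior can outlive the guard that keeps the object pinned.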

I do not yet know how to handle doing the unpin exactly when the borrow expires, which is one reason strategy 2 worries me, although maybe we can do something that just keeps things pinned for longer than strictly necessary, I am not yet sure.

Could you elaborate on what kind of mechanisms you were thinking about here?

@huonw, re: interior references, I think my concerns can be distilled down to the fact that the borrow checker doesn't let you do it. Any time you take out a loan on the interior of an object, whether owned box, tuple, or HashMap, the borrow checker will disallow moving the object before the loan has expired. A lot of thought has gone into the borrow checker to make sure that its rules are a sufficient condition for safety. Any time we want the GC to do something which the borrow checker would reject, I would want to have at least a sketch of a proof that it's still safe, and not just for specific types, but all types.

@pnkfelix
Member

pnkfelix commented Feb 7, 2014

@glaebhoerl I'll address your questions in reverse order

  • How would you detect & references to the interior of a managed object?

My assumption is either (1.) we require the owning pointer to the managed object to be kept alive on the stack for as long as it has outstanding &references (since the presence of the owning pointer on the stack will pin the managed object in place), which may require some IR / LLVM-integration effort to keep LLVM from dead-code-eliminating those owning pointers, or (2.) we require a page map for the managed heap, so that &references to interiors of managed objects can be mapped to the metadata for the managed object.

I have more hands-on experience with (2) than (1), but either should be workable.
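A sketch of the lookup that option (2) needs, in present-day Rust. To keep it short it uses an ordered map keyed by object start address rather than a true per-page table, and `ObjectMap`, `ObjectMeta`, and `pin_interior` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Metadata recorded for each managed allocation.
#[derive(Debug, PartialEq)]
struct ObjectMeta {
    size: usize,
    pinned: bool,
}

/// Maps the start address of each managed object to its metadata.
struct ObjectMap {
    objects: BTreeMap<usize, ObjectMeta>,
}

impl ObjectMap {
    /// Find the start address of the object containing `addr`, which may
    /// point anywhere inside it.
    fn containing(&self, addr: usize) -> Option<usize> {
        // The candidate is the last object starting at or before `addr`.
        let (&start, meta) = self.objects.range(..=addr).next_back()?;
        if addr < start + meta.size {
            Some(start)
        } else {
            None
        }
    }

    /// Pin whichever object an interior reference points into.
    fn pin_interior(&mut self, addr: usize) -> bool {
        match self.containing(addr) {
            Some(start) => {
                self.objects.get_mut(&start).unwrap().pinned = true;
                true
            }
            None => false,
        }
    }
}
```

A real page map would index by page number to get a constant-time first step; the ordered map here trades that for brevity.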

  • How would you detect & references stored inside arbitrary (library-defined) non-~ smart pointer types?

Let me see if I understand this scenario correctly. Any kind of value, including the smart-pointers holding the &references, will end up being put either: onto the task's stack, into a ~-allocated object, into a gc-heap-allocated object, or into memory acquired externally to the runtime (e.g. via malloc).

The main worrisome case that I can see here is &references stored in malloc'ed memory (since we can handle pointers on the stack or in ~-allocated storage conservatively and we cannot put &references into gc-heap-allocated objects).

But at that point we must be in the realm of user-defined tracers, no? (Or user-annotations on the types that yield tracers, etc; this is a topic I've been sidestepping since I wanted to get the basics working first.)

Or are you talking about the pointer to malloc'ed memory then itself being transmuted to a &reference or a ~reference (and then later transmuted back to a *u8 and then freed in a drop method somewhere)? I'll admit I haven't thought too carefully about this because I had assumed a lot of these use-cases would need to be dealt with via user-annotation.

I continue to worry that I jump into responding to your questions while being unsure that I actually understand the scenarios you are describing...

@pnkfelix
Member

pnkfelix commented Feb 7, 2014

(also I am fully aware that @glaebhoerl has posed questions to me that I have not answered. That's mostly because it takes too long for me to come up with concise answers, while lengthy answers make this long thread even more unmanageable. I wonder whether there is a better forum for us to carry on this discussion...)

@huonw huonw deleted the managed branch December 4, 2014 02:03