Plan for HIR lifetimes support #406

Manishearth · 2024-01-11T01:35:51Z

Somewhat depends on conclusion of #405

@QnnOkabayashi wrote lifetimes support for the AST JS backend and also designed lifetimes stuff for HIR, including the elision module which basically figures out lifetime elision in methods (quite challenging!!).

Unfortunately, we never wrote an HIR backend back then so we didn't really get to battle test the APIs.

We do have actual backends using HIR now (C2, CPP2, Dart, soon hopefully JS: @robertbastian and I are working on it). We now know what they look like.

What I'm hoping is that the backends have to do as little work as possible to get the information they need. This may involve precomputing more info about lifetimes than we do already.

The status quo is that every method contains a LifetimeEnv (which tracks bounds, etc, and has already dealt with elision). You can also get a BorrowingFieldsVisitor from it, which eventually tells you which input lifetime corresponds to which output field lifetime.

Every type name (as found in a method or struct field, not a def) has a TypeLifetimes that lists which of the lifetimes of the surrounding method/struct are found in its path, and in what order. These can be converted to MethodLifetimes, that are in the context of a surrounding method.

This is a start, but I think we need more. The end state I would like to see is (partially started in #404) :

Every struct and opaque also contains a LifetimeEnv<Type>. Since structs don't have fancy elision going on we can construct these in a much simpler way.
There is a way to get an exhaustive Lifetimes for every struct field: basically something like LifetimeEnv<Type>.new_lifetimes_for(&field), producing a list of relevant indices. This list is transitively evaluated.
Backends generating a struct basically need to have one ownership edge param per struct input lifetime in the from-FFI struct ctor. They then use new_lifetimes_for() to make sure the struct ctor correctly flows lifetimes into its fields.
Backends generating an opaque need to have one ownership edge param for self borrows, plus one per input lifetime. They use LifetimeEnv's existing lifetimes iterator for this.
Backends generating a method use the BorrowingFieldsVisitor to figure out which input param/field corresponds to which output lifetime. Once it has constructed the edge arrays, it flows them correctly.
Ideally we can have a higher level API than BorrowingFieldsVisitor where
- Construct a MethodBorrowContext
- Construct a list of edge arrays, one per input lifetime
- You call enter_field() exit_field() when converting all method params and their struct fields to Rust types
- For each field, the MethodBorrowContext will return a MethodLifetimes corresponding to which resolved set of method lifetimes that field corresponds to. At that point, you can add the variable name of the relevant borrowed object to the relevant edges arrays.
  - Note that the relevant borrowed object is not always the field/param: sometimes e.g. in JS we'll be converting a type into an ArrayBuffer first, and that buffer is what needs to be borrowed.
  - This might get annoying for nested fields since we'll need to make variables at the top level for each one
  - An alternate way to do this is generate code for edges_a, edges_b, etc arrays, and just push to the array when you get here.
- Now, when generating the output type, we do something similar with enter_output_field() etc, and it will return one or more MethodLifetimes for each field that correspond to each edge for that field.
- For lazy structs, MethodBorrowContext would need to be tweaked based on the conclusion of (Should backends be allowed to lazy-destructure structs? #405). If we take borrowing strategy C, then it doesn't need to visit fields, only params, which is simpler.

The end result of this would be that adding lifetime support to a backend that eagerly constructs structs (see #405) is just a matter of sprinkling in a couple method calls; nothing fancy. We can have BorrowingFieldsVisitor be an internal type that contains the actual field visiting logic.

The text was updated successfully, but these errors were encountered:

Manishearth · 2024-01-11T01:36:32Z

cc @robertbastian @sffc

Manishearth · 2024-01-30T02:21:28Z

Going to flesh out the different possible approaches for #410 (comment) . Focusing on Dart here but JS will need to take basically the same route.

To start out with, here are the things I am trying to care about:

'static is not the same as owned, and we should be careful about that
Methods that do not borrow should not pay a cost
We should hold on to the absolute minumum number of edges, as much as possible.
Methods that do borrow should pay a minimal cost:
a. Ideally they do not spend time building lifetime edge lists that are unneeded
b. Ideally they build as much of the edge list up front as possible.
Ideally methods do not need to recurse into fields. Each type can handle its own internal borrowing edges. This can sometimes be in conflict with (4a).

Here's a rundown of the approach in #410 for this, which does not handle slices:

Each opaque accepts a list of lifetime edges on construction from FFI, including a self-edge for the self lifetime
Each struct accepts a list of lifetime edges on construction from FFI, and immediately passes it down to its constituent opaques (and such)
Each struct can also produce an array for each of its lifetimes, listing each field that contains that lifetime (_fields_for_lifetime_a())
Struct lifetime functions do not handle lifetime transitivity because all lifetime dependencies on a struct will exist on a method too. This might be a mistake.
In a method, we first construct an edge array for each edge used by the return type by iterating parameters and stuffing them in the right arrays (and in the case of structs, spreading in _fields_for_lifetime_whatever()). Then we pass down the edge arrays to the returned type.

A core part of this design is that structs are considered fully transparent: they do not themselves hold on to lifetime edges; but they know how to pass them down to their fields, or extract the right fields for a given lifetime. Changing this would mean teaching the structs to track mutation, which risks leaking things since you can't know for sure which lifetime edges to drop after mutation.

This doesn't handle slices since slices do not borrow from the slice parameter, they borrow from temporary intermediary arenas.

Approaches for handling slices

For the purpose of this comment, .pointer() refers to whatever "to FFI" method we add for structs. This is currently .pointer(), but once we update past dart-lang/sdk#45697 we will be able to do it without extra arenas and pointers and should probably just call it _toFfi()

Let's deal with the following code:

struct Bag<'x, 'y, 'z> {
   slice: &'x [u8],
   opaque: &'y Opaque<'z>
}

// on opaque Foo
fn foo<'a, 'b, 'c>(bag: Bag<'a, 'b, 'c>, other_slice: &'c [u8] ) -> OtherOpaque<'a, 'c>;

struct Opaque<'x>;
struct OtherOpaque<'x, 'y>;

Currently this will generate something like this

OtherOpaque foo(Bag bag, ByteBuffer bytes) {
   final temp = ffi2.Arena();
   List edges_for_lifetime_a = [...bag.fields_for_lifetime_x()]; // this produces `[bag.slice]`
   List edges_for_lifetime_c = [bytes, ...bag.fields_for_lifetime_z()]; // This ends up being [bytes, bag.opaque]

   final result = _Foo_foo(fields._pointer(temp), other_slice.pointer(temp)); 
   return OtherOpaque.(result, /* isOwned */ true, edges_for_lifetime_a, edges_for_lifetime_c);
}

(ignore my Rusty naming style right now: we should fix that for camelCase but it's not urgent)

Approach A: `.pointer()` appends to edge arrays

One way to do this is to have .pointer() take in nullable arguments for each edge array, which it can then append additional stuff to.

OtherOpaque foo(Bag bag, ByteBuffer bytes) {
   final temp = ffi2.Arena();
   final arenaForOtherSlice = makeAllocatorWithFinalizer();
   List edges_for_lifetime_a = [...bag.fields_for_lifetime_x()]; // this produces `[bag.slice]`
   List edges_for_lifetime_c = [arenaForOtherSlice];

   final result = _Foo_foo(fields._pointer(temp, [edges_for_lifetime_a], null, [edges_for_lifetime_b]), other_slice.pointer(temp)); 
   return OtherOpaque.(result, /* isOwned */ true, edges_for_lifetime_a, edges_for_lifetime_c);
}

Then in Bag::_pointer() something like this happens

// edge arrays are double lists because `x` may correspond to multiple caller lifetimes due to interdependencies!
Pointer<BagFfi> _pointer(Allocator temp, List<List>? edges_x, List<List>? edges_y, List<List>? edges_z) {
  final pointer = temp<BagFfi>();  

  Allocator arenaForSlice = temp;
  if (edges_x != null) { // unclear if we should use nullable arrays here or just emptyable arrays, don't want to pay array costs
       arenaForSlice = makeAllocatorWithFinalizer();
       for (edge in edges) { edge.append(dest)}
  }
  pointer.ref.slice._pointer = slice.pointer(arenaForSlice);
  // copy over length as well
  pointer.ref.opaque = opaque._underlying;

  return pointer;
}

The makeAllocatorWithFinalizer + edge.append(dest) for edge in edges pair of operations can probably be made into a single helper.

A slightly different approach would be for ByteBuffer.pointer() to become:

  ffi.Pointer<ffi.Uint8> pointer(Allocator alloc, List<List>? edges) {
    ffi.Allocator dest = alloc;
    if (edges != null) {
       dest = makeAllocatorWithFinalizer();
       for (edge in edges) { edge.append(dest)}
   }
    return alloc<ffi.Uint8>(length)..asTypedList(length).setRange(0, length, asUint8List());
  }

I'm less happy about this because in the 99% of cases where the buffer isn't doing any borrowing stuff we'll end up doing additional checks. However, this simplifies the code for everyone else since it boils down to passing down a list of edge arrays.

This generalizes for nested structs pretty well, where _pointer would need to do something like this:

Pointer<BagFfi> _pointer(Allocator temp, List<List>? edges_x, List<List>? edges_y, List<List>? edges_z) {
  final pointer = temp<BagFfi>();  

  pointer.ref.nested_struct = NestedStruct._pointer(temp, edges_x, edges_y); // assume a field `nested_struct: NestedStruct<'x, 'y>`
  // copy over length as well
  pointer.ref.opaque = opaque._underlying;

  return pointer;
}

In this approach the only code that needs to handle interdependencies between lifetimes is methods; everything else will get passed down in the edge arrays.

Approach B: Everyone appends to edge arrays

Instead of specialcasing structs and slices, we could have the edge arrays initialized to [] and have each Dart-to-FFI codepath take in List<List> for each of its edges, and appending to the relevant ones. I think this is less good from a perf standpoint.

On the other hand, it means that the responsibilities of edge-management are clear and not type-dependent.

(I can write out examples if needed but personally I'd like us to do Approach A and I'm planning on signing off for the day soon)

Approach C: Abandon modularity

We can do what the JS backend currently does and have methods handle transitive struct construction. This effectively gets rid of our _pointer() and _ on our Dart structs, instead making methods regen this code. Then they can use the BorrowingFields visitor to link up the right fields, no problem.

Will still be gnarly.

Not preferred since modularity is nice.

Manishearth self-assigned this Jan 11, 2024

QnnOkabayashi mentioned this issue Jan 17, 2024

Add Type::lifetimes to make getting lifetimes out of Type easier #409

Merged

Manishearth mentioned this issue Jan 24, 2024

Support elided lifetimes in return types #411

Open

Manishearth added this to the ICU4X 2.0 milestone Feb 2, 2024

Manishearth mentioned this issue Feb 2, 2024

Slice borrowing in Dart #421

Closed

Manishearth added the enhancement New feature or request label Feb 2, 2024

Manishearth mentioned this issue Feb 21, 2024

Correct lifetimes for borrowing of dart slice *parameters* #430

Merged

Manishearth mentioned this issue Mar 5, 2024

Handle nested structs in Dart #441

Merged

robertbastian closed this as completed in #441 Mar 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plan for HIR lifetimes support #406

Plan for HIR lifetimes support #406

Manishearth commented Jan 11, 2024

Manishearth commented Jan 11, 2024

Manishearth commented Jan 30, 2024

Plan for HIR lifetimes support #406

Plan for HIR lifetimes support #406

Comments

Manishearth commented Jan 11, 2024

Manishearth commented Jan 11, 2024

Manishearth commented Jan 30, 2024

Approaches for handling slices

Approach A: .pointer() appends to edge arrays

Approach B: Everyone appends to edge arrays

Approach C: Abandon modularity

Approach A: `.pointer()` appends to edge arrays