How do remote parts and closure captures work? #754

eudoxia0 · 2023-07-09T06:59:46Z

eudoxia0
Jul 9, 2023

Hello all,

I learned about Val a while back and I'm fascinated by the idea of borrow checking without explicit lifetimes or references. I wrote a post about second-class references and now I'm writing a survey of the different type system approaches to solving memory safety, e.g. linear types and Rust's ownership.

I want to write about how Val addresses memory safety. I'm also considering whether my own programming language would benefit from switching to second-class references for greater simplicity. It seems like making references second-class is an obvious improvement over Rust-like explicit lifetimes, with possibly a slight loss of safety for certain kinds of code.

My understanding of Val so far is:

- References are not first-class but rather a parameter passing mode.
  - Because they can't be stored in structs or returned from functions, lifetime analysis is not needed.
  - "Borrow checking" is just a disjointness check at function call sites, which is amazing.
- Subscripts are kind of like coroutines: they are functions that "return" references, but under the hood they run some continuation in an environment where the reference is accessible.

Remote parts exist to get around the fact that you can't store references inside data structures, but I don't know what the semantics are. I tried reading the doc but I think it's meant for compiler developers.

Additionally, the language tour describes closures as first-class values, i.e. can be stored in data structures and returned from functions. And I'm wondering how this interacts with let and inout captures.

I imagine there's a kind of lightweight flow analysis rooted at variables rather than types? I got that from reading other discussions (e.g. here) and from the remote parts doc, which says:

Arguments to remote parameters in an initializer form lifetime bounds.

And the spec which says:

An object may capture a projection at its initialization. A captured projection is released when the lifetime of the capturing object ends. The captured projections of a closure are defined by its environment. The captured projections of a tuple or instance of a product type are defined by the stored projections [i.e. remote parts] that were used to initialize that object.

What I'm not sure about is where the information that tells the compiler "this {closure|remote} must not outlive this other variable" is represented.

(Normally I'd try to answer these questions myself, by running the compiler, but I failed to build it on NixOS.)

Also: do you expect Val to have something like Rc or Arc in Rust, for cases where you want first-class references without unsafe pointers?

kyouko-taiga · 2023-07-09T15:07:53Z

kyouko-taiga
Jul 9, 2023
Maintainer

Hi @eudoxia0, thanks a lot for your interest in Val!

I'm impressed by how far you've been able to get, putting isolated pieces of information together. I'm more than happy to help connect the missing dots.

My understanding of Val so far is:

References are not first-class but rather a parameter passing mode.

Because they can't be stored in structs or returned from functions, lifetime analysis is not needed.

"Borrow checking" is just a disjointness checks at function call sites which is amazing.

That's about right but with a couple of asterisks. Your observation would be correct in a strict MVS model where all values would either be copyable, like in Swift, or behave linearly, like in Wadler-style linear languages (scientific literature enthusiasts may be interested in this paper).

Val has a linear type system because, though we took a lot from Swift, we also wanted to support non-copyable types. But linear types are quite impractical, so Val uses a few tricks to improve the user experience, introducing second-class references not only at function boundaries, but also in local scopes.

For example, consider this program:

type MoveOnly: Movable, Deinitializable {
  public var s: String
  public memberwise init
}

public fun main() {
  var x = MoveOnly(s: "Hello, World!")
  let y = x
  print(y.s) // Prints "Hello, World!"
  &x.s = "See you!"
  print(x.s) // Prints "See you!"
}

Here, though x can't be copied, we are still able to read its contents through y, which is a second-class reference. But for all intents and purposes, it can be thought of as an independent value, so the user doesn't need to think about aliasing. It's the same idea that we've been able to use at function boundaries with passing conventions.

Obviously, the ability to create second-class references in local scopes can introduce hazards, so we need lifetime analysis.

I imagine there's a kind of lightweight flow analysis rooted at variables rather than types?

Yes! And it's precisely because we can root the analysis at variables that we can keep the type system fairly simple. If you think about it, baking lifetime annotations in types is just a way to teach the same variable-oriented analysis what to do when it sees a function call.

In broad strokes, this analysis works by constructing the useful lifetime of every binding and checking that they cannot overlap in an undesirable way. We'll see what that means for captures and remote values below.

Subscripts are kind of like coroutines, they are functions that "return" references but under the hood they run some continuation in an environment where the reference is accessible.

Exactly.

Remote parts exist to get around the fact that you can't store references inside data structures, but I don't know what the semantics are. I tried reading the doc but I think it's meant for compiler developers. [...] What I'm not sure about is where the information that tells the compiler "this {closure|remote} must not outlive this other variable" is represented.

Let's talk a little more about the "lightweight analysis".

Let R(b) denote the live-range of a binding b, that is, the set of expressions that use b or are sequenced before another expression that uses b. (Note: the analysis works on an intermediate representation that supports a more precise definition of what "using b" means, but we'll just wave our hands a little here.)

Intuitively, R(b) relates to the useful lifetime of b, but it is not enough to deal with projections and captures. For example, consider the clearly illegal program below. The useful lifetime of y must cover the last statement because z is a projection of (a part of) y.

public fun main() {
  var x: Array = [[1, 2], [2, 3]]
  let y = x[0]
  let z = x[1]
  &x.remove_all()
  print(z)
}

We say that some operations are extending lifetimes. That is the case of a projection but also the formation of a lambda with borrowed captures. In our example, we can say that the extended live-range of y, written R+(y), is R(y) combined with R+(z), which is R(z) since no expression extends the lifetime of z.

Because y projects x immutably, we must guarantee that x cannot change during the useful lifetime of y. So we look at all expressions in R+(y) and check if there's one that modifies x.
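
To make the check concrete, here is a minimal Python sketch of the idea, not the Val compiler's actual implementation. Everything in it is an assumption made for illustration: the `Stmt` record, the helper names `live_range`, `extended_live_range` and `find_violations`, the use of statement indices instead of IR expressions, and the approximation of R(b) as the span from a binding's definition to its last use. It also encodes the example the way the prose reads it, with z projecting y (`let z = y[1]`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stmt:
    text: str
    defines: Optional[str] = None     # binding introduced by this statement
    projects: Optional[str] = None    # binding that `defines` borrows immutably
    uses: frozenset = frozenset()     # bindings read by this statement
    mutates: frozenset = frozenset()  # bindings modified by this statement


def live_range(program, b):
    """R(b), approximated as the statement indices from b's definition to its last use."""
    touched = [i for i, s in enumerate(program) if s.defines == b or b in s.uses]
    return set(range(touched[0], touched[-1] + 1)) if touched else set()


def extended_live_range(program, b):
    """R+(b): R(b) combined with R+(c) for every binding c that projects b."""
    result = live_range(program, b)
    for s in program:
        if s.projects == b:
            result |= extended_live_range(program, s.defines)
    return result


def find_violations(program):
    """Report (borrower, source, index) when `source` is modified inside R+(borrower)."""
    violations = []
    for s in program:
        if s.projects is None:
            continue
        for i in sorted(extended_live_range(program, s.defines)):
            if s.projects in program[i].mutates:
                violations.append((s.defines, s.projects, i))
    return violations


program = [
    Stmt("var x = [[1, 2], [2, 3]]", defines="x"),
    Stmt("let y = x[0]", defines="y", projects="x", uses=frozenset({"x"})),
    Stmt("let z = y[1]", defines="z", projects="y", uses=frozenset({"y"})),
    Stmt("&x.remove_all()", uses=frozenset({"x"}), mutates=frozenset({"x"})),
    Stmt("print(z)", uses=frozenset({"z"})),
]

print(find_violations(program))  # [('y', 'x', 3)]
```

In this sketch R(y) alone covers only statements 1 and 2, so the mutation at statement 3 is caught only because R+(y) absorbs R+(z), which reaches the final `print(z)`.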

The exact same process applies to lambda captures. Once we've figured out the set of bindings borrowed by a lambda, we can determine which lifetimes it is extending. For example:

public fun main() {
  var x: Array = [[1, 2], [2, 3]]
  let f = fun (_ i: Int) { x[0][i].copy() }
  &x.remove_all()
  print(f(1))
}

Because f projects x immutably in its body, we must guarantee that x cannot change during the useful lifetime of f. So we look at all expressions in R+(f) and check if there's one that modifies x.
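
As a rough illustration of the two steps (find the borrowed bindings, then scan the extended live-range for mutations), here is a Python sketch that uses Python's own `ast` module on a Python lambda as a stand-in for the Val closure. The helper `captured_names`, the statement indices, and the `mutations` table are all hypothetical, and the free-variable computation ignores nested scopes and shadowing.

```python
import ast


def captured_names(lambda_source, enclosing_bindings):
    """Return the enclosing bindings that the lambda's body reads (its free variables)."""
    node = ast.parse(lambda_source, mode="eval").body
    if not isinstance(node, ast.Lambda):
        raise ValueError("expected a lambda expression")
    params = {a.arg for a in node.args.args}
    loaded = {
        n.id
        for n in ast.walk(node.body)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    return (loaded - params) & set(enclosing_bindings)


# Python stand-in for `fun (_ i: Int) { x[0][i].copy() }`.
captures = captured_names("lambda i: x[0][i]", {"x"})

# Statement indices: 0 `var x`, 1 `let f = ...`, 2 `&x.remove_all()`, 3 `print(f(1))`.
f_extended_live_range = range(1, 4)   # from the formation of f to its last use
mutations = {2: {"x"}}                # statement index -> bindings it modifies

conflicts = [
    (i, b)
    for i in f_extended_live_range
    for b in sorted(mutations.get(i, set()) & captures)
]
print(conflicts)  # [(2, 'x')]
```

The parameter `i` is excluded from the captures, so only `x` is treated as borrowed, and the mutation at statement 2 falls inside the range in which `f` is still needed.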

To deal with remote parts, we need to say that constructors become lifetime-extending expressions if they accept a remote type as a parameter:

public fun main() {
  var x: Array = [[1, 2], [2, 3]]
  let t: {a: remote ArraySlice<Int>, b: Int} = (a: x[0], b: 1)
  &x.remove_all()
  print(t.a[t.b])
}

At this time, I can only confidently say that this model works for constructors of structural types and memberwise initializers. I think I'll be able to make remote types work with arbitrary constructors and perhaps even arbitrary functions, but I'll need a little more time to think about the problem.


dabrahams · 2023-07-09T16:04:37Z

dabrahams
Jul 9, 2023
Maintainer

Agree, @eudoxia0, you've done really well in understanding Val!
I can fill in some gaps in @kyouko-taiga's answer.

Remote parts exist to get around the fact that you can't store references inside data structures, but I don't know what the semantics are. I tried reading the doc but I think it's meant for compiler developers.

Basically, the rules for instances of types with remote parts are the same as the rules for bare local “2nd-class references”: they aren't allowed to escape their local scope, which allows us to always reason about lifetime and access in a context where the necessary information is available.

The reason I put “references” in quotes is because as @kyouko-taiga said, the semantics of let and inout bindings are really those of independent values. They don't have what I consider the defining property of references: that operating on a variable can have non-local effects on another variable.

Additionally, the language tour describes closures as first-class values, i.e. can be stored in data structures and returned from functions. And I'm wondering how this interacts with let and inout captures.

The let and inout captures of a closure are exactly remote parts. A closure with such captures can't escape its local scope.

What I'm not sure about is where the information that tells the compiler "this {closure|remote} must not outlive this other variable" is represented.

It's captured in a dependency analysis that is performed entirely on the local scope of a function.

(Normally I'd try to answer these questions myself, by running the compiler, but I failed to build it on NixOS.)

(I'm curious as to what the problems were.)

Also: do you expect Val to have something like Rc or Arc in Rust, for cases where you want first-class references without unsafe pointers?

I'm sure someone will build these types, but we're reluctant to have them in the standard library. For us, proliferating references (to shared mutable state) is an anti-pattern because they expose the need for non-local reasoning about mutation that can't be checked by the compiler. When we consider where such things might be useful, we always want to encapsulate them inside data structures with value semantics, e.g. in the implementation of a doubly-linked list type, whose public API does not expose references. Implementing such data structures is generally an expert-level programming exercise, even if you have Rc or Arc, not least because one normally wants really high efficiency inside these API boundaries. At that point, we are not afraid to tell people to build a safe abstraction using unsafe parts (plain pointers).

Note: I think your (nice!) article maybe misinterprets the motivation behind our discussion about representing C++-style iterators. We consider those to be an anti-pattern also, but since we've been thinking about how to interoperate with C++ code, how to adapt them is still an interesting question. A Val iterator would simply store the whole collection as a remote part, much the way Swift iterators store a (CoW'd) copy of the collection.


eudoxia0 · 2023-07-11T22:50:53Z

eudoxia0
Jul 11, 2023
Author

Thank you both for the very detailed responses!

For example, consider this program:

So in Rust this might be:

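struct MoveOnly {
  s: String,
}

fn main() {
  let mut x = MoveOnly { s: String::from("Hello, World!") };
  let y: &MoveOnly = &x; // shared borrow of x
  println!("{}", y.s); // Prints "Hello, World!"
  x.s = String::from("See you!"); // allowed: the borrow `y` is not used after this point
  println!("{}", x.s); // Prints "See you!"
}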

I think that's clear enough.

The useful lifetime of y must cover the last statement because z is a projection of (a part of) y.

Did you mean to write:

public fun main() {
  var x: Array = [[1, 2], [2, 3]]
  let y = x[0]
  let z = y[1]
  &x.remove_all()
  print(z)
}

Or, alternatively, did you mean that the lifetime of x (not y) must cover the last statement since z is a projection of a part of x (not y)?

We say that some operations are extending lifetimes. That is the case of a projection but also the formation of a lambda with borrowed captures.

This makes sense to me. So the analysis keeps a kind of tree of binding-parent relationships. And in a program like:

let x = f();
let y = x.foo;
let z = y.bar;
let w = z.baz;

The live range of each binding is:

But the compiler knows the transitive closure of the dependency relationship:

Which it then uses to calculate the extended live range:

(I hope this is at all intelligible)
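
One way to spell out those three steps on the four-line program, as a Python sketch under stated assumptions rather than a description of what the compiler does: statements are numbered 1 to 4, a live range is taken to run from a binding's declaration to its last direct use, and the names `parent`, `live`, `ancestors`, `closure` and `extended` are made up for the illustration.

```python
# binding -> the binding it is projected from
parent = {"y": "x", "z": "y", "w": "z"}

# Live range of each binding: from its declaration to its last direct use.
live = {"x": {1, 2}, "y": {2, 3}, "z": {3, 4}, "w": {4}}


def ancestors(b):
    """Every binding that b depends on, nearest first."""
    out = []
    while b in parent:
        b = parent[b]
        out.append(b)
    return out


# Transitive closure of the dependency relationship.
closure = {b: ancestors(b) for b in live}

# Extended live range: a binding stays live as long as anything derived from it is live.
extended = {b: set(r) for b, r in live.items()}
for b in live:
    for a in closure[b]:
        extended[a] |= live[b]

print(closure)
# {'x': [], 'y': ['x'], 'z': ['y', 'x'], 'w': ['z', 'y', 'x']}
print({b: sorted(r) for b, r in extended.items()})
# {'x': [1, 2, 3, 4], 'y': [2, 3, 4], 'z': [3, 4], 'w': [4]}
```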

Because y projects x immutably, we must guarantee that x cannot change during the useful lifetime of y. So we look at all expressions in R+(y) and check if there's one that modifies x.

Yep.

At this time, I can only confidently say that this model works for constructors of structural types and memberwise initializers. I think I'll be able to make remote types work with arbitrary constructors and perhaps even arbitrary functions, but I'll need a little more time to think about the problem.

So right now my mental model is:

You can create references at function call sites and that's trivially safe (only need disjointness analysis).
You can construct objects that store remote parts, or closures that capture bindings, as long as the constructor / lambda expression is on the right side of a binding declaration, because then the compiler easily knows which bindings associate to what.

This model makes sense to me. It requires some of the code to be written in something like A-normal form, to help the analysis, but that's good from a simplicity perspective.

I guess what I don't understand is, if lifetimes are not explicit in the types, how does this generalize to more complex, nested expressions, something like:

let y = f(g(h(x)))

I think the answer is: it doesn't (which is fine), because in the case where there's function indirection, then you'd need a more general type system with lifetime parameters in the types?

(Will reply to Dave in a second post because I might have to run soon!)


dabrahams · 2023-07-12T03:00:48Z

dabrahams
Jul 12, 2023
Maintainer

I guess what I don't understand is, if lifetimes are not explicit in the types, how does this generalize to more complex, nested expressions, something like:
let y = f(g(h(x)))
I think the answer is: it doesn't (which is fine), because in the case where there's function indirection, then you'd need a more general type system with lifetime parameters in the types?

I'm not sure I understand what generalization you're concerned about. That's exactly equivalent to:

let y0 = h(x)
let y1 = g(y0)
let y = f(y1)

and you can equivalently analyze the expansion, which has no nesting, if you prefer.
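
The expansion itself is mechanical. As a sketch of it, the following Python uses Python's `ast` parser as a stand-in for a Val front end; the function `flatten_calls` and the `y0`, `y1` naming of temporaries are assumptions for illustration, and only plain names and calls to named functions are handled.

```python
import ast


def flatten_calls(source):
    """Expand an assignment with nested calls into one call per `let` line."""
    assign = ast.parse(source).body[0]
    if not (isinstance(assign, ast.Assign) and isinstance(assign.targets[0], ast.Name)):
        raise ValueError("expected `name = call(...)`")
    target = assign.targets[0].id
    lines = []

    def lower_args(call):
        return ", ".join(lower(a) for a in call.args)

    def lower(node):
        """Return a name holding the value of `node`, emitting temporaries as needed."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = lower_args(node)            # inner calls are bound first
            temp = f"{target}{len(lines)}"     # y0, y1, ... (hypothetical naming)
            lines.append(f"let {temp} = {node.func.id}({args})")
            return temp
        raise NotImplementedError(type(node).__name__)

    outer = assign.value
    if not (isinstance(outer, ast.Call) and isinstance(outer.func, ast.Name)):
        raise ValueError("expected a call on the right-hand side")
    lines.append(f"let {target} = {outer.func.id}({lower_args(outer)})")
    return lines


print("\n".join(flatten_calls("y = f(g(h(x)))")))
# let y0 = h(x)
# let y1 = g(y0)
# let y = f(y1)
```

Once every intermediate result has its own binding, a variable-rooted analysis has a name to attach each live range to.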


eudoxia0 · 2023-07-12T21:25:32Z

eudoxia0
Jul 12, 2023
Author

Basically, the rules for instances of types with remote parts are the same as the rules for bare local “2nd-class references”: they aren't allowed to escape their local scope, which allows us to always reason about lifetime and access in a context where the necessary information is available.

This makes sense, but how does it work for generic functions? e.g., if a function takes a generic type parameter T, does it have to assume conservatively that values of T behave like second-class references? Or is this handled by specifying what protocols a type parameter must implement?

The reason I put “references” in quotes is because as @kyouko-taiga said, the semantics of let and inout bindings are really those of independent values. They don't have what I consider the defining property of references: that operating on a variable can have non-local effects on another variable.

But isn't something like:

var s = "Hello, world!"
inout s' = s
s' = "Goodbye, world!"
print(s) // Should print "Goodbye, world!"?

an example of spooky action at a distance?

(I'm curious as to what the problems were.)

Just that swiftc can't find Foundation even though I've installed the package.

/home/eudoxia/Downloads/val/Package.swift:1:8: error: no such module 'Foundation'
import Foundation

Anyways, these are just obscure Nix problems, please don't waste your time on them. I can probably get the Dockerfile working, but I didn't see it before.

I'm sure someone will build these types, but we're reluctant to have them in the standard library. For us, proliferating references (to shared mutable state) is an anti-pattern because they expose the need for non-local reasoning about mutation that can't be checked by the compiler. When we consider where such things might be useful, we always want to encapsulate them inside data structures with value semantics, e.g. in the implementation of a doubly-linked list type, whose public API does not expose references.

Makes sense.

A Val iterator would simply store the whole collection as a remote part, much the way Swift iterators store a (CoW'd) copy of the collection.

Is the distinction from C++ iterators here that iterator invalidation is impossible because of lifetime analysis?

I'm not sure I understand what generalization you're concerned about.

Yeah I'm not explaining this very well. Let me try again.

Since remote parts don't have lifetime parameters (like in Rust), I wonder how you prevent unsoundness when, for example, putting a record with a remote part in a collection. Without type-level lifetimes, doesn't this "anonymize" the references?

It's easy to convince myself a Rust-like model with explicit lifetimes is sound, because references with distinct lifetimes are distinct types, and the lifetime is like a lexically-scoped type, so they trivially can't be confused with one another or leaked.

Wait, I think I got it: is remote a property of the type or of the record field?

10 replies

dabrahams Jul 16, 2023
Maintainer

have a look at the way you're supposed to use them in algorithms!

Which examples from the paper demonstrate what you mean?

kyouko-taiga Jul 17, 2023
Maintainer

Which examples from the paper demonstrate what you mean?

The implementation of append in Fig. 4. Apart from the terribly verbose type annotations (to be fair, the paper doesn't propose a surface language), the main issue is the need to explicitly pack and unpack capabilities. AFAIK, this problem affects all systems with first-class capabilities.

Some of the ideas behind alias-types have been implemented in Mezzo, which was a dialect of OCaml with a quite decent syntax for capabilities. Mezzo also proposed an interesting approach to deal with shared mutation, called adoption, and that leverages dynamic checks (see §7). IMO it was a very promising lead that perhaps could be worth revisiting one day.

But I think I've now successfully steered the discussion away from the original topic. My apologies.

dabrahams Jul 17, 2023
Maintainer

Perhaps I misunderstood, but I thought figure 4 demonstrated a DPS optimization that could be automated.

kyouko-taiga Jul 19, 2023
Maintainer

AFAIK the paper doesn't present any approach to get to the optimized algorithm. It only shows how the algorithm can be type checked, assuming it's the result of some type-preserving optimizer. I find this point a bit hand-wavy because it is not obvious how one can build such an optimizer. The paper simply cites related work, but existing techniques do not take existential alias types into consideration.

Regardless, my point is that reasoning about the algorithm in Fig. 4 is too hard.

dabrahams Jul 20, 2023
Maintainer

OK; it sounds like we don't really know how to interpret that paper. Maybe a discussion with the paper's authors could be enlightening.
