Write design doc for yoke #1459

Merged 9 commits on Jan 8, 2022
18 changes: 8 additions & 10 deletions utils/yoke/design_doc.md
@@ -1,20 +1,18 @@
# Yoke: Lifetime Erasure for Rust
# Yoke: Self-Referential Borrowing for Rust
Member Author

@sffc I'd really prefer to call this "lifetime erasure"; "self-referential borrowing" is a huge field in Rust with many crates already. When explaining what yoke does to other Rustaceans, I've found "lifetime erasure" to be quite clear, since it makes a clear analogy with "type erasure" (`dyn`).

Member

I see lifetime erasure as being a means to an end. Yoke applies its lifetime erasure to a very narrow use case, which is self-references. I took another stab at the headline which mentions both things.

Member Author

I still kinda feel like the relationship is inverted: the point of Yoke is lifetime erasure, and it accomplishes that by self referential borrowing. There's no way lifetime erasure can work without self referential borrowing as well.

Member Author

Ultimately, to me, "self referential borrowing" does not tell me much because it covers so many things. "lifetime erasure" tells me exactly what this crate is useful for: I want to turn a compile time lifetime into an erased runtime one.

Member Author

Like, to be clear, by "lifetime erasure" I'm not talking about the fact that yoke replaces lifetimes with static as a means to an end.

I'm making an analogy against type erasure: turn a compile time construct into a runtime one.

Member Author

Let me write it out in the doc


## Problem statement

Zero-copy deserialization is a very effective way to speed up programs and avoid allocations, however can lead to lifetimes pervasively spreading throughout the codebase, and prevents using diverse memory management techniques like caching.
Zero-copy deserialization is a very effective way to speed up programs and avoid allocations. However, the requirement that data is borrowed from somewhere else means that:

It would be nice if it were possible to "erase" lifetimes and turn them into dynamically managed lifetimes (similar to type erasure with `dyn`) to allow for more flexible memory management.
1. All data types that contain zero-copy data, even indirectly, need to carry a lifetime parameter
2. Certain memory management techniques are hampered, like caching.

The goal of Yoke is to allow the borrowing of data from self, so that we don't need lifetime parameters to track data ownership, and to enable reference-counted data that can be safely dropped from a cache.
Manishearth marked this conversation as resolved.

## Background

[ICU4X](https://github.com/unicode-org/icu4x) is an internationalization library that has pluggable data loading as a core value proposition. Internationalization often needs a lot of data, and we want to make sure data loading can be fast and efficient. Zero-copy deserialization is quite attractive as a way to reduce this load.

Unfortunately, zero-copy deserialization leads to pervasive lifetimes in anything that consumes this data. The user has to hold on to the data source for as long as the deserialized data is needed, which can be a pain. More sophisticated memory management strategies, like using `Rc<T>` to dynamically cache the source of the data for as long as it's needed, cannot work since lifetimes are purely static constructs.
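
To make the "pervasive lifetimes" point concrete, here is a minimal std-only sketch. The names `Greeting`, `Formatter`, `parse`, and `greeting_len` are hypothetical and are not part of ICU4X or `yoke`; they only illustrate how one borrowed field forces a lifetime parameter onto every containing type, and why an `Rc`-held source does not help on its own.

```rust
use std::borrow::Cow;
use std::rc::Rc;

// Hypothetical zero-copy data type: it borrows its text from the source buffer.
struct Greeting<'data> {
    text: Cow<'data, str>,
}

// Any type that contains it, even indirectly, must carry the lifetime too.
struct Formatter<'data> {
    greeting: Greeting<'data>,
}

// "Deserialization" here is just UTF-8 validation, borrowing from `source`.
fn parse<'data>(source: &'data [u8]) -> Option<Formatter<'data>> {
    let text = std::str::from_utf8(source).ok()?;
    Some(Formatter {
        greeting: Greeting {
            text: Cow::Borrowed(text),
        },
    })
}

// Holding the source in an `Rc` keeps it alive dynamically, but the parsed
// value is still tied to a static borrow of it, so it can only be used
// locally. Returning `formatter` together with `source` would not compile.
fn greeting_len(source: Rc<[u8]>) -> Option<usize> {
    let formatter = parse(&source)?;
    Some(formatter.greeting.text.len())
}
```

In this sketch, the only way to hand a `Formatter` to a caller is to make the caller own the source buffer and thread `'data` through its own types, which is the situation the problem statement describes.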

It would be nice if it were possible to "erase" lifetimes and allow for memory management of zero-copy deserialized data.


## Requirements

- It should be possible to use zero-copy deserialization without storage of the deserialized data introducing a lifetime (<span style="color:red">**required**</span>)
@@ -30,13 +28,13 @@ The `yoke` crate provides the [`Yoke<Y, C>`][`Yoke`] and [`Yokeable<'a>`][`Yokea

`Yoke<Y, C>` allows one to "yoke" a zero-copy deserialized object (say, a `Cow<'a, str>`) to the source it was deserialized from, (say, an `Rc<[u8]>`), known as a "cart", producing a type that looks like `Yoke<Cow<'static, str>, Rc<[u8]>>` and can be moved around with impunity.

The `'static` is somewhat of a lie, the lifetime of the data the `Cow` borrows from is the lifetime of the `Yoke`, however since this `Cow` cannot be normally extracted from the `Yoke` the lifetime does not matter.
The `'static` is somewhat of a lie: it is actually a self-referential lifetime. The `Cow` is allowed to borrow data from the cart (the `Rc<[u8]>`), but the Rust compiler does not allow this, so we use `'static`. Since this `Cow` cannot be normally extracted from the `Yoke`, the lifetime is considered an implementation detail.

Most of the time the yokeable `Y` type will be some kind of zero-copy deserializable abstraction, potentially with an owned variant (like `Cow`, [`ZeroVec`](https://docs.rs/zerovec), or an aggregate containing such types), and the cart `C` will be some smart pointer like `Box<T>`, `Rc<T>`, or `Arc<T>`, potentially wrapped in an `Option<T>`.
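
The idea of yoking a borrowed value to its cart can be sketched with the standard library alone. The `MiniYoke` type below is hypothetical and hard-codes one yokeable (`&str`) and one cart (`Rc<[u8]>`); it is not the `yoke` crate's implementation or API, only an illustration of the `'static` "lie" being kept private and re-shortened on access.

```rust
use std::rc::Rc;

// Hypothetical, simplified stand-in for `Yoke<&'static str, Rc<[u8]>>`.
struct MiniYoke {
    // Claims `'static`, but really borrows from the heap data owned by `cart`.
    // The field is private so the fake lifetime never escapes.
    yokeable: &'static str,
    // Keeps the backing bytes alive for as long as `yokeable` exists.
    cart: Rc<[u8]>,
}

impl MiniYoke {
    // Hypothetical constructor: validates the bytes and yokes the view to the cart.
    fn attach(cart: Rc<[u8]>) -> Option<Self> {
        let borrowed: &str = std::str::from_utf8(&cart).ok()?;
        // SAFETY (assumption of this sketch): the bytes live in the `Rc`'s heap
        // allocation, which does not move when the `Rc` handle is moved, and
        // `cart` is stored next to the reference and never mutated or exposed.
        let erased: &'static str = unsafe { &*(borrowed as *const str) };
        Some(MiniYoke { yokeable: erased, cart })
    }

    // Hands the data back with a lifetime tied to `&self`, never `'static`.
    fn get<'a>(&'a self) -> &'a str {
        self.yokeable
    }

    // Number of handles sharing the cart's allocation.
    fn cart_refs(&self) -> usize {
        Rc::strong_count(&self.cart)
    }
}
```

A `MiniYoke` has no lifetime parameter, so it can be returned from functions or stored in a cache, and the backing buffer is freed only when the last `Rc` handle is dropped.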

### Basic functionality

The `Yokeable<'a>` trait is implemented on the `'static` version of any zero-copy type, e.g. `Cow<'static, T>` implements `Yokeable<'a>` (for all `'a`). One can use `Yokeable::Output` on this trait to obtain the "lifetime'd" value of the `Cow<'static, T>`, e.g. `<Cow<'static, T> as Yokeable<'a>>::Output` is `Cow<'a, T>`.
The `Yokeable<'a>` trait is implemented on the `'static` version of any zero-copy type; for example, `Cow<'static, T>` implements `Yokeable<'a>` (for all `'a`). One can use `Yokeable::Output` on this trait to obtain the "lifetime'd" value of the `Cow<'static, T>`, e.g. `<Cow<'static, T> as Yokeable<'a>>::Output` is `Cow<'a, T>`.
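
The `Output` mapping can be illustrated with a small stand-in trait. `Relifetime` and `shorten` below are hypothetical names chosen for this sketch; the real `Yokeable` trait is an `unsafe` trait with additional methods, which are omitted here.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for `Yokeable<'a>`: maps the `'static` form of a type
// to the same type with its lifetime replaced by `'a`.
trait Relifetime<'a> {
    type Output: 'a;
}

// Implemented on the `'static` version, for every `'a`.
impl<'a, T: 'static + ToOwned + ?Sized> Relifetime<'a> for Cow<'static, T> {
    type Output = Cow<'a, T>;
}

// `<Cow<'static, str> as Relifetime<'a>>::Output` is `Cow<'a, str>`, so a
// `'static` value can be returned at the shorter lifetime via covariance.
fn shorten<'a>(value: Cow<'static, str>) -> <Cow<'static, str> as Relifetime<'a>>::Output {
    value
}
```

Because the trait is implemented for all `'a`, code that stores the `'static` form can name the correctly scoped type for whatever borrow it has in hand.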

The key behind this crate is [`Yoke::get()`][get], with a signature as follows:
