Implement a canary edition system #201

mara-schulke · 2023-12-15T11:10:50Z

This PR implements an edition system for buffrs manifests. This allows us to version manifests properly and show users errors (or map the manifests internally behind the scenes)

Error:   × could not deserialize Proto.toml
  ╰─▶ TOML parse error at line 1, column 1
        |
      1 | edition = "0.7"
        | ^^^^^^^^^^^^^^^
      unsupported manifest edition, supported editions of 0.8.0 are: 0.8

This is a preliminary feature to roll out before driving wide adoption of buffrs to be able to still make breaking changes (and introduce new features) while having an existing user base.

Closes #64

Canary Edition

The canary edition correlates (and de/encodes to 0.x where x is the current minor version of buffrs). This means that until the stable release there is an option to introduce potentially breaking changes with every minor release.

Unknown Edition

Every other manifest release edition that is not matching the current canary one is deserialized into unknown and thus triggers an error for users.

Introduces

pub enum Edition – The edition enum
pub const CANARY_EDITION – The current canary edition of this crate

Changes

Manifest / RawManifests internal layouts to accommodate the edition logic
Implements a custom serializer / deserializer for branching and adding backwards compatibility
The semantics of buffrs package: The compressed package will always contain a manifest with the current canary edition inside. Even if the "source" manifest didn't contain any edition pin
The edition is optional in the Porto.toml and inferred to be the canary edition, which means that users are responsible for keeping it compatible (this also ensures that this feature works out of the box with existing projects!)

Examples

Editioned (supported)

edition = "0.7"

[package]
name = "xyz"
version = "xyz"
type = "lib"

Uneditioned (supported)

[package]
name = "xyz"
version = "xyz"
type = "lib"

Unknow / Outdated (unsupported)

edition = "0.6"

[package]
name = "xyz"
version = "xyz"
type = "lib"

This causes the following error:

Error:   × could not deserialize Proto.toml
  ╰─▶ TOML parse error at line 1, column 1
        |
      1 | edition = "0.6"
        | ^^^^^^^^^^^^^^^
      unsupported manifest edition, supported editions of 0.8.0 are: 0.8

Open Points

After the review of the core concept of editions the following points are open and must be addressed before merging:

Document this in the guide
Write tests for this

src/manifest.rs

Cargo.toml

docs/src/reference/editions.md

src/manifest.rs

xfbs · 2023-12-19T10:10:15Z

I just want to add one thing here, which is a pattern that I have used before that I found rather useful for when you have versioned data structures and getting them to parse easily with serde. From glancing at the code in this PR, it looks like this strategy might be useful here as well. Some of this explanation includes stuff we are already doing, but I repeat it here for completness' sake.

The idea is that we use serde's support for internally tagged enums to get the decoding for free. To do this, we can have an Edition enum, and a Manifest struct (this one is internal, it can change at any time):

// could be auto-derived from VersionedManifest using strum, for better maintainability
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Edition {
    V1,
    V2,
}

// internal representation of a manifest (we are free to change this, as it is now decouple from the
// representation on disk).
#[derive(Serialize, Deserialize)]
pub struct Manifest {
    edition: Edition,
    authors: Vec<String>,
    frobnicate: bool,
}

Now, when we actually parse the manifests, we use a different type called a VersionedManifest (this is raw data that we parse from the filesystem). Note that this is now an enumeration.

// external representation of a manifest (add new variants here for new editions)
#[derive(Serialize, Deserialize)]
#[serde(tag = "edition", rename_all = "snake_case")]
pub enum VersionedManifest {
    V1(ManifestV1),
    V2(ManifestV2),
}

To translate this: with the tag = "edition", we are telling serde that in our TOML there is a field called edition, which it should parse first to figure out which variant it should match. We also tell it to snake_vase the variant names. So basically, the edition field can now be v1 and v2. We are also able to override them with something else using #[serde(tag = "something")].

The variants of this enumeration should obviously align with the Edition enum. In fact, the Edition enum can be automatically derived from this VersionedManifest using the strum crate, if desired.

Now, we can define these manifests. These are public-facing and may never change (part of the public API, if you wish):

// on-disk representations for manifests, one for every edition
#[derive(Serialize, Deserialize)]
pub struct ManifestV1 {
    // these fields can be public because they may never change
    pub author: String,
}

#[derive(Serialize, Deserialize)]
pub struct ManifestV2 {
    pub authors: Vec<String>,
    pub frobnicate: bool,
}

You can see that these two have different shapes — one has an author which is a single string, and one has a list of authors. This is fine — we can totally change the shape between versions. We have full flexibility to make any kind of breaking changes in between editions.

Finally, we write a bit of glue code to convert a VersionedManifest into our internal Manifest representation:

// this is where we translate the public-facing versioned manifest into our
// internal representation. we can change our internal representation at any point
// and change how we do this translation, as long as we don't change the semantics
impl Into<Manifest> for ManifestV1 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V1,
            authors: vec![self.author],
            frobnicate: false, // fallback to default value
        }
    }
}

impl Into<Manifest> for ManifestV2 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V2,
            authors: self.authors,
            frobnicate: self.frobnicate,
        }
    }
}

impl Into<Manifest> for VersionedManifest {
    fn into(self) -> Manifest {
        match self {
            VersionedManifest::V1(manifest) => manifest.into(),
            VersionedManifest::V2(manifest) => manifest.into(),
        }
    }
}

What is nice about this is that now, the two toml files parse properly:

edition = "v1"
author = "Snoop Doggy Dogg"

and this one as well:

edition = "v2"
authors = ["Snoop Doggy Dogg", "Dr. med. Dre"]
frobnicate = true

And yet, once we have parsed them both fully and turned them into a Manifest, they produce the same shape of data (but yet we still have that edition field, so that we can in the code have different behaviours depending on which edition you have chosen, if you really want):

Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}

I'm a really big fan of how easy serde makes it to compose things like this. We could even have support for unknown editions, by writing something like this:

// the untagged makes it try to match the input as a Known variant first, and if it cannot, fall
// back to keeping it as a string. so "v1" => Known(V1), "v4" => Unknown("v4").
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub struct RawEdition {
    Known(Edition),
    Unknown(String),
}

Not that we would ever want that, probably, but the way this stuff composes makes it possible to mix strongly-typed stuff with a fallback to loosely-typed, dynamic data.

Maybe this is something we could implement here as well, it might save us some custom serde parsing code?

But Patrick, what about backwards compatibility?

Well I am glad you asked. This is thankfully easily possible. We can introduct a LegacyManifest struct, or we can simply treat manifests with missing editions as a specific, known edition. Here is how to treat ones with missing editions as a V1 manifest:

// the untagged means we will parse it first as a VersionedManifest, and fall back to the second case.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub struct RawManifest {
    Versioned(VersionedManifest),
    Unversioned {
        // capture the edition, if any
        #[serde(default)]
        edition: Option<String>,
        #[serde(flatten)]
        manifest: ManifestV1,
    },
}

impl Into<Manifest> for RawManifest {
    fn into(self) -> Manifest {
        match self {
            Versioned(manifest) => manifest.into(),
            // here we could raise an error if the edition is incorrect
            Unversioned { edition, manifest } => manifest.into(),
        }
    }
}

Now, when parsing everything as this RawManifest, the following toml:

author = "Snoop Doggy Dogg"

Parses as such:

Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}

So, TL;DR:

We can use tagged enums with serde to parse editions
We can use untagged enums as a fallback (for unknown/unspecified editions)
Type magic makes for rather pleasant code

xfbs · 2023-12-19T10:34:58Z

Another cool trick with serde (while I'm at it): suppose you have some data structure that is typed, but maybe in the future you might add fields to it. You don't care about those fields now, but you also don't want to lose them. So what do you do?

This:

#[derive(Serialize, Deserialize)]
pub struct MyType {
    author: String,
    uuid: Uuid,
    tags: BTreeSet<Tag>,

    // anything else not present when parsing gets dumped into this
    #[serde(flatten)]
    other: BTreeMap<String, serde_json::Value>,
}

Now when you parse something like this:

{
    "author": "Michael Jackson",
    "uuid": "0000-000-0000-000000000000",
    "tags": ["cheese", "pizza", "hummus"],
    "serendipity": "chumsky",
    "direction": "discombobulated",
    "something": ["abc", 12],
}

It parses neatly (losslessly):

MyType {
    author: "Michael Jackson",
    uuid: Uuid::from("0000-000-0000-000000000000"),
    tags: BTreeSet::from(["cheese", "pizza", "hummus"]),
    other: BTreeMap::from([
        ("direction", Str("discombulated")),
        ("something", List(Str("abc"), Num(12))),
    })
}

And when you serialize this, you get back the original JSON.

Neat, right! I think serde is so cool 😆

mara-schulke · 2023-12-19T12:55:24Z

@xfbs so, I agree, and I have used internally tagged serde enums before for these kinds of work. The problem here is that the Canary edition does not serialize to one specific edition, but to a different string on every buffrs version hence the custom serialization logic is needed.

The custom deserializers for the RawManifest enable us to treat the missing tag differently from how serde would behave by default. So to summarize this quickly: Its not possible to teach serde the current implementation's behavior through combinations of rename and tag. This is why I opted to implement this manually.

If you find a way of having untagged variants default to a specific tag + rename a variant dynamically let me know! Im open for suggestions but after carefully consulting the docs and lib of serde this is just not covered by the derive syntax.

xfbs · 2023-12-19T13:19:30Z

@mara-schulke I see! As far as I understand, we can certainly have untagged variants default to a specific tag. However, you are right indeed in that we cannot rename a variant dynamically, this is tracked in serde-rs/serde#450.

As far as I understand, the manually-implemented behaviour can be (almost) reimplemented like this:

#[derive(Serialize, Deserialize)]
#[serde(tag = "edition")]
pub enum TaggedManifest {
    #[serde(rename = "0.7")]
    Canary(CanaryManifest),
}

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum Manifest {
    Tagged(TaggedManifest),
    Untagged(FallbackManifest),
}

The only caveat being that due to the missing dynamic rename functionality, we have to write rename = "0.7" and we currently cannot write rename = concat!("0.", env!("CARGO_MANIFEST_VERSION_MINOR")). So, if we are okay with having a place in the code that we need to manually update with the current version name (this can be unit tested quite easily), then this approach might be useful to us here.

If we don't want to have to manually update that 0.7, then obviously implementing Serialize and Deserialize manually (as is the current approach) is the only way to go 😢

mara-schulke · 2023-12-19T13:24:16Z

Well, but the current behavior of "untagged" manifests is to deserialize as the current canary edition. I don't think that this is possible as of right now?

xfbs · 2023-12-19T13:38:08Z

So we only have two parsing cases:

edition = "0.7" or missing: treat as current canary
edition = "0.6" or any other, non-current edition: emit warning

Is that correct?

mara-schulke · 2023-12-19T14:33:30Z

Well, no, because 0.7 may be compatible with 0.6, then we would not emit an error.

mara-schulke · 2023-12-22T10:33:45Z

@xfbs @tomkarw @Tpt @qsantos: this is blocking the migration from kicking of, may we either merge this or get concrete change requests?👀🙈

I'm open on reiterating the implementation, I'm just waiting on suggestions / concrete feedback here. If you don't have any, may we shift the discussion to a github discussion / issue and move on?

Tpt

The current state seems a great improvement to me. But I am not one of the project maintainer so it's not my call to make I think.

docs/src/reference/editions.md

src/manifest.rs

xfbs · 2023-12-22T11:40:49Z

Sorry from my side for being slow!

I think this is a really good idea and also necessary for the long-term growth of buffrs. So it is quite important that we get this in. I think the code looks good, therefore I approve this.
I have a bit of uncertainty/lack of understanding of the strategy around the canary editions, backwards compatibility and all that good stuff (but I suppose that is not code review per se but rather some high-level strategy "review")
Random idea: Are we maybe overloading the edition key with the version? Is this what we really want:
```
edition = "alpha"
buffrs_version = "0.7" # requires 0.7 at minimum
```
and
```
edition = "1"
```
Or something like that (aka have separate fields for edition versioning and buffrs minimum versions)? (not sure if we do, just thinking out aloud)
It scares me a little bit that the current approach is that you need to re-publish a package to upgrade to a new edition (and all of the dependencies need to do the same), if I understood that correctly (did I understand that correctly?)
I feel like there are some thoughts that could be better discussed synchronously, because I feel bad when I ask stupid questions or write a wall of text 😄

I think @Tpt left some good comments, and maybe I'm just overthinking it!

mara-schulke · 2023-12-22T12:04:34Z

It scares me a little bit that the current approach is that you need to re-publish a package to upgrade to a new edition (and all of the dependencies need to do the same), if I understood that correctly (did I understand that correctly?)

I think what you need to acknowledge is that you would need to do this either way. If we break the compat (for whatever reason) you can't use old packages. This is already the case as of right now and will always be the case prior to being stable. What this edition system solves is that you gain stability and awareness of when this is the case (and if it is the case, you get the ability to repair it, by mapping the old manifest to the new).

Re mixing editions and versions: buffrs doesn't have editions as rust has them. We only version the manifest through editions thus there is no need to use "alpha", "beta" etc for the manifest version. The only version we track is the stability of the package manager and thus strictly related to the server version string of it. The versioning system described by this pr is basically just formalizing what is already the case: every new minor version (pre 1.0) may be incompatible with the prior one. The manifest may change completely between each version and thus the edition tracks when the manifest was valid / how it should look like and enables us to write migration guides between each edition.

Most likely each breaking change in buffrs (and thus each new minor version) will come along with some, potentially breaking, manifest change. If that is not the case anymore we are approaching stable.

xfbs · 2023-12-22T12:22:47Z

You are obviously much more knowledgeable on buffrs than me, so always take what I say with a grain of salt 😅

If we break the compat (for whatever reason) you can't use old packages. This is already the case as of right now and will always be the case prior to being stable.

We could make a conscious choice to say: while buffrs is unstable (there will be breaking changes in it's API), we are explicitly versioning the manifests and guaranteeing that newer versions will be able to use older manifests. if we wanted to, anyways.

My understanding is that that is not something we want to do, because we want to be able to make many breaking changes or changes to the manifests that are not easily "convertible" (aka bijective)?

Most likely each breaking change in buffrs (and thus each new minor version) will come along with some, potentially breaking, manifest change. If that is not the case anymore we are approaching stable.

Does the manifest change often? I think I was under the impression that it is relatively stable — but I am likely wrong!

Sorry for opening a can of worms with my comment but I think this explanation from you really helped me understand the motivation behind the design the way it is 😊

Implement a canary edition system

026faa0

mara-schulke added this to the Stabilization milestone Dec 15, 2023

mara-schulke requested review from j-baker, xfbs and uschi2000 December 15, 2023 11:10

mara-schulke self-assigned this Dec 15, 2023

mara-schulke added 3 commits December 15, 2023 12:11

Update version

87663a4

Fix clippy issues

78c54f7

Repair tests

e72e717

mara-schulke requested a review from tomkarw December 16, 2023 08:09

Document the edition system

8e3f8cd

mara-schulke commented Dec 18, 2023

View reviewed changes

src/manifest.rs Outdated Show resolved Hide resolved

Update src/manifest.rs

cedcc23

mara-schulke commented Dec 18, 2023

View reviewed changes

src/manifest.rs Show resolved Hide resolved

tomkarw reviewed Dec 18, 2023

View reviewed changes

Cargo.toml Outdated Show resolved Hide resolved

docs/src/reference/editions.md Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

mara-schulke mentioned this pull request Dec 18, 2023

Vendor folder structure seems to be changing or broken #196

Closed

mara-schulke and others added 3 commits December 18, 2023 19:46

Merge branch 'main' into mara/manifest-version

21a630d

Repair package test

fd92aec

Fix tests (again)

d39d7e4

Merge branch 'main' into mara/manifest-version

ec08660

Tpt approved these changes Dec 22, 2023

View reviewed changes

docs/src/reference/editions.md Outdated Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

src/manifest.rs Outdated Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

src/manifest.rs Show resolved Hide resolved

xfbs approved these changes Dec 22, 2023

View reviewed changes

PR feedback

06b830a

mara-schulke enabled auto-merge December 26, 2023 10:45

mara-schulke merged commit d9a8f58 into main Dec 26, 2023

mara-schulke deleted the mara/manifest-version branch December 26, 2023 10:50

Implement a canary edition system #201

Implement a canary edition system #201

Uh oh!

Conversation

mara-schulke commented Dec 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Canary Edition

Unknown Edition

Introduces

Changes

Examples

Editioned (supported)

Uneditioned (supported)

Unknow / Outdated (unsupported)

Open Points

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xfbs commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

But Patrick, what about backwards compatibility?

Uh oh!

xfbs commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mara-schulke commented Dec 19, 2023

Uh oh!

xfbs commented Dec 19, 2023

Uh oh!

mara-schulke commented Dec 19, 2023

Uh oh!

xfbs commented Dec 19, 2023

Uh oh!

mara-schulke commented Dec 19, 2023

Uh oh!

mara-schulke commented Dec 22, 2023

Uh oh!

Tpt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xfbs commented Dec 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mara-schulke commented Dec 22, 2023

Uh oh!

xfbs commented Dec 22, 2023

Uh oh!

Uh oh!

mara-schulke commented Dec 15, 2023 •

edited

Loading

xfbs commented Dec 19, 2023 •

edited

Loading

xfbs commented Dec 19, 2023 •

edited

Loading

xfbs commented Dec 22, 2023 •

edited

Loading