Skip to content

Implement a canary edition system #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 26, 2023
Merged

Implement a canary edition system #201

merged 11 commits into from
Dec 26, 2023

Conversation

mara-schulke
Copy link
Contributor

@mara-schulke mara-schulke commented Dec 15, 2023

This PR implements an edition system for buffrs manifests. This allows us to version manifests properly and show users errors (or map the manifests internally behind the scenes)

Error:   × could not deserialize Proto.toml
  ╰─▶ TOML parse error at line 1, column 1
        |
      1 | edition = "0.7"
        | ^^^^^^^^^^^^^^^
      unsupported manifest edition, supported editions of 0.8.0 are: 0.8

This is a preliminary feature to roll out before driving wide adoption of buffrs to be able to still make breaking changes (and introduce new features) while having an existing user base.

Closes #64


Canary Edition

The canary edition correlates (and de/encodes to 0.x where x is the current minor version of buffrs). This means that until the stable release there is an option to introduce potentially breaking changes with every minor release.

Unknown Edition

Every other manifest release edition that is not matching the current canary one is deserialized into unknown and thus triggers an error for users.


Introduces

  • pub enum Edition – The edition enum
  • pub const CANARY_EDITION – The current canary edition of this crate

Changes

  • Manifest / RawManifests internal layouts to accommodate the edition logic
  • Implements a custom serializer / deserializer for branching and adding backwards compatibility
  • The semantics of buffrs package: The compressed package will always contain a manifest with the current canary edition inside. Even if the "source" manifest didn't contain any edition pin
  • The edition is optional in the Porto.toml and inferred to be the canary edition, which means that users are responsible for keeping it compatible (this also ensures that this feature works out of the box with existing projects!)

Examples

Editioned (supported)

edition = "0.7"

[package]
name = "xyz"
version = "xyz"
type = "lib"

Uneditioned (supported)

[package]
name = "xyz"
version = "xyz"
type = "lib"

Unknow / Outdated (unsupported)

edition = "0.6"

[package]
name = "xyz"
version = "xyz"
type = "lib"

This causes the following error:

Error:   × could not deserialize Proto.toml
  ╰─▶ TOML parse error at line 1, column 1
        |
      1 | edition = "0.6"
        | ^^^^^^^^^^^^^^^
      unsupported manifest edition, supported editions of 0.8.0 are: 0.8

Open Points

After the review of the core concept of editions the following points are open and must be addressed before merging:

  • Document this in the guide
  • Write tests for this

@mara-schulke mara-schulke added docs::crate Improvements or additions to documentation datamodel Changes related to the Datamodel component::cli Everything related to the buffrs cli priority::high Urgent change or idea, please review quickly complexity::medium Issues or ideas which require some discussion, research and more implementation work type::epic An epic change. This is going to make a big difference to buffrs as a product. type::idea Rough idea or proposal for buffrs docs::project Documenting the project vision, design decisions etc labels Dec 15, 2023
@mara-schulke mara-schulke added this to the Stabilization milestone Dec 15, 2023
@mara-schulke mara-schulke self-assigned this Dec 15, 2023
@mara-schulke mara-schulke requested a review from tomkarw December 16, 2023 08:09
@xfbs
Copy link
Contributor

xfbs commented Dec 19, 2023

I just want to add one thing here, which is a pattern that I have used before that I found rather useful for when you have versioned data structures and getting them to parse easily with serde. From glancing at the code in this PR, it looks like this strategy might be useful here as well. Some of this explanation includes stuff we are already doing, but I repeat it here for completness' sake.

The idea is that we use serde's support for internally tagged enums to get the decoding for free. To do this, we can have an Edition enum, and a Manifest struct (this one is internal, it can change at any time):

// could be auto-derived from VersionedManifest using strum, for better maintainability
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Edition {
    V1,
    V2,
}

// internal representation of a manifest (we are free to change this, as it is now decouple from the
// representation on disk).
#[derive(Serialize, Deserialize)]
pub struct Manifest {
    edition: Edition,
    authors: Vec<String>,
    frobnicate: bool,
}

Now, when we actually parse the manifests, we use a different type called a VersionedManifest (this is raw data that we parse from the filesystem). Note that this is now an enumeration.

// external representation of a manifest (add new variants here for new editions)
#[derive(Serialize, Deserialize)]
#[serde(tag = "edition", rename_all = "snake_case")]
pub enum VersionedManifest {
    V1(ManifestV1),
    V2(ManifestV2),
}

To translate this: with the tag = "edition", we are telling serde that in our TOML there is a field called edition, which it should parse first to figure out which variant it should match. We also tell it to snake_vase the variant names. So basically, the edition field can now be v1 and v2. We are also able to override them with something else using #[serde(tag = "something")].

The variants of this enumeration should obviously align with the Edition enum. In fact, the Edition enum can be automatically derived from this VersionedManifest using the strum crate, if desired.

Now, we can define these manifests. These are public-facing and may never change (part of the public API, if you wish):

// on-disk representations for manifests, one for every edition
#[derive(Serialize, Deserialize)]
pub struct ManifestV1 {
    // these fields can be public because they may never change
    pub author: String,
}

#[derive(Serialize, Deserialize)]
pub struct ManifestV2 {
    pub authors: Vec<String>,
    pub frobnicate: bool,
}

You can see that these two have different shapes — one has an author which is a single string, and one has a list of authors. This is fine — we can totally change the shape between versions. We have full flexibility to make any kind of breaking changes in between editions.

Finally, we write a bit of glue code to convert a VersionedManifest into our internal Manifest representation:

// this is where we translate the public-facing versioned manifest into our
// internal representation. we can change our internal representation at any point
// and change how we do this translation, as long as we don't change the semantics
impl Into<Manifest> for ManifestV1 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V1,
            authors: vec![self.author],
            frobnicate: false, // fallback to default value
        }
    }
}

impl Into<Manifest> for ManifestV2 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V2,
            authors: self.authors,
            frobnicate: self.frobnicate,
        }
    }
}

impl Into<Manifest> for VersionedManifest {
    fn into(self) -> Manifest {
        match self {
            VersionedManifest::V1(manifest) => manifest.into(),
            VersionedManifest::V2(manifest) => manifest.into(),
        }
    }
}

What is nice about this is that now, the two toml files parse properly:

edition = "v1"
author = "Snoop Doggy Dogg"

and this one as well:

edition = "v2"
authors = ["Snoop Doggy Dogg", "Dr. med. Dre"]
frobnicate = true

And yet, once we have parsed them both fully and turned them into a Manifest, they produce the same shape of data (but yet we still have that edition field, so that we can in the code have different behaviours depending on which edition you have chosen, if you really want):

Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}

I'm a really big fan of how easy serde makes it to compose things like this. We could even have support for unknown editions, by writing something like this:

// the untagged makes it try to match the input as a Known variant first, and if it cannot, fall
// back to keeping it as a string. so "v1" => Known(V1), "v4" => Unknown("v4").
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub struct RawEdition {
    Known(Edition),
    Unknown(String),
}

Not that we would ever want that, probably, but the way this stuff composes makes it possible to mix strongly-typed stuff with a fallback to loosely-typed, dynamic data.

Maybe this is something we could implement here as well, it might save us some custom serde parsing code?

But Patrick, what about backwards compatibility?

Well I am glad you asked. This is thankfully easily possible. We can introduct a LegacyManifest struct, or we can simply treat manifests with missing editions as a specific, known edition. Here is how to treat ones with missing editions as a V1 manifest:

// the untagged means we will parse it first as a VersionedManifest, and fall back to the second case.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub struct RawManifest {
    Versioned(VersionedManifest),
    Unversioned {
        // capture the edition, if any
        #[serde(default)]
        edition: Option<String>,
        #[serde(flatten)]
        manifest: ManifestV1,
    },
}

impl Into<Manifest> for RawManifest {
    fn into(self) -> Manifest {
        match self {
            Versioned(manifest) => manifest.into(),
            // here we could raise an error if the edition is incorrect
            Unversioned { edition, manifest } => manifest.into(),
        }
    }
}

Now, when parsing everything as this RawManifest, the following toml:

author = "Snoop Doggy Dogg"

Parses as such:

Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}

So, TL;DR:

  • We can use tagged enums with serde to parse editions
  • We can use untagged enums as a fallback (for unknown/unspecified editions)
  • Type magic makes for rather pleasant code

@xfbs
Copy link
Contributor

xfbs commented Dec 19, 2023

Another cool trick with serde (while I'm at it): suppose you have some data structure that is typed, but maybe in the future you might add fields to it. You don't care about those fields now, but you also don't want to lose them. So what do you do?

This:

#[derive(Serialize, Deserialize)]
pub struct MyType {
    author: String,
    uuid: Uuid,
    tags: BTreeSet<Tag>,

    // anything else not present when parsing gets dumped into this
    #[serde(flatten)]
    other: BTreeMap<String, serde_json::Value>,
}

Now when you parse something like this:

{
    "author": "Michael Jackson",
    "uuid": "0000-000-0000-000000000000",
    "tags": ["cheese", "pizza", "hummus"],
    "serendipity": "chumsky",
    "direction": "discombobulated",
    "something": ["abc", 12],
}

It parses neatly (losslessly):

MyType {
    author: "Michael Jackson",
    uuid: Uuid::from("0000-000-0000-000000000000"),
    tags: BTreeSet::from(["cheese", "pizza", "hummus"]),
    other: BTreeMap::from([
        ("direction", Str("discombulated")),
        ("something", List(Str("abc"), Num(12))),
    })
}

And when you serialize this, you get back the original JSON.

Neat, right! I think serde is so cool 😆

@mara-schulke
Copy link
Contributor Author

@xfbs so, I agree, and I have used internally tagged serde enums before for these kinds of work. The problem here is that the Canary edition does not serialize to one specific edition, but to a different string on every buffrs version hence the custom serialization logic is needed.

The custom deserializers for the RawManifest enable us to treat the missing tag differently from how serde would behave by default. So to summarize this quickly: Its not possible to teach serde the current implementation's behavior through combinations of rename and tag. This is why I opted to implement this manually.

If you find a way of having untagged variants default to a specific tag + rename a variant dynamically let me know! Im open for suggestions but after carefully consulting the docs and lib of serde this is just not covered by the derive syntax.

@xfbs
Copy link
Contributor

xfbs commented Dec 19, 2023

@mara-schulke I see! As far as I understand, we can certainly have untagged variants default to a specific tag. However, you are right indeed in that we cannot rename a variant dynamically, this is tracked in serde-rs/serde#450.

As far as I understand, the manually-implemented behaviour can be (almost) reimplemented like this:

#[derive(Serialize, Deserialize)]
#[serde(tag = "edition")]
pub enum TaggedManifest {
    #[serde(rename = "0.7")]
    Canary(CanaryManifest),
}

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum Manifest {
    Tagged(TaggedManifest),
    Untagged(FallbackManifest),
}

The only caveat being that due to the missing dynamic rename functionality, we have to write rename = "0.7" and we currently cannot write rename = concat!("0.", env!("CARGO_MANIFEST_VERSION_MINOR")). So, if we are okay with having a place in the code that we need to manually update with the current version name (this can be unit tested quite easily), then this approach might be useful to us here.

If we don't want to have to manually update that 0.7, then obviously implementing Serialize and Deserialize manually (as is the current approach) is the only way to go 😢

@mara-schulke
Copy link
Contributor Author

Well, but the current behavior of "untagged" manifests is to deserialize as the current canary edition. I don't think that this is possible as of right now?

@xfbs
Copy link
Contributor

xfbs commented Dec 19, 2023

So we only have two parsing cases:

  • edition = "0.7" or missing: treat as current canary
  • edition = "0.6" or any other, non-current edition: emit warning

Is that correct?

@mara-schulke
Copy link
Contributor Author

Well, no, because 0.7 may be compatible with 0.6, then we would not emit an error.

@mara-schulke
Copy link
Contributor Author

@xfbs @tomkarw @Tpt @qsantos: this is blocking the migration from kicking of, may we either merge this or get concrete change requests?👀🙈

I'm open on reiterating the implementation, I'm just waiting on suggestions / concrete feedback here. If you don't have any, may we shift the discussion to a github discussion / issue and move on?

Copy link
Contributor

@Tpt Tpt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state seems a great improvement to me. But I am not one of the project maintainer so it's not my call to make I think.

@xfbs
Copy link
Contributor

xfbs commented Dec 22, 2023

Sorry from my side for being slow!

  • I think this is a really good idea and also necessary for the long-term growth of buffrs. So it is quite important that we get this in. I think the code looks good, therefore I approve this.

  • I have a bit of uncertainty/lack of understanding of the strategy around the canary editions, backwards compatibility and all that good stuff (but I suppose that is not code review per se but rather some high-level strategy "review")

  • Random idea: Are we maybe overloading the edition key with the version? Is this what we really want:

    edition = "alpha"
    buffrs_version = "0.7" # requires 0.7 at minimum

    and

    edition = "1"

    Or something like that (aka have separate fields for edition versioning and buffrs minimum versions)? (not sure if we do, just thinking out aloud)

  • It scares me a little bit that the current approach is that you need to re-publish a package to upgrade to a new edition (and all of the dependencies need to do the same), if I understood that correctly (did I understand that correctly?)

  • I feel like there are some thoughts that could be better discussed synchronously, because I feel bad when I ask stupid questions or write a wall of text 😄

I think @Tpt left some good comments, and maybe I'm just overthinking it!

@mara-schulke
Copy link
Contributor Author

It scares me a little bit that the current approach is that you need to re-publish a package to upgrade to a new edition (and all of the dependencies need to do the same), if I understood that correctly (did I understand that correctly?)

I think what you need to acknowledge is that you would need to do this either way. If we break the compat (for whatever reason) you can't use old packages. This is already the case as of right now and will always be the case prior to being stable. What this edition system solves is that you gain stability and awareness of when this is the case (and if it is the case, you get the ability to repair it, by mapping the old manifest to the new).

Re mixing editions and versions: buffrs doesn't have editions as rust has them. We only version the manifest through editions thus there is no need to use "alpha", "beta" etc for the manifest version. The only version we track is the stability of the package manager and thus strictly related to the server version string of it. The versioning system described by this pr is basically just formalizing what is already the case: every new minor version (pre 1.0) may be incompatible with the prior one. The manifest may change completely between each version and thus the edition tracks when the manifest was valid / how it should look like and enables us to write migration guides between each edition.

Most likely each breaking change in buffrs (and thus each new minor version) will come along with some, potentially breaking, manifest change. If that is not the case anymore we are approaching stable.

@xfbs
Copy link
Contributor

xfbs commented Dec 22, 2023

You are obviously much more knowledgeable on buffrs than me, so always take what I say with a grain of salt 😅

If we break the compat (for whatever reason) you can't use old packages. This is already the case as of right now and will always be the case prior to being stable.

We could make a conscious choice to say: while buffrs is unstable (there will be breaking changes in it's API), we are explicitly versioning the manifests and guaranteeing that newer versions will be able to use older manifests. if we wanted to, anyways.

My understanding is that that is not something we want to do, because we want to be able to make many breaking changes or changes to the manifests that are not easily "convertible" (aka bijective)?

Most likely each breaking change in buffrs (and thus each new minor version) will come along with some, potentially breaking, manifest change. If that is not the case anymore we are approaching stable.

Does the manifest change often? I think I was under the impression that it is relatively stable — but I am likely wrong!

Sorry for opening a can of worms with my comment but I think this explanation from you really helped me understand the motivation behind the design the way it is 😊

@mara-schulke mara-schulke merged commit d9a8f58 into main Dec 26, 2023
@mara-schulke mara-schulke deleted the mara/manifest-version branch December 26, 2023 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
complexity::medium Issues or ideas which require some discussion, research and more implementation work component::cli Everything related to the buffrs cli datamodel Changes related to the Datamodel docs::crate Improvements or additions to documentation docs::project Documenting the project vision, design decisions etc priority::high Urgent change or idea, please review quickly type::epic An epic change. This is going to make a big difference to buffrs as a product. type::idea Rough idea or proposal for buffrs
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Encode manifest version in manifest
4 participants