Implement a canary edition system #201
I just want to add one thing here: a pattern that I have used before and found rather useful when you have versioned data structures and want them to parse easily with serde. From glancing at the code in this PR, it looks like this strategy might be useful here as well. Some of this explanation includes stuff we are already doing, but I repeat it here for completeness' sake.
The idea is that we use serde's support for internally tagged enums to get the decoding for free. To do this, we can have an Edition enum:
// could be auto-derived from VersionedManifest using strum, for better maintainability
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum Edition {
    V1,
    V2,
}
// internal representation of a manifest (we are free to change this, as it is now decoupled from the
// representation on disk).
#[derive(Serialize, Deserialize)]
pub struct Manifest {
    edition: Edition,
    authors: Vec<String>,
    frobnicate: bool,
}
Now, when we actually parse the manifests, we use a different type called a VersionedManifest:
// external representation of a manifest (add new variants here for new editions)
#[derive(Serialize, Deserialize)]
#[serde(tag = "edition", rename_all = "snake_case")]
pub enum VersionedManifest {
    V1(ManifestV1),
    V2(ManifestV2),
}
To translate this: with the tag = "edition" attribute, serde reads the edition field and uses it to pick the variant to decode. The variants of this enumeration should obviously align with the Edition enum.
Now, we can define these manifests. These are public-facing and may never change (part of the public API, if you wish):
// on-disk representations for manifests, one for every edition
#[derive(Serialize, Deserialize)]
pub struct ManifestV1 {
    // these fields can be public because they may never change
    pub author: String,
}
#[derive(Serialize, Deserialize)]
pub struct ManifestV2 {
    pub authors: Vec<String>,
    pub frobnicate: bool,
}
You can see that these two have different shapes: one has an author which is a single string, and one has a list of authors. This is fine, we can totally change the shape between versions. We have full flexibility to make any kind of breaking changes in between editions.
Finally, we write a bit of glue code to convert a VersionedManifest into a Manifest:
// this is where we translate the public-facing versioned manifest into our
// internal representation. we can change our internal representation at any point
// and change how we do this translation, as long as we don't change the semantics
impl Into<Manifest> for ManifestV1 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V1,
            authors: vec![self.author],
            frobnicate: false, // fallback to default value
        }
    }
}
impl Into<Manifest> for ManifestV2 {
    fn into(self) -> Manifest {
        Manifest {
            edition: Edition::V2,
            authors: self.authors,
            frobnicate: self.frobnicate,
        }
    }
}
impl Into<Manifest> for VersionedManifest {
    fn into(self) -> Manifest {
        match self {
            VersionedManifest::V1(manifest) => manifest.into(),
            VersionedManifest::V2(manifest) => manifest.into(),
        }
    }
}
What is nice about this is that now, the two toml files parse properly:
edition = "v1"
author = "Snoop Doggy Dogg"
and this one as well:
edition = "v2"
authors = ["Snoop Doggy Dogg", "Dr. med. Dre"]
frobnicate = true
And yet, once we have parsed them both fully and turned them into a Manifest, they share one internal representation. The first one, for example:
Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}
I'm a really big fan of how easy serde makes it to compose things like this. We could even have support for unknown editions, by writing something like this:
// the untagged makes it try to match the input as a Known variant first, and if it cannot, fall
// back to keeping it as a string. so "v1" => Known(V1), "v4" => Unknown("v4").
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum RawEdition {
    Known(Edition),
    Unknown(String),
}
Maybe this is something we could implement here as well, it might save us some custom deserialization code.
But Patrick, what about backwards compatibility?
Well, I am glad you asked. This is thankfully easily possible. We can introduce a RawManifest:
// the untagged means we will parse it first as a VersionedManifest, and fall back to the second case.
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum RawManifest {
    Versioned(VersionedManifest),
    Unversioned {
        // capture the edition, if any
        #[serde(default)]
        edition: Option<String>,
        #[serde(flatten)]
        manifest: ManifestV1,
    },
}
impl Into<Manifest> for RawManifest {
    fn into(self) -> Manifest {
        match self {
            RawManifest::Versioned(manifest) => manifest.into(),
            // here we could raise an error if the edition is incorrect
            RawManifest::Unversioned { edition: _, manifest } => manifest.into(),
        }
    }
}
Now, when parsing everything as a RawManifest, this manifest without an edition:
author = "Snoop Doggy Dogg"
parses as such:
Manifest {
    edition: Edition::V1,
    authors: vec!["Snoop Doggy Dogg"],
    frobnicate: false,
}
So, TL;DR:
|
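A side note on the glue code in the comment above: the standard library docs recommend implementing From rather than Into, because From provides Into for free through a blanket impl. Below is a minimal std-only sketch of the same conversions written that way. The serde derives are left out so that it compiles on its own, and the fields of Manifest are made public purely for illustration; the type names mirror the ones used in the comment.

```rust
/// Editions known to the parser.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Edition {
    V1,
    V2,
}

/// Internal representation, decoupled from the on-disk shapes.
#[derive(Debug, PartialEq, Eq)]
pub struct Manifest {
    pub edition: Edition,
    pub authors: Vec<String>,
    pub frobnicate: bool,
}

/// On-disk shape for edition v1: a single author.
pub struct ManifestV1 {
    pub author: String,
}

/// On-disk shape for edition v2: a list of authors plus a flag.
pub struct ManifestV2 {
    pub authors: Vec<String>,
    pub frobnicate: bool,
}

pub enum VersionedManifest {
    V1(ManifestV1),
    V2(ManifestV2),
}

impl From<ManifestV1> for Manifest {
    fn from(m: ManifestV1) -> Self {
        Manifest {
            edition: Edition::V1,
            authors: vec![m.author],
            frobnicate: false, // v1 has no such field, fall back to a default
        }
    }
}

impl From<ManifestV2> for Manifest {
    fn from(m: ManifestV2) -> Self {
        Manifest {
            edition: Edition::V2,
            authors: m.authors,
            frobnicate: m.frobnicate,
        }
    }
}

impl From<VersionedManifest> for Manifest {
    fn from(m: VersionedManifest) -> Self {
        // Dispatch to the per-edition conversion.
        match m {
            VersionedManifest::V1(inner) => inner.into(),
            VersionedManifest::V2(inner) => inner.into(),
        }
    }
}
```

With these impls, call sites can use either Manifest::from(value) or value.into(), so nothing else in the comment's approach needs to change.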
Another cool trick with serde (while I'm at it): suppose you have some data structure that is typed, but maybe in the future you might add fields to it. You don't care about those fields now, but you also don't want to lose them. So what do you do? This:
#[derive(Serialize, Deserialize)]
pub struct MyType {
    author: String,
    uuid: Uuid,
    tags: BTreeSet<Tag>,
    // any field in the input that is not declared above gets dumped into this
    #[serde(flatten)]
    other: BTreeMap<String, serde_json::Value>,
}
Now when you parse something like this:
{
    "author": "Michael Jackson",
    "uuid": "0000-000-0000-000000000000",
    "tags": ["cheese", "pizza", "hummus"],
    "serendipity": "chumsky",
    "direction": "discombobulated",
    "something": ["abc", 12]
}
It parses neatly (losslessly):
MyType {
    author: "Michael Jackson",
    uuid: Uuid::from("0000-000-0000-000000000000"),
    tags: BTreeSet::from(["cheese", "pizza", "hummus"]),
    other: BTreeMap::from([
        ("direction", Str("discombobulated")),
        ("serendipity", Str("chumsky")),
        ("something", List(Str("abc"), Num(12))),
    ]),
}
And when you serialize this, you get back the original JSON. Neat, right! I think serde is so cool 😆 |
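To make the idea concrete without pulling in serde, here is a hand-rolled, std-only sketch of what the flatten catch-all does conceptually: known keys are lifted into typed fields, everything else is kept in a map, and writing the value back out reproduces the input. The names Record, from_map and into_map are made up for this illustration, and values are plain strings rather than serde_json::Value.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub struct Record {
    pub author: String,
    /// Every key we do not know about ends up here, untouched.
    pub other: BTreeMap<String, String>,
}

impl Record {
    /// Lifts the known `author` key into a typed field and keeps the rest.
    /// Returns `None` if the required `author` key is missing.
    pub fn from_map(mut raw: BTreeMap<String, String>) -> Option<Self> {
        let author = raw.remove("author")?;
        Some(Record { author, other: raw })
    }

    /// Merges the typed field back with the preserved extras.
    pub fn into_map(self) -> BTreeMap<String, String> {
        let mut out = self.other;
        out.insert("author".to_string(), self.author);
        out
    }
}
```

Round-tripping a map through from_map and into_map gives back an equal map, which is the same property the serde version provides for JSON.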
@xfbs so, I agree, and I have used internally tagged serde enums before for this kind of work. The problem here is that the The custom deserializers for the If you find a way of having untagged variants default to a specific tag + rename a variant dynamically, let me know! I'm open to suggestions, but after carefully consulting the docs and lib of serde, this is just not covered by the derive syntax. |
@mara-schulke I see! As far as I understand, we can certainly have untagged variants default to a specific tag. However, you are indeed right that we cannot rename a variant dynamically; this is tracked in serde-rs/serde#450. As far as I understand, the manually-implemented behaviour can be (almost) reimplemented like this:
#[derive(Serialize, Deserialize)]
#[serde(tag = "edition")]
pub enum TaggedManifest {
    #[serde(rename = "0.7")]
    Canary(CanaryManifest),
}
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum Manifest {
    Tagged(TaggedManifest),
    Untagged(FallbackManifest),
}
The only caveat being that due to the missing dynamic rename functionality, we have to write If we don't want to have to manually update that |
Well, but the current behavior of "untagged" manifests is to deserialize as the current canary edition. I don't think that this is possible as of right now? |
So we only have two parsing cases:
Is that correct? |
Well, no, because 0.7 may be compatible with 0.6, then we would not emit an error. |
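The cases discussed above can be sketched with plain std code. This is only an illustration of the decision logic, not the PR's implementation: the Edition variants, the value "0.7" and the helper names resolve_edition and check_edition are assumptions, and compatibility between different editions (the 0.6 vs 0.7 case) is deliberately not modelled.

```rust
/// Assumed value for illustration; the real constant tracks the crate's minor version.
pub const CANARY_EDITION: &str = "0.7";

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Edition {
    /// The edition matching the running version of the tool.
    Canary,
    /// Any other edition string found in a manifest.
    Unknown(String),
}

/// Maps the optional `edition` value of a manifest to an `Edition`.
/// A missing edition is inferred to be the canary edition.
pub fn resolve_edition(raw: Option<&str>) -> Edition {
    match raw {
        None => Edition::Canary,
        Some(e) if e == CANARY_EDITION => Edition::Canary,
        Some(other) => Edition::Unknown(other.to_owned()),
    }
}

/// Turns an unknown edition into a user-facing error message.
pub fn check_edition(raw: Option<&str>) -> Result<Edition, String> {
    match resolve_edition(raw) {
        Edition::Unknown(found) => Err(format!(
            "unsupported manifest edition `{}`, expected `{}`",
            found, CANARY_EDITION
        )),
        known => Ok(known),
    }
}
```

A compatibility table between editions could be layered on top of check_edition if older editions should be accepted without an error.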
@xfbs @tomkarw @Tpt @qsantos: this is blocking the migration from kicking off, may we either merge this or get concrete change requests?👀🙈 I'm open to iterating on the implementation, I'm just waiting on suggestions / concrete feedback here. If you don't have any, may we shift the discussion to a GitHub discussion / issue and move on? |
The current state seems a great improvement to me. But I am not one of the project maintainers, so it's not my call to make, I think.
Sorry from my side for being slow!
I think @Tpt left some good comments, and maybe I'm just overthinking it! |
I think what you need to acknowledge is that you would need to do this either way. If we break compat (for whatever reason), you can't use old packages. This is already the case right now and will always be the case prior to being stable. What this edition system solves is that you gain stability and awareness of when this is the case (and if it is the case, you get the ability to repair it by mapping the old manifest to the new). Re mixing editions and versions: buffrs doesn't have editions as Rust has them. We only version the manifest through editions, thus there is no need to use "alpha", "beta" etc. for the manifest version. The only version we track is the stability of the package manager, and thus the edition is strictly related to its semver version string. The versioning system described by this PR is basically just formalizing what is already the case: every new minor version (pre 1.0) may be incompatible with the prior one. The manifest may change completely between versions, and thus the edition tracks when the manifest was valid / what it should look like, and enables us to write migration guides between editions. Most likely each breaking change in buffrs (and thus each new minor version) will come along with some, potentially breaking, manifest change. If that is not the case anymore, we are approaching stable. |
You are obviously much more knowledgeable on buffrs than me, so always take what I say with a grain of salt 😅
We could make a conscious choice to say: while buffrs is unstable (there will be breaking changes in its API), we are explicitly versioning the manifests and guaranteeing that newer versions will be able to use older manifests (if we wanted to, anyway). My understanding is that that is not something we want to do, because we want to be able to make many breaking changes, or changes to the manifests that are not easily "convertible" (aka bijective)?
Does the manifest change often? I think I was under the impression that it is relatively stable — but I am likely wrong! Sorry for opening a can of worms with my comment but I think this explanation from you really helped me understand the motivation behind the design the way it is 😊 |
This PR implements an edition system for buffrs manifests. This allows us to version manifests properly and show users errors (or map the manifests internally behind the scenes)
This is a preliminary feature to roll out before driving wide adoption of buffrs to be able to still make breaking changes (and introduce new features) while having an existing user base.
Closes #64
Canary Edition
The canary edition correlates (and de/encodes) to 0.x, where x is the current minor version of buffrs. This means that until the stable release there is an option to introduce potentially breaking changes with every minor release.
Unknown Edition
Every other manifest edition that does not match the current canary one is deserialized into unknown and thus triggers an error for users.
Introduces
pub enum Edition – The edition enum
pub const CANARY_EDITION – The current canary edition of this crate
Changes
Manifest / RawManifest: internal layouts changed to accommodate the edition logic
buffrs package: The compressed package will always contain a manifest with the current canary edition inside, even if the "source" manifest didn't contain any edition pin
Proto.toml: The edition is optional and inferred to be the canary edition if missing, which means that users are responsible for keeping it compatible (this also ensures that this feature works out of the box with existing projects!)
Examples
Editioned (supported)
Uneditioned (supported)
Unknown / Outdated (unsupported)
This causes the following error:
Open Points
After the review of the core concept of editions the following points are open and must be addressed before merging: