-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow the edition specifier to be an integer #12301
Conversation
r? @weihanglo (rustbot has picked a reviewer for you, use r? to override) |
2547237
to
3991892
Compare
@@ -979,6 +979,22 @@ impl StringOrVec { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug, Deserialize, Serialize, Eq, PartialEq)] | |||
#[serde(untagged, expecting = "expected a string or an integer")] | |||
pub enum StringOrI64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other areas I suspect we should update if we move forward with this
cargo new
- embedded manifests
- documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epage We shouldn't update cargo new
until 2024, to avoid MSRV implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been some discussion on this in #12301 (review) and #12301 (comment) . I agree that cargo new
should be updated only well into the 2023 edition, or even later, not just for MSRV but also for better error messages from cargos that don't support the 2023 edition.
Previous thoughts on this steveklabnik (#5617 (comment))
withoutboats (#5617 (comment))
matklad (#5617 (comment))
withoutboats (#5617 (comment))
joshtriplett (#5617 (comment))
|
So reception among the cargo team 5 years ago was mixed.
imo a lot of the conversations around "cargo script" are measuring the wrong metric, number of characters. I think a better metric is number of unique keys touched. This is locale-centric though. Within that framework, quotes are actually more expensive ( Still I think the savings for this change are not significant compared to the content being entered However, I am also sympathetic to reason joshtriplett had 5 years ago for being in favor: easier to communicate the field's value verbally. |
An interesting counter point to this is we later added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some counterpoints.
- With the current implementation, this is allowed:
[package] name = "foo" version = "0.1.0" edition = 0x7e5
- Changing any existing syntax in Cargo.toml could lead to an MSRV bump without notice.
(dep:
feature for example) - A bit confusing that with the same set of sources, an old compiler can compile a
"2015"
package but not a2015
package.
That is true that the edition is an identifier, and not a number, and so other numerical representations are invalid. However, I'm unsure if that would matter much in practice. And yes, this would force an MSRV bump. However, I think its more for the future than for the past. That doesn't mean I disagree with your points but just providing more considerations |
Good point. But also agree with @epage 's point that this might not matter in practice. I think implementation wise, thanks to cargo using
I agree that this is a problem. I would thus recommend waiting with the changing of the docs/ This PR could document that specifying an integral edition specifier is possible, but not, say, change |
I have replied to some of these points in the thread back when they were made. The I think even back then that would have been a worthy tradeoff as there is potentially an extremely large number of crates. In fact, since that comment, Rust has grown extremely and there are hundreds of thousands of crates on crates.io. That's 200 kb savings potential if only the currently available crates switched to the model :). But in general I feel that tools like cargo or rustc should not avoid adding to their codebase if it even saves one character or just makes one error message better.
Full agree on that. Number of characters is a good proxy though for number of unique keys touched, so it is useful to study both I think. |
Nominating for discussion in the next meeting. I also wonder if we should just FCP this and see if we have consensus/objections. |
#[serde(untagged, expecting = "expected a string or an integer")] | ||
pub enum StringOrI64 { | ||
String(String), | ||
I64(i64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why an i64
instead of a u64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried an u32 at the start and it didn't work, as in it compiled but the tests as added by this PR failed. The problem was I think that serde represents toml integers via i64 in its data model. In the end I don't think it matters that much: a negative number will still give an error one way or another.
☔ The latest upstream changes (presumably #12317) made this pull request unmergeable. Please resolve the merge conflicts. |
This helps save the number of characters needed to specify an edition. So far, each edition was fully described by its year number, 2015, 2018, 2021. Also for the future, the intent is that editions will always just be year numbers. Therefore, we don't have to use a toml string for them in Cargo.toml, an integer is enough. This saves two characters in each Cargo.toml.
☔ The latest upstream changes (presumably #12553) made this pull request unmergeable. Please resolve the merge conflicts. |
For future reference, this was also discussed in this zulip stream. It seems that more cargo team members would support this once there is a unified library to read/parse |
2 char savings vs leverage compression at scale
How are these savings being measured? An uncompressed representation isn't necessarily a reliable metric. A server can use a filesystem with transparent compression, a database or similar forms offering compressed storage such as archive formats. This can also be leveraged over network transit. In a scenario such as crates.io, it could be a non-concern if the Cargo.toml files don't need to be uncompressed. ZSTD for example can be trained across all those Cargo.toml files as input to optimize compression further. The higher the frequency of a pattern, the more likely it'll be part of that trained compression dictionary. If such savings were important at the scale of crates.io, this approach would be far more effective than the benefit of "two characters in each Cargo.toml", which I doubt would have much of an added improvement with that ZSTD approach. Verbal instruction vs UX consistentcy
"Set edition in your Cargo.toml to the string 2018"? 🤷♂️ I don't know about users asking "why?" with that information. They're instructed to do the same for semver, which if the minor or patch versions are omitted, looks like an integer or float. For the most part, cargo could just improve the error message to reference existing editions 2015, 2018, 2021 - if that helps as an example (eg: last 3 editions). Even if you avoid the clarification and just verbally say "just put edition equals 2018 in your $ cargo check
error: failed to parse manifest at `/app/Cargo.toml`
Caused by:
invalid type: integer `2021`, expected a string or workspace
in `package.edition` Technically the edition hint exists if you at least provide an invalid string value: # Newer than latest edition:
$ cargo check
error: failed to parse manifest at `/app/Cargo.toml`
Caused by:
failed to parse the `edition` key
Caused by:
this version of Cargo is older than the `2022` edition, and only supports `2015`, `2018`, and `2021` editions.
# Any other string value (including years older than latest edition):
$ cargo check
error: failed to parse manifest at `/app/Cargo.toml`
Caused by:
failed to parse the `edition` key
Caused by:
supported edition values are `2015`, `2018`, or `2021`, but `'2018'` is unknown I'm not sure that supporting integers improves the UX when said user encounters a foreign
$ cargo check
error: failed to parse manifest at `/app/Cargo.toml`
Caused by:
invalid type: integer `1`, expected a version string like "0.9.8" or a detailed dependency like { version = "0.9.8" }
in `dependencies.serde` A similar PR / discussion will likely then shift to the above UX, and reference the change to TL;DRI just wanted to chime in with the above input:
|
The hashes of the .crate archives, post compression, are baked into the Cargo.lock files. There isn't much we can do about that, and therefore innovating on compression, which is a good idea, don't get me wrong, is non-trivial. This change in comparison is pretty easy and only takes as many lines because we use serde and serde is not known for allowing low-overhead customizations in data representation. My argument should be understood in the context of the thing it replied to: people claiming that the gains were rather minor. The argument tries to argue with the scale of crates.io, because that one is easy to make back of the envelope calculations with. But in general it applies to every Cargo project whether it lives on the local disk, only in a private github, or in some other VCS somewhere else.
That's easy to answer: all editions are representable by integers, as they are named after year numbers in the gregorian calendar. Not all integers are valid edition numbers, but toml has no concept of an enum, an integer is just as valid as a string (and in my opinion more valid). As for floats, there is two main differences:
all in all, floats (or integers) are way further from semver than integers are from editions. |
Compression - Forward-only benefit
As I'm not familiar with how that's all managed, I assume these These savings would only be for new If the concern is with Compression - Integer
|
By way of clarification: "save two bytes" / "size" was never the goal under consideration here; it was only ever a question of UX. |
We discussed this once more in today's Cargo meeting, to give one more evaluation to the tradeoffs between churn/maintenance and potential UX improvement. We confirmed that we don't have any consensus for this change, and there are several folks objecting to the change. Given that, I'm going to go ahead and close this. |
This PR allows for edition specifiers to be toml integers:
So far, each edition was fully described by its year number:
2015
,2018
,2021
. Also for the future, the intent is that editions will always just be year numbers. Therefore, the natural choice for the type is an integer, instead of the current requirement for toml strings. Using an integer saves two characters in each Cargo.toml. Over the roughly hundred thousand Cargo.toml files on crates.io alone, you'll get some nice savings. Strings are still allowed to support legacy use cases and for the (unlikely) case of non-integral editions in the future.I've proposed this 5 years ago before the 2018 edition even shipped, and received positive feedback from some people (#5617). I ended up closing it because of unrelated reasons. Also, back then it was less clear whether one would have sth like
2018-beta
or such which would have required a string, which eventually didn't end up being a thing (but even then, one could make the case that one could still allow integers for the finally released2018
). With the cargo-script support (cc #12207), I'd say that interest is large again in minimizing the number of characters/keystrokes needed, so I'm bringing back this old proposal.I feel it's a very small feature in scope so one doesn't need a feature gate to evaluate its impact, but feel free to suggest gating.