RFC: Nested Cargo packages #3452
text/0000-nested-publish.md
Outdated
Currently, developers must publish these packages separately. This has several disadvantages (see the [Rationale](#rationale-and-alternatives) section for further details):

* Clutters the public view of the registry with packages not intended to be usable on their own, and which may even become obsolete as internal architecture changes.
* Requires multiple `cargo publish` operations (this could be fixed with bulk publication) and writing public metadata for each package.
* Can result in semver violations and thus compilation failures, due to the developer not thinking about semver compatibility within the group.
While some of these indirectly touch on it, one I'd explicitly add is sheer boilerplate.
In working on #3424, one of the things I've noticed is the commentary from people who are looking to further drop boilerplate. This also came up in a recent blog post and the HN discussion of it.
This proposal will still require `Cargo.toml`s for each nested package. What boilerplate do you see removing (besides e.g. explanatory `README.md` files)?
Any of the standard manifest fields that crates.io requires. Granted, workspace inheritance helps with those (and will automatically be used in `cargo new` in 1.72), but it is much nicer if we can just leave them out.
Combine that with "cargo script" (if we support `[lib]` packages) and you might not even need manifests (even ones embedded in source).
I couldn't find a definitive statement of exactly which fields `crates.io` requires to compare to when discussing boilerplate reduction. https://doc.rust-lang.org/cargo/reference/publishing.html#before-publishing-a-new-crate implies it is one, but isn't really (e.g. `homepage` is not mandatory).
Thanks @kpreid for taking the time to write this up!
text/0000-nested-publish.md
Outdated
The following changes must be made across Cargo and `crates.io`:

* **Manifest schema**
  * The Cargo manifest now allows `"nested"` as a value for the `package.publish` key.
As it is already a term in `cargo`, I actually lean towards "vendor".
I get the similarity, but vendoring normally means making a copy of a package that is available by other means, and one of the design goals here is to discourage any such copies existing (because they are likely to be accidental, and if they aren't, then they may create the same kinds of problems as multiple major versions do). I think reusing the term would create more confusion than it avoids.
> vendoring normally means making a copy of a package that is available by other means, an

The perspective I was using when I came up with "vendor" was that instead of getting a dependency through the registry, we are copying it into our package. It's not vendored within the repo but in the `.crate` file.
This also ties into whether we should generalize this across dependency sources at which point it feels like it becomes even more applicable.
I still don't think that vendoring is the right term, especially as one of the things that came up during my review of prior art is the concept of using nested-packages-or-whatever for vendoring, e.g. to fix a bug before upstream accepts the patch — I think these need to be kept distinct ideas.
That being considered, what do we need to do here with the RFC text to resolve this thread? Should there be an unresolved question for terminology, or can we just proceed as-is?
I think "vendoring" isn't all-inclusive of the situations where one might use this. Usually vendoring refers to "taking some third party dependency's entire source code and jamming it into some `vendor/whatever` directory (or sometimes via a git submodule) for reasons". I don't think it makes sense to use this term in scenarios where the nested crate isn't third party, which I think will actually be most of the time with this feature, so "vendoring" at best will be a misnomer, and at worst might also carry some negative connotations for people recalling some really crazy repo setups.
I haven't seen any "seconding" of support for the "vendor" name, so I'm going to keep "nested" and resolve this. (Of course, there might be some third or fourth better idea to be found…)
At least keeping this unresolved to centralize any name bikeshedding conversations
Progress on the RFC has been stalled mainly because I have been pondering how to specify the behavior given the premise that inclusion should be based on the presence of dependencies, not just directory nesting (particularly to permit nested “utility” packages which, in the workspace they are developed, are not directory children of the published package, because they are shared between multiple published packages). I'm currently thinking that:
I think that covers most of the necessary ground, but I still need to write the corresponding RFC text.
This would dramatically improve the current state of publishing and managing proc macro crates, and just would be a huge QOL improvement for large-workspace projects, of which there are many.
text/0000-nested-publish.md
Outdated
@@ -119,9 +119,10 @@ Then you can `cargo publish` from within the parent directory `foo/`, and this w

Two new possible values are added to the manifest.

* The `package.publish` field allows `"nested"` as a value, in addition to existing `false` and `true`. This value affects `cargo publish` and nested dependencies as described below.
* The `package.publish` or `workspace.package.publish` field allows `"nested"` as a value, in addition to existing `false` and `true`. This value affects `cargo publish` and nested dependencies as described below.
I have mixed feelings on allowing `nested` to be inherited. In part, I feel like it's a weird default for the full workspace, but I also appreciate leaving the individual building blocks and letting users decide what to do.
Trying to raise visibility of this on the respective teams so we can see what more problems we can uncover.
* Instead of introducing `package.publish.nested = true`, we could only require that dependencies be declared as nested. The disadvantages of this are:
  * May unintentionally duplicate published code between a standalone published package and a nested package
  * Does not make both ends of the relationship explicit to readers of the code.
Another reason we want nested packages to have an explicit opt-in is that we don't want a happy path for forking other people's packages (if we expanded this feature to more than path dependencies). We should instead encourage people to work with upstream and put the end user in charge of what versions get selected.
* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
While I understand the value of this RFC, and these pain points are truly painful, I am a bit uncomfortable about the complexity exposed to Cargo users. With the upcoming public/private dependencies in Edition 2024, the situation becomes rather awkward.
# in a `foo` package
foo-priv-types = { path = "priv-types", public = false, publish = "nested" }
foo-core = { version = "0.1", path = "core", public = true }
foo-util = { version = "0.1", path = "util", public = false }
foo-derive = { path = "derive", public = true, publish = "nested" }
The above example is very likely to happen, but the mixed meaning of `public` and `nested` is not immediately clear.
- `public = false` + `nested`
  - types are not exposed (private) in public API, and that package is published as a private module
  - 👍🏾 makes sense
- `public = true` + `version`
  - types are exposed in public API, and that package is published separately
  - 👍🏾 makes sense
- `public = false` + `version`
  - types are not exposed in public API, and that package is published separately
  - 🤔 seems awkward; this RFC addresses it
- `public` + `nested`
  - types are exposed in public API but that package is published as if it were a private module
  - 🤔 looks a bit more awkward; this RFC addresses it

I may have over-complicated the situation, but it does introduce cognitive overhead when combining different concepts together. I don't know how complex inline-module would be, but that might be a chance to change the compilation unit from crate to module (don't bash my head, just an idea).
(Not to mention when open namespaces come and join the party. While it's a pretty independent feature, the learning curve doesn't look too good when everything gathers…)
For me, the big concern with `public` + `nested` is when two workspace members do that for the same dependency. Locally, they will be interchangeable. When published, they will not. This delays testing and could confuse users.
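The interchangeability concern can be modeled in plain Rust. This is only a sketch under stated assumptions: the two modules `copy_in_a` and `copy_in_b` are hypothetical stand-ins for the same nested package after it has been embedded separately in two published parents, and `Token` is an invented type. It shows that identical source text compiled twice yields two distinct types.

```rust
use std::any::TypeId;

// Hypothetical stand-ins: each module models one published copy of the same
// nested package, embedded in a different parent package.
mod copy_in_a {
    #[derive(Debug)]
    pub struct Token(pub u32);
}

mod copy_in_b {
    #[derive(Debug)]
    pub struct Token(pub u32);
}

// Returns true only when both type parameters are the very same type.
fn same_type<T: 'static, U: 'static>() -> bool {
    TypeId::of::<T>() == TypeId::of::<U>()
}

fn main() {
    // Identical source text, but two distinct types once duplicated.
    println!(
        "same type: {}",
        same_type::<copy_in_a::Token, copy_in_b::Token>()
    );

    // Crossing the boundary needs an explicit field-by-field conversion.
    let from_a = copy_in_a::Token(7);
    let for_b = copy_in_b::Token(from_a.0);
    println!("{:?}", for_b);
}
```

In a local workspace both members would resolve to one path dependency, so this mismatch would presumably only surface after publishing, which is the delayed-testing problem described above.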
A minor question: if a Git repository contains a package with nested packages, can another package depend on any of those nested packages as a Git dependency? Currently, a Git dependency is resolved by recursively searching the repository for a package whose name matches.
* It is not possible to publish a bug fix to a nested package without republishing the entire parent package; this is the cost we pay for the benefit of not needing to take care with versioning for nested packages.

* Suppose `foo` has a nested package `foo-core`. Multiple major versions of `foo` cannot share the same instance of `foo-core` as they could if `foo-core` were separately published and the `foo`s depended on the same version of `foo-core`. Thus, choosing nested publishing may lead to type incompatibilities (and greater compile times) that would not occur if the same libraries had been separately published.
  * If this situation comes up, it can be recovered from by newly publishing `foo-core` separately (as would have been done if nested publishing were not used) and using the [semver trick](https://github.com/dtolnay/semver-trick) to maintain compatibility.
The semver trick does solve this problem, but only once those packages bump their MSRV, so this feature won't get adopted for some time. Some popular crates using proc-macros hold a relatively conservative MSRV. The slow adoption might be mildly annoying in the short to mid term.
(Granted, other alternatives would have the same issue.)
  * If this situation comes up, it can be recovered from by newly publishing `foo-core` separately (as would have been done if nested publishing were not used) and using the [semver trick](https://github.com/dtolnay/semver-trick) to maintain compatibility.

* Support for duplicative nested publishing (that is, nested packages that are nested within more than one parent package) has the following consequences:
  * May increase the amount of source code duplicated between different published packages, increasing download sizes and compilation time. It's currently possible to duplicate code into multiple packages via symlinks, but this would make it an “official feature”.
Along with this, as a side effect, this might provide an easier route for people to vendor code more than before. The Rust ecosystem is different from 5 years ago, but the initial idea of Cargo/crates.io was to encourage everybody to publish and share code and, whenever possible, to play nice and help fix bugs upstream.
Again, Rust is not what it used to be. The old rules might not be suitable nowadays, since enterprises and other large projects have started using it more. Regardless, the potential consequence of a paradigm shift is something we need to be aware of.
> Along with this, as a side effect, this might provide an easier route for people to vendor code more than it was.

At first I wasn't too concerned about it but then I thought of what `insta` had to do to vendor `yaml-rust`. They have to copy the code in, merge any `dependencies`, and update any `crate::` relative references.
With this feature, it's: copy the package directory in, mark it as `nested`, and you are done.
Looking over #2224, two of the participants specifically called out wanting to use it for private forks :/
There are a number of reasons why a Rust developer currently may feel the need to create multiple library crates, and therefore multiple Cargo packages (since one package contains at most one library crate). These multiple libraries could be:

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
Another use case is that this provides another way for us to break dependency cycles that involve `dev-dependencies`.
Currently, the solution involves dropping the dependency on publish (by not specifying a `version`). This lacked discoverability, so by default `cargo add` does it for all path dev-dependencies. This negatively impacts crater: for any packages with dev-dependency cycles, or where `cargo add` was used to add a path dev-dependency, we lose out on a lot of testing with crater.
With this feature, we can instead nest the path dev-dependency.
If `package.publish` is a table, then `package.publish.registries` defaults to `false`, regardless of the value or presence of `package.publish.nested`.

Note: This dual-publishing-mode functionality is permitted mainly to keep the functionality composable/orthogonal. We hope that in most cases, packages are either published nested exactly once, or to a registry alone, to avoid duplicating code in the registry and compiling it redundantly.
A use case to call out for this is breaking cycles that come about from dev-dependencies.
* This increases the number of differences between “Cargo package (on disk)” from “Cargo package (that may be published in a registry, or downloaded as a unit)” in a way which may be confusing; it would be good if we have different words for these two entities, but we don't.

* It is not possible to publish a bug fix to a nested package without republishing the entire parent package; this is the cost we pay for the benefit of not needing to take care with versioning for nested packages.
However, for the proc-macro case, `=` version requirements are frequently used and so both would need to be published anyways in that case.
This check is intended only to prevent accidents (such as vendoring a third-party package without considering the implications of redistributing it). It is always valid to omit `package.license` from the nested package, thus making no machine-readable claims about its licensing.

It is an error for a nested package to have the same package name as the parent package or any other nested package with the same parent package. This is validated by all Cargo operations that would generate or read a lockfile. Rationale: This should ensure that whenever a nested package must be named, such as in an `.crate` archive, potentially in lock files, and potentially in Cargo user interface, the pair of (parent package name, nested package name) is sufficient to uniquely identify the package.
Should we just leave this as a conflict validation check during publish? That means one less thing to have to validate on every command, and it would allow nested packages in nested directories to not be subject to this limitation.
The package index, and the `crates.io` user interface, do not explicitly represent nested packages; the package is presented as if it were a single package:

* Nested packages’ dependencies are flattened into the listed dependencies of the parent package.
I'm a little concerned that this seems like it could be a fairly complex operation to flatten dependencies. For example, how does it handle name collisions? How does it handle fields that aren't normally unified, like the `public` field? I'm not quite sure I see how this could work.
If the intent is to have the resolver not know about nested packages, it seems like that would make it difficult for other parts of cargo to know about those nested packages since they are all driven by the resolver.
Another field that can't be readily merged is `links`. Is it an error to have more than one `links` in the tree?
If we don't hide this from the resolver, that opens up other areas of complexity, iiuc:
- we would need a way for source ids to refer to these packages
- we would need a way to include these entries in the Index without conflicting
By merging dependencies we probably have to mean that the index will list all of the dependencies from the merged list, i.e. the list of dependencies will have more than one entry for `foo`, rather than us merging all references to `foo` into one dependency. I think the resolver already has to deal with index entries with duplicate package names, for target-specific dependencies and renamed dependencies.
If we actually want to merge into one dependency entry, semver constraints are going to make life difficult. `>=1.0.0-alpha, <3.0.0` is not the merger of `>=1.0.0-alpha` with `<3.0.0`. (`1.0.0-alpha` matches the first two but not the third one.)
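The pre-release point can be illustrated with a small self-contained model. This is a sketch, not the `semver` crate's API: `Version`, `Op`, `Comparator`, and `matches` are hypothetical names, and the rule encoded here is an assumption about how pre-release matching works, namely that a pre-release version satisfies a requirement only if some comparator in that same requirement names a pre-release with the same major.minor.patch.

```rust
// Simplified, hypothetical model of pre-release matching; not the `semver` crate API.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
    // `false` marks a pre-release such as `-alpha`; since `false < true`,
    // a pre-release sorts before the release with the same numbers.
    release: bool,
}

#[derive(Clone, Copy)]
enum Op {
    GreaterEq,
    Less,
}

#[derive(Clone, Copy)]
struct Comparator {
    op: Op,
    version: Version,
}

// A requirement is a list of comparators that must all hold.
fn matches(req: &[Comparator], v: Version) -> bool {
    let in_range = req.iter().all(|c| match c.op {
        Op::GreaterEq => v >= c.version,
        Op::Less => v < c.version,
    });
    // Assumed rule: a pre-release only matches when a comparator in this
    // requirement is itself a pre-release of the same major.minor.patch.
    let pre_release_allowed = v.release
        || req.iter().any(|c| {
            !c.version.release
                && (c.version.major, c.version.minor, c.version.patch)
                    == (v.major, v.minor, v.patch)
        });
    in_range && pre_release_allowed
}

fn main() {
    let alpha = Version { major: 1, minor: 0, patch: 0, release: false };
    let three = Version { major: 3, minor: 0, patch: 0, release: true };
    let lower = Comparator { op: Op::GreaterEq, version: alpha };
    let upper = Comparator { op: Op::Less, version: three };

    // One merged requirement versus two requirements checked independently.
    let merged = matches(&[lower, upper], alpha);
    let separate = matches(&[lower], alpha) && matches(&[upper], alpha);
    println!("merged requirement matches 1.0.0-alpha: {merged}");
    println!("both separate requirements match 1.0.0-alpha: {separate}");
}
```

Under this model the merged requirement and the pair of separate requirements disagree on `1.0.0-alpha`, which is why listing both index entries is not equivalent to unifying them into one.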
One drawback that I think would be good to include is the risk around having publicly exposed shared types be incompatible if a dependency is built multiple times. For example: If I have three packages, "a", "b", and "shared". There is a dependency from a→shared and b→shared, and I publish

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
* A library that uses a build script that uses another library or binary (e.g. for precomputation or bindings generation).
* A logically singular library broken into multiple parts to speed up compilation.
> A logically singular library broken into multiple parts to speed up compilation.

Similarly, a `bin` might want to split out a `lib` for local development and testing but not consider it public and not offer semver guarantees for the `lib`. `cargo-edit` and `cargo-release` are like this.
* When packages are implementation details, it makes a permanent mark on the `crates.io` registry even if the implementation of the parent package stops needing that particular subdivision. By allowing sub-packages we can allow package authors to create whatever sub-packages they imagine might be useful, and delete them in later versions with no consequences.
* It is possible to depend on a published package that is intended as an implementation detail. Ideally, library authors would document this clearly and library users would obey the documentation, but that doesn't always happen. By allowing nested packages, we introduce a simple “visibility” system that is useful in the same way that `pub` and `pub(crate)` are useful within Rust crates.

## Alternatives to nested packages
Another alternative is to add a form of access control related to open namespaces so you can have published packages with APIs scoped to the namespace:
- Perform this access control at the `cargo`/crates.io level by allowing a published package to be marked as private, disallowing any package to depend on it unless it's participating in the namespace.
  - However, namespace membership control is only enforced at the registry level, so for any local development, you can access the private packages by putting yourself in the namespace
- Language level support with a `pub(namespace)` (name to be bike-shedded)
Yes, neither helps with the publishing overhead.
* We could choose to explicitly prohibit nested packages from specifying a `package.version`, to avoid giving the misleading impression that it means anything. This would be notably stricter than the current meaning of absent `package.version` as of Cargo 1.75, which is that it is completely equivalent to `version = "0.0.0"`. It would also prohibit having a package that is both nested and published to a registry, if that is desired.

# Future possibilities
Combined with bin deps, we could allow delegating build scripts to a nested package, allowing a more complete environment for its development.
There are a number of reasons why a Rust developer currently may feel the need to create multiple library crates, and therefore multiple Cargo packages (since one package contains at most one library crate). These multiple libraries could be:

* A trait declaration and a corresponding derive macro (which must be defined in a separate proc-macro library).
Should `-sys` packages be a motivation, with using this being a recommended path, or should we discourage using this with `-sys`?
This was talked about a little at #2224 (comment)
We discussed this further in today's cargo team meeting. The above comments opened represent some of the discussion. Several areas of concern were focused on: The original appeal of this solution is that it is relatively trivial in thought as path dependencies exist. The problem is when getting into the details, particularly dependency resolution / the Index. On top of that, crates.io would likely want to duplicate the Index Summary generation from the This also adds more design choices to the user, making it harder for them to navigate the choice of what tools to use which would include
While this helps to streamline people's workflows by reducing publishing overhead, there aren't any inherent blockers to people solving this today. The closest is likely in communicating semver guarantees and there is interest in something like All of that said, the unease wasn't enough to block this proposal at this stage but we would be open to this being explored further to see how much these concerns are addressed or validated when this is put into practice. As for next steps, we didn't fully resolve that. The RFC affects two teams with crates.io mostly being involved for the Index side of things and would generally be implemented after a go/no-go from the above exploration. We didn't get to the point of discussing whether we'd want to push this RFC forward, experiment with it, and possibly reject it after approval if the concerns were shown to be blocking, or if we'd want to authorize an experiment to be done as part of the writing of this RFC, much like cargo-script. The big risk would be to where you put your time as this would likely be a big experiment with enough uncertainty around it that the odds of it being stabilized are iffy.
FWIW, I currently believe it makes sense to wait for the accepted RFC 3243 packages as namespaces (tracking issue) to be implemented before pursuing this RFC any further; they cover some of the same ground (more coherent organization of a user-facing library broken up into many crates) and would likely have some interactions to address.
Rendered
This is my first Rust RFC. The idea was previously discussed on IRLO (Private nested Cargo packages), and the feedback was generally positive except for confusion about exactly what was being changed, which I believe I have now precisely defined in this RFC draft.
rust-lang/cargo#2203