Add an "airplane" flag to Cargo, altering its dependency resolution behavior #4686
Comments
I'll jump on this! Thank you for the awesome write up |
@alexcrichton Thank you for an absolutely incredible guide for potential contributors! This exactly describes the behavior I'd like in every way. |
Glad to see this topic getting some attention! I left a comment in the mentioned reddit thread regarding IPFS support in Cargo. I personally think that IPFS can be a great tool to tackle offline-like scenarios. I didn't pursue upstreaming it since it is pretty niche, though there seems to be some interest in it. Would the Cargo team be interested in having such features integrated into Cargo itself, or would it be preferable to keep them as outside tools due to their experimental nature (which I would fully understand)? I'd be interested in working on the "Populating the global cache" part, and maybe go a step further there and also allow populating the cache from other sources, e.g. a supplied tarball, which I think could be a good way to interface with IPFS-like tools. |
Rather than or in addition to this flag, could carrying on with an out-of-date index be the default behavior when the network request to update it fails? |
@Ophirr33 awesome! Let me know if you need any help! @hobofan interesting! We haven't discussed much in Cargo about adding new kinds of sources (e.g. mercurial, subversion, ipfs as you're adding, etc). Unfortunately that sort of feature isn't easily addable to Cargo (unlike subcommands) as the sources implementation is pretty intrusive. I'd be interested to learn more about this but we may wish to discuss that on a separate issue perhaps? @SimonSapin ah yes that's sort of what I was alluding to with:
In that the behavior today is quite intentional, returning an error when the network fails. Silently falling back to different resolution behaviors can crop up as often quite subtle "bugs" which would likely inundate us with bug reports. This feature, as you can probably tell, is not a minor feature as well and will take some time to get right. We'll have to see what the error messages look like and make sure they're adequate before stabilizing this feature. |
@alexcrichton What I’m proposing (whether or not it’s the default) seems like it would be a much smaller feature: agree to work with an out-of-date index. That’s it. Resolution might still select a crate version that’s not in cache, and trying to fetch that might fail. The user can then opt into a different (likely earlier) version with This would not be as convenient as a "full" airplane/offline mode, but it would already help unblock situations where the only failure is Cargo refusing to work with a minutes-old index. |
I assume that what’s desirable is the resolution algorithm, not index update failure being fatal. |
@SimonSapin I'm not sure what the exact rules are internally, but it seems like one of the current invariants is "if you do a |
Production build should probably have an up-to-date |
No, under normal circumstances we do want to stay up to date. I've had direct real-world experience with the opposite approach, with a package manager that defaults to building with the set of packages and versions that it has already installed (globally, not per-package like cargo), and only updates the global versions when explicitly requested. This has some advantages, but it also means people often build against older versions of their dependencies, so in practice that ecosystem sees less testing and more breakage between the latest versions of packages. Adding a flag seems good, as does telling the user they might want it; making it happen automatically does not. |
As discussed on reddit and internals, perhaps we should consider naming this something other than "airplane" mode, as being on an airplane is far from the only use case for this. I would recommend decoupling the name from a specific use case. If it uses the local cache, maybe name it |
I’m not suggesting not to update the index at all. But how is it good to abort if we can’t right now? +1 to "offline" over "airplane". |
I do like the idea of calling it "offline" mode. (Though "airplane" might be a good keyword to use in |
@SimonSapin Better than proceeding in a different mode implicitly. |
@alexcrichton thoughts on changing the flag from "airplane" to "offline"? I've seen a couple of folks here and on reddit mention it. |
Bikeshedding: In a perfect world, it would be nice to have a flag that clearly meant something different from the existing flags |
@Ophirr33 either's fine by me! I don't feel too strongly one way or another, we can always continue the bikeshed during stabilization |
What about --no-network? |
I dislike "airplane" for a few reasons:
I think calling it "airplane mode", or referring to airplane mode, in the docs is fine and potentially helpful to users who do know the term. |
Why does it need to be a special mode? Just try to talk to the network, and if that fails, try to do without. |
@jethrogb See, among other things, #4686 (comment) . |
@joshtriplett That seems to be about being offline by default, that's not what I'm suggesting. Edit: scrolling up a bit, there does seem to be some related discussion. The concern seems to be about silent failures. I still think that just printing a warning message about not being online and proceeding with offline resolution is sufficient. |
@jethrogb Consider scripts, where that warning would go unseen. (As an alternative to consider, which I'm not advocating, we could have a |
Any script that requires robustness against unexpected behavior should probably be using |
The scenario I'm imagining is you have an intermittent network, then you don't get the most recent versions in your lockfile, but when you go to read the docs they have APIs that aren't present in the versions you have. This seems potentially very confusing, and forcing the user into the loop by making them add |
The scenario where you're looking at documentation of non-existent functionality because you haven't run Really, when I run |
I agree that that is also bad and would like to mitigate it somehow, it doesn't seem apropos to the discussion at hand?
If the build succeeds it seems probable to me that users will not pay attention to (or possibly even see) the warning message printed prior to all of the "Compiling ..." messages. I don't think it's correct to be flippant about this possibility. |
@Ophirr33 just a quick ping on this -- are you still working on it? |
@Alex-Addy sort of! I think we'll keep it open as a tracking issue though |
@Alex-Addy #4770 did everything except the "Populating the global cache" feature. |
For the "populating the global cache" feature, I have implemented my ideas from above. I have successfully added the following features:
In total this file takes Now this makes me wonder: where does this tool fit in? It doesn't just dump everything into the global cache because that would be quite a size requirement (12 GB instead of 3.3 GB). One way of wiring up cargo to it would be to extend cargo-local-serve to match the crates.io API via a localhost http service... it already implements http intended for end users so all I need to do is to imitate the json API. The issue is that currently this means going all-in: I'd have to ship my own copy of the registry, altered to point at localhost as well. |
Just to think out loud on the subject of caches, you could imagine a setting where cargo accepts a list of binaries that represent cache methods, and there is some 'protocol' for communicating with them to 'get' entries (maybe also supports set?). Then it doesn't matter what compressed format your cache is in - cargo just checks each one in turn, and there's no need to worry about registry indexes etc. |
If we wait for the alternative/private cargo repositories then one could
set up a private server locally. That private repo would then be
responsible for managing its own cache and cargo could use that for free.
|
@est31 While it’s interesting to play with, I don’t think the “download everything” approach is viable long-term. It’s 3.3 GB compressed today for 13k unique packages, but looking at other languages’ repositories it’s plausible that we’ll have 10× that number in a few years. Dozens of gigabytes is still manageable if you really need it, but it’s not something the average Rust developer will want to deal with. |
@SimonSapin definitely, a full mirror will get more and more out of reach for the average developer. Similarly, crater runs will grow more and more expensive. Yet we are still doing them and not avoiding them because maybe in 5 years they would become too expensive! We live in the here and now :). As for the future, we should always provide full mirrors for non-average use cases. E.g. if you really want to be off the grid or you need it for your company or similar... Unless we actually want to get fat (e.g. by promoting uploading of binaries like npm is doing it, in my eyes a very misled policy), it will take some time until crates.io gets bigger than a TB. The technique will always be helpful for downloading some interesting slice of crates.io. "top 100 packages" is far too little to make sense for any developer to use for local development. "top 10k" on the other hand... We can still attain size reductions, e.g. by changing cargo to upload test data separately. Some of the biggest files uploaded are actually test data. |
I’m not saying there’s no place for full registry downloads, only that they’re probably not a solution to "offline mode cargo" and so this might not be the best thread to discuss them. |
@SimonSapin right now full downloads are a very good solution and we can see what to do with them once the ecosystem grows. Then we can also ask people what purposes they are using "offline mode cargo" for so we can make a more comprehensive assessment. |
When I used |
As there hasn't been any activity here in over 6 months I've marked this as stale and if no further activity happens for 7 days I will close it. I'm a bot so this may be in error! If this issue should remain open, could someone (the author, a team member, or any interested party) please comment to that effect? The team would be especially grateful if such a comment included details such as:
Thank you for contributing! (The cargo team is currently evaluating the use of Stale bot, and using #6035 as the tracking issue to gather feedback.) If you're reading this comment from the distant future, fear not if this was closed automatically. If you believe it's still an issue please leave a comment and a team member can reopen this issue. Opening a new issue is also acceptable! |
This is still a feature we need. The |
@joshtriplett this is a cool feature! If we do this, we might want the "default set" (top X or whatever) to be in a git repository so that incremental updates are pleasant. |
This is a critical part required for those working in offline or isolated networks. In an isolated network importing the entirety of crates.io is not acceptable due to various security concerns. But individual crates can be imported through review for manual inclusion. |
@steakknife: Great. What is the relation/interaction with https://github.com/alexcrichton/cargo-vendor? |
Stabilize offline mode. This stabilizes the `--offline` flag as detailed at #5655 (comment). It also adds the `net.offline` config value. Closes #5655 Closes #4686
The topic of "offline" Cargo has come up quite a few times historically, and with the advent of this reddit post it got the @rust-lang/cargo team thinking about this problem again and how we might solve it. The discussion got a little sprawling, but this issue is intended to be the distillation of the core feature required to allow Cargo to work more easily in offline-like situations.
As some background, this issue is intended to be useful in situations such as when you're on an airplane, subway, or tooling situation where you don't want to do extraneous network requests. In these situations Cargo's default behavior, updating the crates.io index, can often fail. Furthermore why Cargo updates the index is often subtle and opaque, causing quite a bit of frustration when users don't expect network traffic to happen and then it does!
An example situation that we'd like to enable is when you work with Rust on a day-to-day basis, perhaps with a good number of Cargo projects. In this case you've got a global crate cache in `$HOME/.cargo/registry`, likely with a relatively up-to-date index and a semi-populated crate cache. This means that if you were to get on a plane and then want to start a similar project, namely one that shares dependencies with crates you've previously worked on, in theory all the data is there for Cargo to consume. Today, however, any project missing a lock file will attempt to update the registry, failing in "airplane" situations.
The key insight behind this issue is that this flag will change Cargo's crate graph resolution behavior. Namely, Cargo will generate a different `Cargo.lock` depending on whether this flag is passed or not. The purpose of this is to ensure that Cargo's default behavior today (which is desirable in many situations) doesn't get affected too much by this feature.
With all that in mind I believe the steps for implementing this would look similar to the following (although perhaps not exhaustively):
- Add a new unstable flag called `airplane`. This means that your flag will be activated with something like `cargo build -Z airplane`.
- See `Config::cli_unstable`. The `Config` structure is ubiquitous throughout Cargo and is in general intended to house CLI configuration or other global concerns in Cargo.
- See the `network_allowed` method.
- See `sources/registry`. This module contains all the support necessary for using a registry (crates.io) as a remote (aka you'll download things).
At this point is where the work will likely start. The general idea of how this will be implemented is that the behavior of the `Registry` "source" (aka an implementation of the `Source` trait) will differ depending on whether `-Z airplane` is passed. Ideally this flag isn't too intrusive throughout Cargo, so we'll want a pretty localized implementation.
The first part that we'll tackle is the `update_index` method. This will want to immediately return `Ok` if the airplane mode is activated. This will prevent an index update from happening in airplane mode, even when Cargo would otherwise request it (for example a lock file is missing or a dependency was added).
Next up Cargo will need to change its view of the index in airplane mode. We know that we won't be able to download any packages as the network isn't available, so we need to ensure that Cargo's crate graph resolver never asks to download something. The way the crate graph resolver works is through the `Registry` trait, mostly the `query` method. This method takes a `Dependency` (basically a name and semver requirement) and then invokes the callback with all possible "summaries" (aka packages) that the source has.
When we're in airplane mode the number of possible summaries is far less than when we're not in airplane mode (we can't download things!). This means that we're going to want to filter the list of summaries that the crates.io registry is reported to have to be just those we've downloaded to our local computer. To do this you'll probably want to update the `load_summaries` method.
In `load_summaries` we'll query the underlying index implementation (the `remote.rs` modified before) for all possible versions of a crate. The `lines` iterator in this block will be an iterator over each line of a file in the index (browsable at https://github.com/rust-lang/crates.io-index). Each line in this index is parsed and then pushed onto a local list of summaries. What we'll want to do is apply another filter on top of this. If the summary isn't downloaded to the local computer then we'll want to skip it.
In other words, the index will report, for example, that we could use libc 0.1.4 or libc 0.2.0. If we only have libc 0.2.0 then we'll want to report back that only libc 0.2.0 is available, whereas if the airplane flag were not passed we'd report both 0.1.4 and 0.2.0 as being available. To do this you'll probably want to add a method to the `RegistryData` trait like `is_crate_downloaded(&self, id: &PackageId)`, and the function would test whether the `.crate` file exists.
At this point the airplane flag should (a) avoid updating the index even if asked and (b) ensure that we never ask to download a crate that's not already downloaded. At this point I believe the feature should be effectively done! You should be able to write some tests, play around with it locally, etc, and see it all working.
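To make points (a) and (b) concrete, here is a minimal, self-contained sketch. It does not use Cargo's real types: `Summary`, `InMemoryCache`, and the signatures of `update_index`, `load_summaries`, and `is_crate_downloaded` below are simplified stand-ins assumed for illustration, and the real methods in `sources/registry` look different.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one index entry (a "summary"): crate name plus version.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Summary {
    pub name: String,
    pub version: String,
}

/// Hypothetical, simplified version of the registry backend trait.
/// The real `RegistryData` trait in Cargo has different methods and signatures.
pub trait RegistryData {
    /// Returns true if the `.crate` file for this summary is already cached locally.
    fn is_crate_downloaded(&self, summary: &Summary) -> bool;
}

/// Toy backend that tracks "downloaded" crates in memory instead of checking the disk.
pub struct InMemoryCache {
    pub downloaded: HashSet<Summary>,
}

impl RegistryData for InMemoryCache {
    fn is_crate_downloaded(&self, summary: &Summary) -> bool {
        self.downloaded.contains(summary)
    }
}

/// (a) Skip the index update entirely in airplane mode.
/// Returns Ok(true) if an update would be attempted, Ok(false) if it was skipped.
pub fn update_index(airplane: bool) -> Result<bool, String> {
    if airplane {
        return Ok(false);
    }
    // A real implementation would fetch the index over the network here.
    Ok(true)
}

/// (b) In airplane mode, report only summaries whose crates are already downloaded.
pub fn load_summaries(
    index: &[Summary],
    data: &dyn RegistryData,
    airplane: bool,
) -> Vec<Summary> {
    index
        .iter()
        .filter(|s| !airplane || data.is_crate_downloaded(s))
        .cloned()
        .collect()
}
```

With an index listing libc 0.1.4 and libc 0.2.0 and only 0.2.0 in the cache, `load_summaries(..., true)` yields just 0.2.0, while passing `false` yields both, so the resolver never sees a version it could not obtain offline.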
There are, however, a number of extensions that will be required for stabilization, so I'll write those down as well:
Git repositories
In addition to crates.io we'll want to handle git repositories and the airplane flag as well. This one may be a bit easier where basically what we want to do is to ensure Cargo is guided towards not asking for a network update by altering the behavior of a git source.
Git repositories in Cargo are modeled in two locations. Each git source has a "database" which is a bare git checkout in `~/.cargo/git/db`. This database is basically just a store of all fetched objects from the remote. Checkouts then happen at `~/.cargo/git/checkouts`. Each checkout is permanently cached and looks like `~/.cargo/git/checkouts/$name-$hash/$sha`.
With that in mind airplane mode for git repositories would at a high level just avoid updating the database and then otherwise the checkout would be cloned from the database as usual. I think that most of this will just fall out of updating this if branch.
Note that right now any git repository with submodules won't work. We currently don't clone submodules from the global database but instead re-clone each submodule from the network on all checkouts. Once we fix that bug (fetch all submodules to the database, then clone from there onto the disk) then it should also "just work"!
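A rough sketch of the layout and decision described in this section follows. The helper names (`db_path`, `checkout_path`, `plan_git_update`, `GitAction`) are hypothetical and not Cargo's API. The assumption that the database directory uses the same `$name-$hash` identifier as the checkout is mine, and the non-airplane branch is simplified to "always fetch".

```rust
use std::path::{Path, PathBuf};

/// Bare "database" repository for a git source.
/// Assumption: it is keyed by the same `$name-$hash` identifier as the checkouts.
pub fn db_path(cargo_home: &Path, ident: &str) -> PathBuf {
    cargo_home.join("git").join("db").join(ident)
}

/// Checkout directory, following `git/checkouts/$name-$hash/$sha`.
pub fn checkout_path(cargo_home: &Path, ident: &str, sha: &str) -> PathBuf {
    cargo_home.join("git").join("checkouts").join(ident).join(sha)
}

/// What to do with a git source before building (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum GitAction {
    /// Airplane mode: clone the checkout from the existing database, no network.
    UseDatabaseAsIs,
    /// Normal mode (simplified): fetch into the database, then check out.
    FetchThenCheckout,
    /// Airplane mode but nothing is cached: the build cannot proceed.
    FailNotCached,
}

/// Decide how to treat the database depending on the flag and the local state.
pub fn plan_git_update(airplane: bool, db_exists: bool) -> GitAction {
    match (airplane, db_exists) {
        (true, true) => GitAction::UseDatabaseAsIs,
        (true, false) => GitAction::FailNotCached,
        (false, _) => GitAction::FetchThenCheckout,
    }
}
```

The point of the sketch is that only the database update step changes; the checkout is still produced from the local database as usual.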
Recommend the "airplane flag"
If you're on an airplane and are unaware of the airplane flag then it would be quite nice to teach you about it! This means that when the resolution process fails with something that looks like a network error we probably want to tweak the error message with a "did you mean" style hint. The idea here is that if you're on a plane and type `cargo build` then Cargo should ideally say "you should try using `-Z airplane`".
I think the way we'll probably want to do this is to test for spurious network errors whenever we update a source. If it looks like a spurious network error going out then we can probably attach some context saying "this may work if you instead pass `-Z airplane`" or something like that.
You probably want to test this out by disconnecting your network and seeing what the error looks like.
Resolution errors
One of the primary failure modes of the "airplane flag" is that you added a dependency which wasn't previously cached or otherwise the local state of the index/crate cache isn't able to build a crate. This may happen because we're not updating the index (new crates/versions aren't available) or because we're filtering the return value of `load_summaries` on the registries (not all entries in the on-disk index would have been downloaded).
In any case we want to make sure that intentional or accidental use of the `-Z airplane` flag doesn't cause overly obscure errors. Right now Cargo has pretty bad crate graph resolution errors, unfortunately.
The failure mode here is likely to come out of resolution, not querying for crates. This means that the error is generated in this module which is one of the gnarliest modules in Cargo. You may want to skim it, but I think the main location to modify is this one which is the main source for generating resolution errors.
The error here should basically say something along the lines of "crate resolution failed, we see you're passing `-Z airplane` and it may be failing because of that".
Failing `cargo update`
Similarly to weird resolution errors, a `cargo update` is basically guaranteed to not work. We should bail out of updating as early as we can if you invoke `cargo update -Z airplane`.
Populating the global cache
Right now Cargo has no explicit way of populating the global cache. That means this is currently only catering to the use case of "I develop Rust locally and hence have a pretty populated global cache". This isn't, however, catering to the case where you're trying out a dependency for the first time on a plane.
We'll eventually want a subcommand which downloads crates and probably their transitive dependencies, as well as explicitly updates the index. The design here is a little unclear, but if you have ideas please let us know!