RFC: Serve crates-io registry over HTTP as static files #2789
Conversation
It would be good to touch on offline use scenarios and how these (as well as slow links) would be affected by these changes.
I've added more info about network speed. To sum it up:
Note that the cost of the git index is proportional to the total size of crates-io, but the cost of this solution is proportional to the project size. At some point the git clone will become too large to be usable, but this solution will remain functional (even large Rust projects don't use all crates-io dependencies at the same time, so they won't grow as large as quickly).
Thanks to @carols10cents for the idea
One third-party tool that could not support a feature with this design is
Well, it could still be implemented by fetching every permutation; that would be
I think it might make sense to spell out in more detail how Cargo would use the new index. Currently, it works as follows: for every command that can change the lockfile (most commonly, the first build after a change to Cargo.toml), Cargo eagerly updates the whole registry index. The two interesting properties of the status quo are:
What would we do in the new implementation? I think there's no explicit registry update anymore. Instead, we update the registry on a per-crate basis, lazily, when we do version resolution?
@matklad This is covered in the "greedy fetch" section. It works as follows:
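Roughly, the greedy-fetch idea can be sketched as follows. This is only an illustration, not Cargo's implementation: it uses a blocking `reqwest` client on threads instead of HTTP/2 multiplexing, ignores version requirements entirely, and assumes the existing index JSON fields (`deps`, `name`) and directory layout; the registry URL is a placeholder.

```rust
// Assumed dependencies: reqwest (with the "blocking" feature) and serde_json.
use std::collections::HashSet;
use std::thread;

/// File path of a crate in the index layout (same layout as a git index checkout).
fn index_path(name: &str) -> String {
    let name = name.to_lowercase();
    match name.len() {
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        3 => format!("3/{}/{name}", &name[..1]),
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    }
}

/// Greedy fetch: download the index file of every crate name discovered so far,
/// one frontier at a time, without waiting for version resolution. A few files
/// may be fetched unnecessarily, but no request is blocked on the resolver.
fn greedy_fetch(base_url: &str, roots: &[&str]) -> HashSet<String> {
    let mut seen: HashSet<String> = roots.iter().map(|s| s.to_string()).collect();
    let mut frontier: Vec<String> = seen.iter().cloned().collect();

    while !frontier.is_empty() {
        // One request per crate, issued in parallel (threads stand in for the
        // HTTP/2 multiplexing the RFC relies on).
        let handles: Vec<_> = frontier
            .drain(..)
            .map(|name| {
                let url = format!("{base_url}/{}", index_path(&name));
                thread::spawn(move || reqwest::blocking::get(url).and_then(|r| r.text()))
            })
            .collect();

        for handle in handles {
            let Ok(Ok(body)) = handle.join() else { continue };
            // Index files are newline-delimited JSON, one line per published version.
            for line in body.lines() {
                let Ok(version) = serde_json::from_str::<serde_json::Value>(line) else { continue };
                for dep in version["deps"].as_array().into_iter().flatten() {
                    if let Some(dep_name) = dep["name"].as_str() {
                        if seen.insert(dep_name.to_string()) {
                            frontier.push(dep_name.to_string());
                        }
                    }
                }
            }
        }
    }
    seen
}

fn main() {
    // Placeholder base URL; substitute a registry that serves index files statically.
    let all = greedy_fetch("https://index.example.com", &["serde", "rand"]);
    println!("downloaded {} index files", all.len());
}
```

The over-fetching is deliberate: downloading a handful of index files that precise resolution would have skipped is cheaper than stalling the request pipeline between resolver steps.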
The registry storage format would change from a bare git repo to a directory structure similar to a git checkout. Theoretically it'd be possible to mix the two and add "fake" commits to a local git repo to keep using it as the storage format, but I don't think that's worth the complexity.
Possible alternative that wouldn't require HTTP/2 support: download the index as a zip, which could utilize GitHub's download-as-zip functionality.
The zip would be good only for the initial download (you wouldn't want to download 20MB+ on every cargo update), so it sounds like a variation of the "Initial index from rustup" alternative in the RFC.
Zips don't have to be compressed (the "store" compression method), and if GitHub supports range requests, one could fetch only the headers and the parts that changed, i.e. do an incremental download.
Oh, that's clever! So yes, a ZIP could be used to combine multiple requests into two or three range requests (one to find the offset of the ZIP central directory, one to get the directory, and one to get the ranges of the desired files).
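For illustration, those range requests against a hypothetical ZIP of the index could look roughly like this, assuming the server honors `Range` headers and using a blocking `reqwest` client (the URL is a placeholder; ZIP handling is reduced to locating the end-of-central-directory record):

```rust
use reqwest::blocking::Client;
use reqwest::header::{CONTENT_LENGTH, RANGE};

/// Fetch an arbitrary byte range of a remote file.
fn fetch_range(client: &Client, url: &str, start: u64, end: u64) -> reqwest::Result<Vec<u8>> {
    let resp = client
        .get(url)
        .header(RANGE, format!("bytes={start}-{end}"))
        .send()?;
    Ok(resp.bytes()?.to_vec())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let url = "https://example.com/index.zip"; // placeholder ZIP of the index

    // Request 1: the file size, then the tail of the file, which contains the
    // end-of-central-directory (EOCD) record.
    let len: u64 = client
        .head(url)
        .send()?
        .headers()
        .get(CONTENT_LENGTH)
        .ok_or("no Content-Length")?
        .to_str()?
        .parse()?;
    let tail = fetch_range(&client, url, len.saturating_sub(64 * 1024), len - 1)?;

    // Locate the EOCD signature "PK\x05\x06"; the central directory's size and
    // start offset are little-endian u32s at offsets 12 and 16 of the record.
    let eocd = tail
        .windows(4)
        .rposition(|w| w == b"PK\x05\x06")
        .ok_or("EOCD record not found")?;
    let cd_size = u32::from_le_bytes(tail[eocd + 12..eocd + 16].try_into()?) as u64;
    let cd_offset = u32::from_le_bytes(tail[eocd + 16..eocd + 20].try_into()?) as u64;

    // Request 2: the central directory, which lists every entry's name and local
    // header offset, which is enough to plan request 3 for just the files we need.
    let central_directory = fetch_range(&client, url, cd_offset, cd_offset + cd_size - 1)?;
    println!("central directory: {} bytes", central_directory.len());
    Ok(())
}
```

With "store" compression, the byte ranges recovered this way are the raw index entries, so no decompression step is needed on the client.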
Currently the index requires registries to list all of their crates ahead of time. In a model where individual crates are requested directly by their name, a registry could even create crates on demand (lazily). That may be interesting for setups where a Cargo registry is a facade for other code repositories.
You are describing something akin to zsync (or librsync), which, interestingly, I'm in the process of re-implementing in Rust 🤔
If cargo hits HTTP static files rather than a git repo, wouldn't that make life a lot easier for private repositories, like Artifactory for example, to support Rust? Seems like this would lower the barrier to supporting Rust in an enterprise setting, so a thumbs up from me.
Personal opinions, not speaking for the team.

This feels like a direction that could actually work! I have participated in many "just use HTTP" conversations in the past; this is the first time it has sounded plausible. (Sorry if you explained it to me before and I was too dense to get it.) At some point of growth between now and NPM's scale, this will be faster. That being said, my sense is that the tipping point is far off.

Additionally, it is very important to read the line "Since alternative registries are stable, the git-based protocol is stable, and can't be removed." As such, I will be spending my time on ways to improve the git-based protocol. This RFC is convincing enough that I would not object to someone spending their time on developing it.

On Performance:

For Crates.io:

For Alternative Registries:
Should compression support be mandatory for clients? I.e., they must send an `Accept-Encoding` header?
The Cargo team discussed this RFC over the past few weeks, and we wanted to leave some thoughts of what we discussed and what we're thinking about this RFC.

Overall none of us are opposed to the principles of this RFC or the idea it's chasing after. If there's a better solution for hosting and managing the index, that'd be great to implement! If it's better after all then it's better :)

Our main discussion point, though, was that implementing this style of fetching the index is likely to be a very significant undertaking from a technical perspective in Cargo. There's a ton of concerns to take care of and it's not quite as simple as "just throw HTTP/2 at the problem and let it solve everything". We'd ideally like to see a prototype implementation to prove out the possible gains that an architecture like this would bring us, but unfortunately a prototype implementation is likely such a significant undertaking that there needs to be more buy-in before even that happens.

The conclusion we reached is that we'd be willing to accept this under the idea that it's an "eRFC" or "experimental RFC". It would basically be a blessing that the Cargo team is interested in seeing research and development around this area of hosting the index and fetching it on the client. We aren't wed to any particular details yet, and the actual details of this RFC as-written are highly likely to change while a solution is being developed.
I've implemented a proof-of-concept with greedy resolution that takes into account features and version ranges: https://github.com/kornelski/cargo-static-registry-rfc-proof-of-concept

Tested with
Tested on a fast connection with 144 crates.io dependencies used by
As predicted, more dependencies make lookup faster, because lookup starts with more parallelism. Just
All these tests simulated the worst case of a blank state with no cache.
I've added cache and revalidation support. The cache flattens the check's critical path to a depth of 1, and revalidation makes bandwidth use minimal, so all checks (regardless of which crate(s) are checked) are almost equally fast. With disk cache and a fast connection:
With disk cache and a 3G connection:
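As an illustration of the revalidation step (not the proof-of-concept's actual code), a conditional request per index file could look roughly like this, assuming a blocking `reqwest` client and a made-up on-disk cache layout; a `304 Not Modified` answer costs only headers:

```rust
use reqwest::blocking::Client;
use reqwest::header::{ETAG, IF_NONE_MATCH};
use reqwest::StatusCode;
use std::fs;
use std::path::Path;

/// Return an up-to-date index file body, touching the network only to revalidate.
fn fetch_index_file(
    client: &Client,
    url: &str,
    cache_dir: &Path,
    name: &str,
) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let body_path = cache_dir.join(name);
    let etag_path = cache_dir.join(format!("{name}.etag"));

    let mut req = client.get(url);
    if let Ok(etag) = fs::read_to_string(&etag_path) {
        // Conditional request: the server replies 304 Not Modified (no body)
        // if the cached copy is still current.
        req = req.header(IF_NONE_MATCH, etag.trim().to_string());
    }

    let resp = req.send()?;
    if resp.status() == StatusCode::NOT_MODIFIED {
        return Ok(fs::read(&body_path)?);
    }

    // Fresh copy: remember the body and the new validator for next time.
    let etag = resp.headers().get(ETAG).cloned();
    let body = resp.bytes()?.to_vec();
    if let Some(etag) = etag {
        fs::write(&etag_path, etag.as_bytes())?;
    }
    fs::write(&body_path, &body)?;
    Ok(body)
}
```

Because every cached file only needs this cheap freshness check, the critical path no longer depends on which crates are being checked.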
A problem this RFC doesn't seem to address with its local cache is deleting entries from the index: it's something we do very rarely, but sometimes we're legally required to do so (DMCA, GDPR...), and in those cases we can't leave the deleted entries in the users' local caches. The git index solves this cleanly (it's just yet another commit).
If we do this, we should just host the content on Rust infrastructure. We should still be on the $0.04/GB CloudFront pricing tier, which is sustainable for those small files.
Note that Cargo currently does not delete crate tarballs and checked-out copies of crates even after they have been removed from the index. Deleted metadata files continue to be distributed as part of git history until the index branch is squashed. With a static file solution, when a client checks the freshness of a deleted crate, it will make a request to the server and notice a 404/410/451 HTTP status. It can then be made to act accordingly and clean up local data (even the tarball and source checkout). If the client is not interested in the deleted crate, it won't check it; but chances are it never did, and never downloaded it. If the ability to immediately erase deleted data is important, then the "incremental changelog" feature can be extended to proactively notify about deletions.
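A sketch of that client-side handling, with placeholder cache paths and a flattened index URL (a real client would use the prefixed index directory layout), assuming a blocking `reqwest` client:

```rust
use std::fs;
use std::path::Path;

/// Check a crate's index file; if the registry reports it gone, purge local data.
fn check_crate(
    client: &reqwest::blocking::Client,
    index_url: &str,
    crate_name: &str,
    cache_root: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
    let resp = client.get(format!("{index_url}/{crate_name}")).send()?;
    match resp.status().as_u16() {
        // 404 Not Found, 410 Gone, 451 Unavailable For Legal Reasons: the crate
        // was removed, so drop the cached index entry, tarball and checkout too.
        404 | 410 | 451 => {
            let _ = fs::remove_file(cache_root.join("index").join(crate_name));
            let _ = fs::remove_file(cache_root.join("cache").join(format!("{crate_name}.crate")));
            let _ = fs::remove_dir_all(cache_root.join("src").join(crate_name));
            Ok(())
        }
        200..=299 => Ok(()), // still published; the body contains the metadata
        other => Err(format!("unexpected HTTP status {other}").into()),
    }
}
```

The directory names under `cache_root` are made up here; they only loosely mirror Cargo's `registry/{index,cache,src}` layout.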
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC will be merged soon.
Huzzah! The @rust-lang/cargo team has decided to accept this RFC! To track further discussion, subscribe to the tracking issue here: rust-lang/cargo#9069
I don't see shallow clones mentioned as an alternative.
@glandium It was mentioned in the discussion. Expand all the collapsed bits and Ctrl+F for "shallow clone". |
Oh, the RFC also has something about shallow clone, but I was interpreting it as narrow/partial clone for some reason. Sorry for the noise. |
HTTP registry implementation

Implement HTTP registry support described in [RFC 2789](rust-lang/rfcs#2789).

Adds a new unstable flag `-Z http-registry` which allows cargo to interact with remote registries served over http rather than git. These registries can be identified by urls starting with `sparse+http://` or `sparse+https://`.

When fetching index metadata over http, cargo only downloads the metadata for needed crates, which can save significant time and bandwidth over git.

The format of the http index is identical to a checkout of a git-based index.

This change is based on @jonhoo's PR #8890. cc @Eh2406

Remaining items:
- [x] Performance measurements
- [x] Make unstable only
- [x] Investigate unification of download system. Probably best done in a separate change.
- [x] Unify registry tests (code duplication in `http_registry.rs`)
- [x] Use existing on-disk cache, rather than adding a new one.
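To illustrate the URL convention (this is not Cargo's actual code), a client could recognize and unwrap the `sparse+` prefix roughly like this:

```rust
/// Return the plain HTTP(S) base URL if this index URL denotes a sparse registry.
fn sparse_base_url(index_url: &str) -> Option<&str> {
    // `sparse+http://` / `sparse+https://` identify an HTTP registry; anything
    // else keeps being treated as a git index URL.
    index_url
        .strip_prefix("sparse+")
        .filter(|rest| rest.starts_with("http://") || rest.starts_with("https://"))
}

fn main() {
    // The registry URL below is a placeholder, not a real endpoint.
    assert_eq!(
        sparse_base_url("sparse+https://registry.example.com/index/"),
        Some("https://registry.example.com/index/")
    );
    assert_eq!(
        sparse_base_url("https://github.com/rust-lang/crates.io-index"),
        None
    );
}
```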
Please note I have removed the
Please feel free to re-add the
The existing crate index format is good enough to be served over HTTP as-is, and still fetched and updated incrementally.
This can be faster than a git clone of the whole index ahead of time, and will spare clients from unbounded growth of the index.
Originating forum thread.
Rendered RFC