Skip no-op git fetches from the registry #2451

Closed
SimonSapin opened this issue Mar 8, 2016 · 8 comments

Comments

@SimonSapin
Contributor

Quoting CocoaPods/CocoaPods#4989 (comment):

this new, preview API should help: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/. It's helped Homebrew dramatically reduce the number of no-op git fetchs which also will make things better for your users as a no-op API HTTP call is significantly faster for you (and less expensive for GitHub) than a no-op git fetch.

Cargo could use this to determine that its local copy of rust-lang/crates.io-index is already up to date and it doesn’t need to git fetch.
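A minimal sketch of what such a check could look like, assuming the endpoint behaves as the linked announcement describes: the commit-reference endpoint returns the bare SHA-1 when asked for the `sha` media type, and answers `304 Not Modified` when the local SHA is sent as an ETag in `If-None-Match`. The media type string, function names, and User-Agent value below are illustrative assumptions, not Cargo's actual code.

```python
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def sha_request(owner, repo, ref, local_sha):
    """Build the URL and headers for a conditional 'SHA of a ref' request."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/commits/{ref}"
    headers = {
        # Assumed media type: asks for the bare SHA-1 instead of JSON.
        "Accept": "application/vnd.github.v3.sha",
        # Assumed behavior: the ETag is the commit SHA, so a match yields 304.
        "If-None-Match": f'"{local_sha}"',
        # Hypothetical client name for illustration.
        "User-Agent": "registry-check-sketch",
    }
    return url, headers


def is_up_to_date(status, body, local_sha):
    """Interpret the response: 304, or 200 with the same SHA, means no fetch."""
    if status == 304:
        return True
    return status == 200 and body.strip() == local_sha


def remote_matches_local(owner, repo, ref, local_sha, timeout=10.0):
    """Perform the request; any network failure is reported as 'not up to date'."""
    url, headers = sha_request(owner, repo, ref, local_sha)
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8")
            return is_up_to_date(response.status, body, local_sha)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses, including 304.
        return is_up_to_date(err.code, "", local_sha)
    except OSError:
        # Unreachable API: let the caller fall back to a normal git fetch.
        return False
```

For the crates.io index this would be called as `remote_matches_local("rust-lang", "crates.io-index", "master", local_sha)`, where the branch name `master` and the source of `local_sha` are assumptions here.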

In addition to limiting the risk of hitting GitHub’s rate limiting on git traffic (see #2452), this might make "Updating registry" faster.

In my use of Cargo, it is not rare to see "Updating registry" multiple times within a few minutes, while the registry has averaged about one commit per half hour since its creation.

Servo may be more of a pathological case, but we have a ./mach cargo-update command that calls cargo update in each of four "top-level" crates (each of which has a Cargo.lock file), causing the registry to be updated four times within a few seconds. Updating the registry often seems to dominate the time taken by this command (and by cargo update in general).

@alexcrichton
Member

This seems reasonable to do eventually, but I don't want to start using unstable GitHub APIs as we're not really watching updates to the API.

It'd be nice though to see some benchmarks about how fast this is such as:

  • How fast is the HTTP request vs a noop fetch?
  • How fast is an HTTP request followed up by a fetch? (vs just a fetch)
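One way to collect the numbers asked for above is a small timing harness. This is only a sketch: `time_calls` and `summarize` are hypothetical helper names, and the operation being timed (an HTTP request, a `git fetch`, or both in sequence) is passed in as a callable.

```python
import statistics
import time


def time_calls(operation, repeats=5):
    """Run `operation` `repeats` times; return each wall-clock duration in ms."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Reduce a list of durations to the figures worth comparing."""
    return {
        "mean_ms": statistics.mean(durations_ms),
        "max_ms": max(durations_ms),
        "runs": len(durations_ms),
    }
```

For the second question, the callable would perform the HTTP request and then the fetch, so the result can be compared directly with timing a fetch alone.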

@sfackler
Member

Also worth thinking about what the behavior should be when the registry isn't hosted on GitHub.

@dtolnay
Member

dtolnay commented Jul 12, 2016

I am seeing the HTTP request take 440 (+/- 10) ms and a noop fetch take 1350 (+/- 150) ms but occasionally up to 4 seconds.

At those numbers, an HTTP request followed, if necessary, by a fetch would save time if fewer than 2/3 of registry updates require a fetch (the break-even point is 1 - 440/1350 ≈ 0.67), which is certainly true in my usage.
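The break-even point can be checked with a few lines of arithmetic. The sketch below uses the two measured averages from this comment and ignores their variance; the function names are illustrative.

```python
def expected_cost_ms(http_ms, fetch_ms, fetch_fraction):
    """Average cost of 'HTTP check, then fetch only when needed'."""
    return http_ms + fetch_fraction * fetch_ms


def break_even_fraction(http_ms, fetch_ms):
    """Fraction of updates needing a fetch at which both strategies cost the same.

    Solves http_ms + p * fetch_ms == fetch_ms for p.
    """
    return 1.0 - http_ms / fetch_ms


HTTP_MS = 440.0    # measured HTTP request time
FETCH_MS = 1350.0  # measured no-op fetch time

threshold = break_even_fraction(HTTP_MS, FETCH_MS)
print(f"break-even fetch fraction: {threshold:.3f}")  # prints 0.674
```

Below that fraction the extra HTTP request pays for itself on average; above it, always fetching is cheaper.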

@alexcrichton
Member

Thanks for the data @dtolnay! I'd actually be pretty curious to test this out and see what kind of perf gains we get in practice. We could perhaps even go as far as having an environment variable flag to test it out unstable-y for awhile.

Note that the referenced GitHub API support was in preview when first introduced, but this appears to no longer be the case, so we should be good to go in that sense!

@sfackler I would expect the logic to be along these lines: when updating a git index whose host is "github.com", the special logic takes effect. It should therefore be quite easy to continue supporting arbitrary registries not on GitHub. We could then implement fast paths for other registries in the future as well if need be.
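The host check described above could be sketched as follows. This is an illustration under assumptions, not Cargo's implementation: the function name is hypothetical, only `http` and `https` URLs are considered, and a trailing `.git` suffix is not normalized.

```python
from urllib.parse import urlsplit


def github_owner_repo(index_url):
    """Return (owner, repo) if the URL is github.com/<owner>/<repo>, else None."""
    parts = urlsplit(index_url)
    if parts.scheme not in ("http", "https"):
        return None
    if parts.hostname != "github.com":
        return None
    # Drop empty segments so a single trailing slash is tolerated.
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) != 2:
        return None
    owner, repo = segments
    return owner, repo
```

Any index URL for which this returns `None` would simply take the existing git update path.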

@sfackler
Member

Cool. I could see having a "github-like" flag in custom registry config to enable the fast path for stuff hosted on e.g. GitHub for Enterprise.
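A "github-like" flag could amount to choosing a different API root per host. The sketch below is hypothetical: no such Cargo configuration exists in this thread, `github_like_hosts` stands in for whatever the registry config would provide, and the `/api/v3` prefix is the API root GitHub Enterprise installations are assumed to expose.

```python
def api_root(host, github_like_hosts=frozenset()):
    """Pick the API root for a registry host, or None if no fast path applies."""
    if host == "github.com":
        return "https://api.github.com"
    if host in github_like_hosts:
        # Assumption: Enterprise-style hosts serve the same API under /api/v3.
        return f"https://{host}/api/v3"
    return None
```

For example, `api_root("git.example.com", {"git.example.com"})` would opt a hypothetical self-hosted instance into the fast path, while unlisted hosts keep the plain git update.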

@alexcrichton
Member

Turns out this wasn't too hard to implement, but it's based on #2857 which may take a moment to merge.

I didn't notice a huge difference in GitHub registry update times, but perhaps it'll add up over time?

@Ericson2314
Contributor

I'd be interested in trying this with the rust repo. Cargo fetches that ungodly slowly, but that may be the actual fetching itself rather than the would-be fast path.

bors added a commit that referenced this issue Aug 9, 2016
Speed up noop registry updates with GitHub

This commit adds support to registry index updates for using GitHub's HTTP API [1],
which is purportedly [2] much faster than doing a git clone, and empirically that
appears to be the case.

This logic kicks in by looking at the URL of a registry's index, and if it looks
exactly like `github.com/$user/$repo` then we'll attempt to use GitHub's API,
otherwise we always fall back to a git update.

This behavior may *slow down* registry updates which actually need to download
information as an extra HTTP request is performed to figure out that we need to
perform a git fetch, but hopefully that won't actually be the case much of the
time!

[1]: https://developer.github.com/v3/repos/commits/#get-the-sha-1-of-a-commit-reference
[2]: https://developer.github.com/changes/2016-02-24-commit-reference-sha-api/

Closes #2451
@bors bors closed this as completed in #2974 Aug 9, 2016
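The control flow the commit message describes (try the API when the URL qualifies, otherwise or on failure fall back to git) can be sketched like this. The names are hypothetical and the two operations are passed in as callables, so this shows only the decision logic, not the merged implementation.

```python
def update_registry(local_sha, git_fetch, fast_check=None):
    """Skip the fetch when the fast path confirms the local copy is current.

    `fast_check` is None when the index URL does not qualify for the fast path.
    Returns "skipped" or "fetched".
    """
    if fast_check is not None and local_sha is not None:
        try:
            if fast_check(local_sha):
                return "skipped"
        except OSError:
            # A failed API call must never block the update; fall through.
            pass
    git_fetch()
    return "fetched"
```

When the remote has changed, this path costs one extra HTTP request before the fetch, which is the slowdown the commit message warns about.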
@dtolnay
Member

dtolnay commented Aug 19, 2016

I just want to follow up and say this change has been a noticeable improvement for me.
