-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use GITHUB_TOKEN in github_fast_path if available. #13563
Conversation
r? @weihanglo rustbot has assigned @weihanglo. Use r? to explicitly pick a reviewer |
src/cargo/sources/git/utils.rs
Outdated
@@ -1458,6 +1458,10 @@ fn github_fast_path( | |||
if let Some(local_object) = local_object { | |||
headers.append(&format!("If-None-Match: \"{}\"", local_object))?; | |||
} | |||
if let Ok(token) = std::env::var("GITHUB_TOKEN") { | |||
// Avoid API rate limits if possible. | |||
headers.append(&format!("Authorization: token {token}"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the GitHub REST API doc, both token
and Bearer
scheme work. The example for GitHub Actions uses Bearer
. Should we use that instead?
Note: In most cases, you can use Authorization: Bearer or Authorization: token to pass a token. However, if you are passing a JSON web token (JWT), you must use Authorization: Bearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on which scenario we want to address. If this is for CI, then bearer seems more reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, went ahead and changed it.
6e04048
to
fe3cb3d
Compare
I decided to go ahead and update the github test to actually verify this behavior, testing the various code paths for different reference kinds. I also added redirect support, since it now seems to be required. |
Hm, I don't yet know why that is happening. I had tested tokens with essentially no permissions. I wonder if the Actions token is different somehow. |
But tests passed on Linux and M1 mac. Let's @bors try again |
@bors try |
Use GITHUB_TOKEN in github_fast_path if available. This changes the GitHub fast-path to include an Authorization header if the GITHUB_TOKEN environment variable is set. This is intended to help with API rate limits, which is not too hard to hit in CI. If it hits the rate limit, then the fetch falls back to fetching all branches which is significantly slower for some repositories. This is also a partial solution for #13555, where the user has a `rev="<commit-hash>"` where that commit hash points to a PR. The default refspec of `["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"]` will not fetch commits from PRs (unless they are reachable from some branch). There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause `github_fast_path` to fail where it previously wouldn't. I think that is probably unlikely to happen, though. This also adds redirect support. Apparently GitHub now issues a redirect for `repos/{org}/{repo}/commits/{commit}` to `/repositories/{id}/commits/{commit}`. The test `https::github_works` exercises this code path, and will use the token on CI.
💔 Test failed - checks-actions |
How worried are we about exhausting rate limit? It can be pretty easy to hit the GitHub rate limit, so I wonder if the number of queries we do will potentially make users unhappy if we exhaust the limit on their token... if this is only for CI, that's probably fine, but locally I could see it being pretty painful. What about enabling "hidden" usage of credentials with this? For example, if someone has a GitHub token that can access private repositories, just a cargo fetch would "reveal" that here, right? (My sense is that the threat model for cargo wouldn't really be affected here and so this isn't concerning). |
☔ The latest upstream changes (presumably #13561) made this pull request unmergeable. Please resolve the merge conflicts. |
I posted #13718 to fix the immediate problem with the redirect. |
The error in CI is:
I've been unable to reproduce that in my own account, though. 😦 |
Fix github fast path redirect. This fixes the GitHub fast-path check to look up the sha of a git ref. At some point, GitHub changed the API to redirect to a different URL. Currently cargo is failing the fast-path lookup with 301 response code. This can be tested in a project with a git dependency, and running `CARGO_LOG=cargo::sources::git::utils=debug cargo fetch` to verify it is picking up the fast path. This currently can't be tested in CI due to #13563.
Oh, I'm a dummy. The |
I was originally not concerned, but I'm a little uncertain now. The Actions token has a limit of 1,000 per hour (or 15,000 for GHEC). I was originally thinking that limit is per run. But now I understand it is per repository and the limit is shared across all runs. That could be a problem, for example, if you have a large number of workflows, and a hefty number of git dependencies, then I could imagine that chewing through hundreds of points relatively quickly.
I've been trying to think of ways this could be a security issue, but I haven't been able to come up with something concrete. I do have a fair sense of unease with it, though, since this could be a sensitive thing. Since the user would need to set I'm going to go ahead and close since I don't want to put any more time into it. This might be something we could consider doing for people who really want the fast-path performance, but I'm not sure it is worth it. |
This changes the GitHub fast-path to include an Authorization header if the GITHUB_TOKEN environment variable is set. This is intended to help with API rate limits, which is not too hard to hit in CI. If it hits the rate limit, then the fetch falls back to fetching all branches which is significantly slower for some repositories.
This is also a partial solution for #13555, where the user has a
rev="<commit-hash>"
where that commit hash points to a PR. The default refspec of["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"]
will not fetch commits from PRs (unless they are reachable from some branch).There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause
github_fast_path
to fail where it previously wouldn't. I think that is probably unlikely to happen, though.This also adds redirect support. Apparently GitHub now issues a redirect for
repos/{org}/{repo}/commits/{commit}
to/repositories/{id}/commits/{commit}
.The test
https::github_works
exercises this code path, and will use the token on CI.