
Dealing with GitHub's rate limiting #855

Closed
marten-seemann opened this issue May 30, 2021 · 21 comments

@marten-seemann

First of all, thank you for this awesome GitHub Action! We're using it to distribute and synchronize workflows across hundreds of repositories (libp2p, IPFS, Filecoin).

When deploying an update, we create hundreds of PRs practically at the same moment, and (understandably) GitHub is not entirely happy about that: it triggers their abuse detection mechanism.

Apparently, there's a Retry-After header that one could use to wait and automatically retry the request: https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits. Any thoughts on implementing a retry function based on this header?
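
To make the idea concrete, here's a rough, untested TypeScript sketch of the kind of retry helper I have in mind (the repository names are placeholders and the pull request call is just an example):

    // Rough sketch: wait and retry an Octokit request when GitHub asks us to back off.
    // The exact error shape depends on the Octokit version.
    import { Octokit } from "@octokit/rest";

    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

    async function requestWithRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
      for (let attempt = 1; ; attempt++) {
        try {
          return await fn();
        } catch (error: any) {
          // Octokit request errors expose the response headers, including Retry-After.
          const retryAfter = Number(error?.response?.headers?.["retry-after"]);
          if (attempt >= attempts || !retryAfter) throw error;
          console.log(`Rate limited, retrying in ${retryAfter}s (attempt ${attempt})`);
          await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
        }
      }
    }

    // Example use: creating a pull request.
    await requestWithRetry(() =>
      octokit.rest.pulls.create({
        owner: "my-org",          // placeholder
        repo: "my-repo",          // placeholder
        title: "Sync workflows",
        head: "workflow-update",  // placeholder branch
        base: "main",
      })
    );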

@peter-evans
Owner

Hi @marten-seemann

Glad you are finding the action useful.

I'll have a go at implementing this. I think it can be done quite easily by leveraging octokit's retry plugin. The plugin appears to respect the Retry-After header in responses.
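
Roughly speaking, wiring it up would look something like this TypeScript sketch (not the action's actual code):

    // Sketch: an Octokit instance built with the retry plugin.
    import { Octokit } from "@octokit/core";
    import { retry } from "@octokit/plugin-retry";

    const OctokitWithRetry = Octokit.plugin(retry);

    // With the plugin's defaults, failed requests are retried up to 3 times,
    // and a Retry-After header in the response is honoured.
    const octokit = new OctokitWithRetry({ auth: process.env.GITHUB_TOKEN });

    // Requests are then made as usual, e.g. (placeholder names):
    const { data } = await octokit.request("GET /repos/{owner}/{repo}", {
      owner: "my-org",
      repo: "my-repo",
    });
    console.log(data.full_name);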

@peter-evans
Owner

I've added the octokit retry plugin in a feature branch. With the default settings it retries up to 3 times, while respecting the Retry-After header. It would be very helpful if you could try this out to make sure it works in your case. You can use the version of the action from the open pull request by changing the action version to @retry:

        uses: peter-evans/create-pull-request@retry

@marten-seemann
Author

Hi @peter-evans, thank you for this super quick reply and the implementation!

I just tried it out, and it looks like we're still running into GitHub's abuse detection mechanism, for example here: https://github.com/protocol/.github/runs/2720817821?check_suite_focus=true. I don't see any log output that would indicate that a retry is happening, but maybe I'm missing something?

@peter-evans
Owner

Ah, I see where the problem is now. The abuse detection is kicking in when the branch is being pushed to the repository with git push, not the calls to the GitHub API to create the pull request as I first thought. The plugin I added only works for the GitHub API calls, not the git operations. Let me investigate how best to retry the git operations.

@peter-evans
Owner

@marten-seemann I've added logic to retry the git push command. Unfortunately, I don't think there is any way to see the Retry-After header from the git response, but I think the default value is 60 seconds. So I've hardcoded the wait time to 60 seconds, plus up to 10 seconds of jitter.

Let's see if this resolves the problem. If the command is retried it should appear in the logs.
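
For reference, the retry logic is along these lines (an illustrative TypeScript sketch rather than the exact code in the branch):

    // Sketch: retry `git push` with a fixed wait plus jitter, since git's error
    // output does not expose the Retry-After header.
    import { execFileSync } from "node:child_process";

    const WAIT_SECONDS = 60;       // assumed default back-off for the abuse limit
    const MAX_JITTER_SECONDS = 10;
    const MAX_ATTEMPTS = 3;

    const sleep = (seconds: number) =>
      new Promise((resolve) => setTimeout(resolve, seconds * 1000));

    async function pushWithRetry(remote: string, branch: string): Promise<void> {
      for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        try {
          execFileSync("git", ["push", remote, `HEAD:refs/heads/${branch}`], { stdio: "inherit" });
          return;
        } catch (error) {
          if (attempt === MAX_ATTEMPTS) throw error;
          const wait = WAIT_SECONDS + Math.random() * MAX_JITTER_SECONDS;
          console.log(`git push failed, retrying in ${Math.round(wait)}s (attempt ${attempt})`);
          await sleep(wait);
        }
      }
    }

    // Example use (placeholder branch name):
    await pushWithRetry("origin", "workflow-update");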

@peter-evans
Owner

@marten-seemann I'm periodically checking the runs here to see if there have been any retries during the "Deploy" workflow, but I haven't seen any runs for a while. I'll wait until we can confirm that this solution works before merging it in.

@marten-seemann
Author

@peter-evans We only run deployments to all ~100 repositories infrequently - they create a lot of noise. The next run will probably be adding Go 1.17 (which will be released in August), unless something urgent comes up before then. Can we keep this issue open until then?

@peter-evans
Owner

@marten-seemann Sure, no problem. I'm happy to wait until we can confirm that the PR changes work well.

@marten-seemann
Author

Hi @peter-evans, thank you for your patience!

We did another deployment today, and we ran into rate limits on a large number of jobs, for example here: https://github.com/protocol/.github/runs/3351001762?check_suite_focus=true.
I'm not sure if it retried anything, but judging from the execution time of the step (2s), it probably didn't. Do you have any idea why?

@peter-evans
Owner

@marten-seemann

I've looked through all the runs, and what is interesting this time is that none of them triggered abuse detection on the git push of the PR branch, which was the case previously, such as in this run from June: https://github.com/protocol/.github/runs/2720817821?check_suite_focus=true

All of the failures are from the GitHub API call to create the PR, returning this error:

Error: You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.

I found an explanation of the secondary rate limits:
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#secondary-rate-limits

These are not the standard rate limits, but additional limits on certain actions to prevent abuse. You can see the response example returns 403 Forbidden. This HTTP code is, by default, not retryable by the plugin-retry.js Octokit plugin. I didn't realise that, and that is why it didn't retry any of the requests. I've updated the retry feature branch to allow retrying 403 error responses.
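
The change is essentially just overriding the plugin's doNotRetry list so that 403 is no longer excluded. A sketch (the default list shown is my understanding of the plugin's defaults and may differ between versions):

    // Sketch: let plugin-retry retry 403 responses by overriding doNotRetry.
    import { Octokit } from "@octokit/core";
    import { retry } from "@octokit/plugin-retry";

    const OctokitWithRetry = Octokit.plugin(retry);

    const octokit = new OctokitWithRetry({
      auth: process.env.GITHUB_TOKEN,
      retry: {
        // The plugin's default list includes 403, which is why the secondary
        // rate limit errors were never retried. Leaving 403 out of this list
        // allows those responses to be retried.
        doNotRetry: [400, 401, 404, 422, 451],
      },
    });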

There is some further information here:
https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-secondary-rate-limits

  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
  • If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.
  • Requests that create content which triggers notifications, such as issues, comments and pull requests, may be further limited and will not include a Retry-After header in the response. Please create this content at a reasonable pace to avoid further limiting.

The first two points above are why I think you are being caught by the abuse detection. You are using a PAT created on one user for all the requests, so some are being executed concurrently and are not 1 second apart. There is not much I can do about this in the action other than retry a few times. In your case there is no Retry-After header, either.

You might want to think about redesigning your workflows to execute serially, instead of in parallel. Or, perhaps use multiple PATs created on different users.
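
By executing serially I mean something like pacing the PR creation with a pause between requests, as in this sketch (all names are placeholders):

    // Sketch: open pull requests one at a time with a pause between requests,
    // per GitHub's guidance for secondary rate limits.
    import { Octokit } from "@octokit/rest";

    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
    const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

    const repos = ["repo-a", "repo-b", "repo-c"]; // placeholder list

    for (const repo of repos) {
      await octokit.rest.pulls.create({
        owner: "my-org",          // placeholder
        repo,
        title: "Sync workflows",
        head: "workflow-update",  // placeholder branch
        base: "main",
      });
      // Wait at least a second between content-creation requests instead of
      // firing them all concurrently.
      await sleep(2000);
    }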

@peter-evans
Owner

@marten-seemann Please could you let me know when you run another deployment. I would like to check if 403 error responses are being successfully retried.

@marten-seemann
Author

Hi @peter-evans, first of all, thanks again for all your work!
It will probably be a while before we run another deployment. Deploying to 150 repos creates a lot of noise.

> The first two points above are why I think you are being caught by the abuse detection. You are using a PAT created on one user for all the requests, so some are being executed concurrently and are not 1 second apart.

I think you're right. We might have to change the script to be a little bit less aggressive here.

@dontcallmedom

Random remark: Octokit's throttling plugin was broken for some weeks due to a change in the error message sent upon hitting the abuse rate limit - see octokit/plugin-throttling.js#437

@peter-evans
Owner

@dontcallmedom Thanks. Good to know! It makes sense now why the message changed.

@villelahdenvuo

I was also hitting the rate limit issue, so I will try this branch to see if it helps!

@peter-evans
Owner

@villelahdenvuo I don't recommend using the retry branch of this action. It's very old now and has missed some important updates. If you are hitting the rate limit then there are probably things you should do in your workflows to slow down processing.

@marten-seemann Are you still using this branch?

I think I need to revisit the code in this branch and decide whether or not to merge some of it into main. I have a feeling that not all the code in this branch was working as intended and/or didn't really make a difference.

@paulz

paulz commented Oct 31, 2022

Our workflow just failed with this error:

Create or update the pull request
  Attempting creation of pull request
  Error: API rate limit exceeded for user ID 111111111.

where 111111111 was some number

@paulz

paulz commented Oct 31, 2022

Re-running the failed scheduled workflow seems to work.

I don't understand how we can be rate limited if we use our own repo-scoped token:

 token: ${{ secrets.REPO_SCOPED_TOKEN }}

@peter-evans
Owner

@paulz Using a PAT doesn't stop you from being rate-limited. I think GitHub are just more generous with the limits for authenticated user accounts. If you have used the same PAT across many workflows that are running simultaneously, then I imagine you could hit the rate limit. (Even multiple PATs associated with the same user account could contribute to the same rate limit, I think.)

@jmmclean

jmmclean commented Dec 1, 2022

I'm hitting secondary limits, and we aren't doing much... maybe 1 PR a minute. We specify the token as well, but it doesn't seem to be helping. Also, this token has plenty of requests remaining:

{
  "limit": 5000,
  "used": 21,
  "remaining": 4979,
  "reset": 1669931929
}

I wonder if GitHub is going to put up a status notice soon about some degradation.
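
For reference, numbers like those come from the primary rate limit API, which (as far as I understand) does not reflect the secondary limits at all. A TypeScript sketch of querying it:

    // Sketch: querying the primary rate limit. Secondary (abuse) limits are
    // enforced separately and do not show up in this response.
    import { Octokit } from "@octokit/rest";

    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

    const { data } = await octokit.rest.rateLimit.get();
    // data.resources.core has the shape shown above: { limit, used, remaining, reset }
    console.log(data.resources.core);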

@peter-evans
Owner

I actually ran into this issue myself recently when a lot of automated PRs were being created for tests.

Error: You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.

I'm fairly sure this error is "unretryable" in the sense that the action cannot simply wait and retry it, because GitHub forces you to wait a considerable length of time. It's not the kind of rate limiting where the request will go through again after a few seconds, so it's not feasible for the action to wait that long.

The conclusion in this comment still stands. If you run into this issue, then the answer is to redesign your workflows to either slow down, or use PATs created on multiple machine user accounts.

I'm going to delete the retry branch soon. So if you are still using it, please move to v4.
