
How to prevent hitting API call rate limit #76

Closed
PinkShellos opened this issue Dec 26, 2017 · 4 comments

@PinkShellos

My organization has a lot of GitHub data that we want to back up nightly to a Drobo. I have been attempting to use this program to build that out, but I keep hitting the API rate limit, which times out the request for an increasing amount of time. Is there a way to tell the program to limit its requests so that the data comes in steadily without hitting the 5000 requests per minute threshold?

@josegonzalez
Owner

There is not. Pull requests welcome.

@karlcow

karlcow commented Oct 15, 2019

The API rate limit is 5000 HTTP requests per hour (not minutes as said above).

Let's say we want to back up issues.
With a repo of more than 5,000 issues, this starts to become a problem.

The theoretical limit is

  • 1.38888 requests per second.

So we could artificially set a timer at one request per second and we would be safe.
A backup of issues in a repo with 40,000+ issues would "only" take 11h6m40s.
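
As a quick sanity check on those numbers (my arithmetic, not from the project):

>>> 5000 / 3600          # requests per second allowed
1.3888888888888888
>>> h, rem = divmod(40000, 3600)  # 40,000 requests at one per second
>>> m, s = divmod(rem, 60)
>>> (h, m, s)
(11, 6, 40)

For context, here is retrieve_data_gen as it currently stands; it fires requests back to back with no pacing: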

def retrieve_data(args, template, query_args=None, single_request=False):
    return list(retrieve_data_gen(args, template, query_args, single_request))


def retrieve_data_gen(args, template, query_args=None, single_request=False):
    auth = get_auth(args)
    query_args = get_query_args(query_args)
    per_page = 100
    page = 0
    while True:
        page = page + 1
        # Requests are issued immediately, one page after another,
        # with no pacing between them.
        request = _construct_request(per_page, page, query_args, template, auth)  # noqa
        r, errors = _get_response(request, auth, template)

        status_code = int(r.getcode())
        # Retry up to three times on 502 Bad Gateway.
        retries = 0
        while retries < 3 and status_code == 502:
            print('API request returned HTTP 502: Bad Gateway. Retrying in 5 seconds')
            retries += 1
            time.sleep(5)
            request = _construct_request(per_page, page, query_args, template, auth)  # noqa
            r, errors = _get_response(request, auth, template)
            status_code = int(r.getcode())

        if status_code != 200:
            template = 'API request returned HTTP {0}: {1}'
            errors.append(template.format(status_code, r.reason))
            log_error(errors)

        response = json.loads(r.read().decode('utf-8'))
        if len(errors) == 0:
            if type(response) == list:
                for resp in response:
                    yield resp
                if len(response) < per_page:
                    break
            elif type(response) == dict and single_request:
                yield response

        if len(errors) > 0:
            log_error(errors)

        if single_request:
            break

There is also this piece of code, which uses rate limiting, but only once an error has already occurred:

def _request_http_error(exc, auth, errors):
    # HTTPError behaves like a Response so we can
    # check the status code and headers to see exactly
    # what failed.
    should_continue = False
    headers = exc.headers
    limit_remaining = int(headers.get('x-ratelimit-remaining', 0))

    if exc.code == 403 and limit_remaining < 1:
        # The X-RateLimit-Reset header includes a
        # timestamp telling us when the limit will reset
        # so we can calculate how long to wait rather
        # than inefficiently polling:
        gm_now = calendar.timegm(time.gmtime())
        reset = int(headers.get('x-ratelimit-reset', 0)) or gm_now
        # We'll never sleep for less than 10 seconds:
        delta = max(10, reset - gm_now)

        limit = headers.get('x-ratelimit-limit')
        print('Exceeded rate limit of {} requests; waiting {} seconds to reset'.format(limit, delta),  # noqa
              file=sys.stderr)

        if auth is None:
            print('Hint: Authenticate to raise your GitHub rate limit',
                  file=sys.stderr)

        time.sleep(delta)
        should_continue = True

    return errors, should_continue

The strategy could be slightly different:

  • count the HTTP requests made so far: 𝑛
  • mark the time of the first request: 𝑡₀ (seconds)
  • note the time of the current request: 𝑡𝑐 (seconds)
  • accept the rate as an optional parameter: rate ≤ 1.38

if 𝑛 > (𝑡𝑐 - 𝑡₀) × rate:
    wait 1 second before the next request
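
A minimal sketch of that strategy (a hypothetical Throttle helper; nothing like it ships in github-backup today):

import time


class Throttle(object):
    """Pace calls so the average rate never exceeds `rate` per second."""

    def __init__(self, rate=1.38):
        self.rate = rate  # stay under 5000 / 3600 = 1.3888... req/s
        self.n = 0        # HTTP requests made so far
        self.t0 = None    # time of the first request, in seconds

    def wait(self):
        if self.t0 is None:
            self.t0 = time.monotonic()
        # If we are ahead of the allowed pace, sleep until we are not.
        while self.n > (time.monotonic() - self.t0) * self.rate:
            time.sleep(1)
        self.n += 1

Calling throttle.wait() just before each _construct_request/_get_response pair in retrieve_data_gen above would enforce the limit proactively instead of reacting to 403s.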

@eht16
Contributor

eht16 commented Apr 13, 2020

I've created a very simple throttling approach in #149.
It's not very clever; it simply pauses API requests for a fixed number of seconds, but it helps to stay within the rate limits.
My use case: the GitHub API user used for the backup is also used elsewhere. It doesn't matter how long the backup takes, as long as a few API requests are left for the other uses.
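
In rough terms, the idea looks something like this (a hypothetical wrapper around the project's _get_response, not the literal code from #149):

import time

_last_remaining = None  # X-RateLimit-Remaining from the previous response


def throttled_get_response(request, auth, template,
                           throttle_limit=5000, throttle_pause=0.6):
    """Sleep a fixed throttle_pause seconds before each request once the
    remaining quota drops below throttle_limit."""
    global _last_remaining
    if _last_remaining is not None and _last_remaining < throttle_limit:
        time.sleep(throttle_pause)
    r, errors = _get_response(request, auth, template)
    _last_remaining = int(r.headers.get('x-ratelimit-remaining', 0))
    return r, errors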

@garymoon
Contributor

I am successfully using @eht16's throttling (💙) to keep below the rate limit when backing up very large orgs. I'm using --throttle-limit 5000 --throttle-pause 0.6, but YMMV. IMO @eht16's work should close this issue 👍
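
For reference, a full invocation might look something like this (the throttle options are from the comment above; the remaining flags are standard github-backup options as I recall them, so double-check against --help):

github-backup my-org --organization --all \
    --token "$ACCESS_TOKEN" \
    --output-directory /backups/github \
    --throttle-limit 5000 --throttle-pause 0.6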
