
How to prevent hitting API call rate limit #76

Closed
PinkShellos opened this issue Dec 26, 2017 · 4 comments

@PinkShellos

My organization has a lot of GitHub data that we want to back up nightly to a Drobo. I have been attempting to use this program to build that out, but I keep hitting the API rate limit, which times out the request for an increasing amount of time. Is there a way to tell the program to limit its requests so that the data comes in steadily without hitting the 5000 requests per minute threshold?

@josegonzalez
Owner

There is not. Pull requests welcome.

@karlcow

karlcow commented Oct 15, 2019

The API rate limit is 5000 HTTP requests per hour (not minutes as said above).

Let's say we want to back up issues.
With a repo of more than 5,000 issues, this starts to become a problem.

The theoretical limit is

  • 1.38888 requests per second.

So we could artificially set a timer at one request per second and we would be safe.
A backup of issues in a repo with 40,000+ issues would "only" take 11h6m40s.
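
As a quick sanity check on those numbers (my arithmetic, not from the project):

>>> 5000 / 3600          # requests per second allowed
1.3888888888888888
>>> h, rem = divmod(40000, 3600)  # 40,000 requests at one per second
>>> m, s = divmod(rem, 60)
>>> (h, m, s)
(11, 6, 40)

For context, here is retrieve_data_gen as it currently stands; it fires requests back to back with no pacing: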

def retrieve_data(args, template, query_args=None, single_request=False):
    return list(retrieve_data_gen(args, template, query_args, single_request))


def retrieve_data_gen(args, template, query_args=None, single_request=False):
    auth = get_auth(args)
    query_args = get_query_args(query_args)
    per_page = 100
    page = 0
    while True:
        page = page + 1
        # Requests are issued immediately, one page after another,
        # with no pacing between them.
        request = _construct_request(per_page, page, query_args, template, auth)  # noqa
        r, errors = _get_response(request, auth, template)

        status_code = int(r.getcode())
        # Retry up to three times on 502 Bad Gateway.
        retries = 0
        while retries < 3 and status_code == 502:
            print('API request returned HTTP 502: Bad Gateway. Retrying in 5 seconds')
            retries += 1
            time.sleep(5)
            request = _construct_request(per_page, page, query_args, template, auth)  # noqa
            r, errors = _get_response(request, auth, template)
            status_code = int(r.getcode())

        if status_code != 200:
            template = 'API request returned HTTP {0}: {1}'
            errors.append(template.format(status_code, r.reason))
            log_error(errors)

        response = json.loads(r.read().decode('utf-8'))
        if len(errors) == 0:
            if type(response) == list:
                for resp in response:
                    yield resp
                if len(response) < per_page:
                    break
            elif type(response) == dict and single_request:
                yield response

        if len(errors) > 0:
            log_error(errors)

        if single_request:
            break

There is also this piece of code, which uses rate limiting, but only once an error has already occurred:

def _request_http_error(exc, auth, errors):
    # HTTPError behaves like a Response so we can
    # check the status code and headers to see exactly
    # what failed.
    should_continue = False
    headers = exc.headers
    limit_remaining = int(headers.get('x-ratelimit-remaining', 0))

    if exc.code == 403 and limit_remaining < 1:
        # The X-RateLimit-Reset header includes a
        # timestamp telling us when the limit will reset
        # so we can calculate how long to wait rather
        # than inefficiently polling:
        gm_now = calendar.timegm(time.gmtime())
        reset = int(headers.get('x-ratelimit-reset', 0)) or gm_now
        # We'll never sleep for less than 10 seconds:
        delta = max(10, reset - gm_now)

        limit = headers.get('x-ratelimit-limit')
        print('Exceeded rate limit of {} requests; waiting {} seconds to reset'.format(limit, delta),  # noqa
              file=sys.stderr)

        if auth is None:
            print('Hint: Authenticate to raise your GitHub rate limit',
                  file=sys.stderr)

        time.sleep(delta)
        should_continue = True

    return errors, should_continue

The strategy could be slightly different:

  • count the HTTP requests made so far: 𝑛
  • mark the time of the first request: 𝑡₀ (seconds)
  • note the time of the current request: 𝑡𝑐 (seconds)
  • accept the rate as an optional parameter: rate ≤ 1.38

if 𝑛 > (𝑡𝑐 - 𝑡₀) × rate:
    wait 1 second before the next request
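
A minimal sketch of that strategy (a hypothetical Throttle helper; nothing like it ships in github-backup today):

import time


class Throttle(object):
    """Pace calls so the average rate never exceeds `rate` per second."""

    def __init__(self, rate=1.38):
        self.rate = rate  # stay under 5000 / 3600 = 1.3888... req/s
        self.n = 0        # HTTP requests made so far
        self.t0 = None    # time of the first request, in seconds

    def wait(self):
        if self.t0 is None:
            self.t0 = time.monotonic()
        # If we are ahead of the allowed pace, sleep until we are not.
        while self.n > (time.monotonic() - self.t0) * self.rate:
            time.sleep(1)
        self.n += 1

Calling throttle.wait() just before each _construct_request/_get_response pair in retrieve_data_gen above would enforce the limit proactively instead of reacting to 403s.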

@eht16
Contributor

eht16 commented Apr 13, 2020

I've created a very simple throttling approach in #149.
It's not very clever; it simply pauses API requests for a fixed number of seconds, but it helps to stay within the rate limits.
My use case: the GitHub API user used for the backup is also used elsewhere. It doesn't matter how long the backup takes, as long as a few API requests are left for the other uses.
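
In rough terms, the idea looks something like this (a hypothetical wrapper around the project's _get_response, not the literal code from #149):

import time

_last_remaining = None  # X-RateLimit-Remaining from the previous response


def throttled_get_response(request, auth, template,
                           throttle_limit=5000, throttle_pause=0.6):
    """Sleep a fixed throttle_pause seconds before each request once the
    remaining quota drops below throttle_limit."""
    global _last_remaining
    if _last_remaining is not None and _last_remaining < throttle_limit:
        time.sleep(throttle_pause)
    r, errors = _get_response(request, auth, template)
    _last_remaining = int(r.headers.get('x-ratelimit-remaining', 0))
    return r, errors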

@garymoon
Contributor

I am successfully using @eht16's throttling (💙) to keep below the rate limit when backing up very large orgs. I'm using --throttle-limit 5000 --throttle-pause 0.6, but YMMV. IMO @eht16's work should close this issue 👍
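
For reference, a full invocation might look something like this (the throttle options are from the comment above; the remaining flags are standard github-backup options as I recall them, so double-check against --help):

github-backup my-org --organization --all \
    --token "$ACCESS_TOKEN" \
    --output-directory /backups/github \
    --throttle-limit 5000 --throttle-pause 0.6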
