
Retries for Access Token Request #19

Merged
merged 2 commits into develop-koa from ali_sr/EDLY-6810-access-token on Jun 25, 2024

Conversation


@muhammadali286 muhammadali286 commented Jun 11, 2024

Added retries to avoid 403 responses, because the LMS access token endpoint has a rate limit of 120 requests per minute.
https://edlyio.atlassian.net/browse/EDLY-6810

@muhammadali286 muhammadali286 self-assigned this Jun 11, 2024
@muhammadali286 muhammadali286 changed the title from Retries for Access Token to Retries for Access Token Request on Jun 14, 2024
@muhammadali286 muhammadali286 force-pushed the ali_sr/EDLY-6810-access-token branch from 832933f to c826bb1 Compare June 25, 2024 05:53
@muhammadali286 muhammadali286 marked this pull request as ready for review June 25, 2024 05:54
@@ -175,7 +177,7 @@ def oauth2_client_secret(self):
@property
def user_api_url(self):
return '{}/api/user/v1/'.format(self.lms_url_root.strip('/'))


kindly remove


@Anas-hameed Anas-hameed left a comment


LGTM

@muhammadali286 muhammadali286 merged commit 810eb07 into develop-koa Jun 25, 2024
@muhammadali286 muhammadali286 deleted the ali_sr/EDLY-6810-access-token branch June 25, 2024 06:17
expires = (expiration_datetime - datetime.datetime.utcnow()).seconds
cache.set(key, access_token, expires)
return access_token
except Exception as e:

You should not be catching Exception here, for the following reasons:

  1. This is bad practice.
  2. If we catch a (for example) ZeroDivisionError, then we'll have an infinite loop, as attempt will not be incremented.

The solution is to catch only the error that you are interested in (HTTPError, probably).
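For illustration only, a minimal sketch of the retry loop with the narrower exception, assuming a requests-based call (the retries list and the _request_access_token helper are placeholders, not the actual helpers in this PR):

import logging
import time

import requests

log = logging.getLogger(__name__)

retries = [1, 2, 4]  # illustrative wait times (seconds) between attempts
attempt = 0
while True:
    try:
        access_token, expiration_datetime = _request_access_token()  # hypothetical helper
        break
    except requests.exceptions.HTTPError:
        # only the expected HTTP failure is retried; anything else propagates immediately
        attempt += 1
        if attempt > len(retries):
            raise
        log.exception('Access token request failed, retrying (attempt %s)', attempt)
        time.sleep(retries[attempt - 1])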

cache.set(key, access_token, expires)
return access_token
except Exception as e:
log.info(f'Getting exception wile getting token {e}')

You should not be using log.info, but log.exception(e).
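For context, a minimal sketch of that suggestion, assuming the same requests-based flow (token_url and payload are illustrative names):

import logging

import requests

log = logging.getLogger(__name__)

try:
    response = requests.post(token_url, data=payload)  # hypothetical token request
    response.raise_for_status()
except requests.exceptions.HTTPError:
    log.exception('Failed to fetch LMS access token')  # records the message plus the full traceback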

return access_token
except Exception as e:
log.info(f'Getting exception wile getting token {e}')
if isinstance(e.response, Response) and e.response.status_code == 403:

This will crash if e does not have a response attribute.
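One defensive option, sketched here as an assumption rather than the code actually merged, is to look the attribute up safely before using it:

response = getattr(e, 'response', None)  # avoids AttributeError when the exception has no response
if response is not None and response.status_code == 403:
    ...  # rate-limited: schedule a retry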

log.info(f'Getting exception wile getting token {e}')
if isinstance(e.response, Response) and e.response.status_code == 403:
attempt += 1
if attempt > len(retries):

This is the most problematic part. We are facing a rate-limiting problem, and we are fixing it with retries? That will only make the problem worse. If anything, we are guaranteed to face it again in the future as the number of clients increases. The "right" fix is to increase the rate limit value for that client.

@muhammadali286 muhammadali286 (Author) commented Jun 26, 2024

All suggestions are really good and we can address most of them. Thank you for pointing them out.
As per our discussion with @ziafazal and @taimoor-ahmed-1, unfortunately we cannot increase the rate limit since it's already 120/m.

The issue happens when jobs run in parallel and hit the rate limit. So we have decided to optimize the jobs: add a token cache in the jobs and improve the logic so that there are fewer token requests.
This job already has a cache, and it hits the token API only once per site.

Another suggestion was to create a universal token for the worker that can be used for every site; @taimoor-ahmed-1 was looking into the feasibility of that.

Once all jobs are optimized and shipped, we shouldn't see this issue anymore.
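As an illustration only, a cache-first token fetch could look roughly like this; the cache key and the _request_access_token helper are assumptions, not the code in this PR, and the cache.set call mirrors the snippet quoted above:

import datetime

from django.core.cache import cache


def get_access_token(site):
    key = f'access_token:{site.domain}'  # hypothetical cache key, one token per site
    access_token = cache.get(key)
    if access_token:
        return access_token  # reuse the cached token instead of calling the LMS again
    access_token, expiration_datetime = _request_access_token(site)  # hypothetical helper
    expires = (expiration_datetime - datetime.datetime.utcnow()).seconds
    cache.set(key, access_token, expires)
    return access_token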


seconds = retries[attempt - 1]
log.info(f'Retrying attempt no: {attempt} for access token request after seconds: {seconds} because of exceptoin {e}')
time.sleep(seconds)

Any time.sleep call in a web application should be a huge red flag. We should really avoid that.
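As a hedged alternative, assuming the callers are Celery tasks (the PR does not say so), the retry could be pushed to the task framework instead of blocking the worker in a sleep, for example:

from celery import shared_task
from requests.exceptions import HTTPError


@shared_task(bind=True, max_retries=3)
def refresh_access_token(self, site_id):  # hypothetical task, for illustration only
    try:
        fetch_and_cache_token(site_id)  # hypothetical helper that requests and caches the token
    except HTTPError as exc:
        response = getattr(exc, 'response', None)
        if response is not None and response.status_code == 403:
            # re-enqueue the task instead of sleeping in-process; countdown grows per attempt
            raise self.retry(exc=exc, countdown=2 ** self.request.retries)
        raise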

muhammadali286 added a commit that referenced this pull request Oct 25, 2024
muhammadali286 added a commit that referenced this pull request Nov 21, 2024