-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retries for Access Token Request #19
Conversation
832933f
to
c826bb1
Compare
credentials/apps/core/models.py
Outdated
@@ -175,7 +177,7 @@ def oauth2_client_secret(self): | |||
@property | |||
def user_api_url(self): | |||
return '{}/api/user/v1/'.format(self.lms_url_root.strip('/')) | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kindly remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
expires = (expiration_datetime - datetime.datetime.utcnow()).seconds | ||
cache.set(key, access_token, expires) | ||
return access_token | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not be catching Exception here, for the following reasons:
- This is bad practice.
- If we catch a (for example) ZeroDivisionError, then we'll have an infinite loop, as
attempt
will not be incremented.
The solution is to catch only the error that you are interested in (HTTPError, probably).
cache.set(key, access_token, expires) | ||
return access_token | ||
except Exception as e: | ||
log.info(f'Getting exception wile getting token {e}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not be using log.info, but log.exception(e)
.
return access_token | ||
except Exception as e: | ||
log.info(f'Getting exception wile getting token {e}') | ||
if isinstance(e.response, Response) and e.response.status_code == 403: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will crash if e does not have a response
attribute.
log.info(f'Getting exception wile getting token {e}') | ||
if isinstance(e.response, Response) and e.response.status_code == 403: | ||
attempt += 1 | ||
if attempt > len(retries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part that is the most problematic. We are facing a problem of rate limiting, and we are fixing it with retries? This will only increase the problem. If anything, we are guaranteed to face the problem again in the future, as the number of client increases. The "right" fix is to increase the rate limit value for that client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All suggestions are really good and we can address most of them. Thank you for pointing them out.
As per our discussion with @ziafazal and @taimoor-ahmed-1, unfortunately we can not increase the rate limit since its already 120/m.
Issue happens when jobs run in parallel and it reaches the rate limit. So we have decided to optimize the jobs; add token cache in jobs and improve the logic so the there should be less requests for token.
This job already has cache also it is hitting token API only once for one site.
Another suggestion was to create a universal token for worker which can be used for every site, @taimoor-ahmed-1 was looking into the feasibility of that.
Once all jobs are optimized and shipped, we shouldn't be getting an issue
|
||
seconds = retries[attempt - 1] | ||
log.info(f'Retrying attempt no: {attempt} for access token request after seconds: {seconds} because of exceptoin {e}') | ||
time.sleep(seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any time.sleep
call in a web application should be a huge red flag. We should really avoid that.
This reverts commit 810eb07.
Added retries to avoid 403 response because lms access token has rate limit of 120/m.
https://edlyio.atlassian.net/browse/EDLY-6810