Retries for Access Token Request #19

Merged · 2 commits · Jun 25, 2024
38 changes: 28 additions & 10 deletions credentials/apps/core/models.py
@@ -3,6 +3,8 @@
import datetime
import hashlib
import logging
import time
from requests.models import Response

from django.conf import settings
from django.contrib.auth.models import AbstractUser
@@ -175,7 +177,7 @@ def oauth2_client_secret(self):
@property
def user_api_url(self):
return '{}/api/user/v1/'.format(self.lms_url_root.strip('/'))

kindly remove

@property
def access_token(self):
""" Returns an access token for this site's service user.
@@ -192,15 +194,31 @@ def access_token(self):

if not access_token:
url = f'{self.oauth2_provider_url}/access_token'
access_token, expiration_datetime = EdxRestApiClient.get_oauth_access_token(
url,
self.oauth2_client_id,
self.oauth2_client_secret,
token_type='jwt'
)

expires = (expiration_datetime - datetime.datetime.utcnow()).seconds
cache.set(key, access_token, expires)
retries = [15, 30, 45] # Retry delays in seconds
attempt = 0

while attempt < len(retries) + 1:
try:
log.info(f'Fetching access token for URL: {url}')
access_token, expiration_datetime = EdxRestApiClient.get_oauth_access_token(
url,
self.oauth2_client_id,
self.oauth2_client_secret,
token_type='jwt'
)
expires = (expiration_datetime - datetime.datetime.utcnow()).seconds
cache.set(key, access_token, expires)
return access_token
except Exception as e:
You should not be catching Exception here, for the following reasons:

  1. This is bad practice.
  2. If we catch a (for example) ZeroDivisionError, then we'll have an infinite loop, as attempt will not be incremented.

The solution is to catch only the error that you are interested in (HTTPError, probably).
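A minimal sketch of that suggestion, with the retry loop narrowed to `HTTPError`. The names `get_token` and `sleep` are hypothetical stand-ins for illustration; the real code calls `EdxRestApiClient.get_oauth_access_token` and `time.sleep`:

```python
import requests

def fetch_with_retries(get_token, delays=(15, 30, 45), sleep=lambda s: None):
    # Catch only the error we expect from the token endpoint; any other
    # exception (e.g. ZeroDivisionError) propagates immediately instead of
    # being swallowed by the retry loop.
    for delay in delays:
        try:
            return get_token()
        except requests.exceptions.HTTPError:
            sleep(delay)
    return get_token()  # final attempt: let any remaining HTTPError propagate
```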

log.info(f'Getting exception while getting token {e}')
You should not be using log.info, but log.exception(e).

if isinstance(e.response, Response) and e.response.status_code == 403:
This will crash if e does not have a response attribute.
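A defensive way to read the status code is to fall back through `getattr`. This is a sketch; `status_code_of` is a hypothetical helper, not part of the PR:

```python
def status_code_of(exc):
    # Not every exception carries a `response` attribute (requests'
    # RequestException does, but arbitrary exceptions do not), so fall
    # back to None rather than crashing with AttributeError.
    response = getattr(exc, 'response', None)
    return getattr(response, 'status_code', None)
```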

attempt += 1
if attempt > len(retries):
This is the most problematic part. We are facing a rate-limiting problem, and we are fixing it with retries? That will only make the problem worse, and as the number of clients increases we are guaranteed to hit it again. The "right" fix is to increase the rate limit value for that client.

@muhammadali286 (Author) · Jun 26, 2024
All the suggestions are really good and we can address most of them. Thank you for pointing them out.
As per our discussion with @ziafazal and @taimoor-ahmed-1, unfortunately we cannot increase the rate limit, since it is already 120/m.

The issue happens when jobs run in parallel and hit the rate limit, so we have decided to optimize the jobs: add token caching to the jobs and improve the logic so that they make fewer token requests. This job already caches the token, and it hits the token API only once per site.

Another suggestion was to create a universal token for the worker that could be used for every site; @taimoor-ahmed-1 was looking into the feasibility of that.

Once all the jobs are optimized and shipped, we shouldn't see this issue anymore.
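The per-site token caching described above can be sketched roughly as follows. `TokenCache` is a hypothetical illustration; the real code uses Django's cache framework with the token's expiry as the TTL:

```python
import time

class TokenCache:
    """Minimal per-site token cache (a sketch, not the PR's code).
    Parallel jobs for the same site reuse one token until it expires,
    so the token API is hit at most once per expiry window."""

    def __init__(self):
        self._store = {}  # site_key -> (token, expires_at)

    def get_token(self, site_key, fetch_token, now=time.time):
        token, expires_at = self._store.get(site_key, (None, 0))
        if now() >= expires_at:
            token, expires_in = fetch_token()  # hits the token API once
            self._store[site_key] = (token, now() + expires_in)
        return token
```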

raise e

seconds = retries[attempt - 1]
log.info(f'Retrying attempt no: {attempt} for access token request after seconds: {seconds} because of exception {e}')
time.sleep(seconds)
Any time.sleep call in a web application should be a huge red flag. We should really avoid that.
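One alternative (a sketch, not what this PR does) is to delegate retries to the HTTP layer with urllib3's `Retry`. This still blocks the calling worker while backing off, but it keeps the retry policy out of application code; it assumes the token endpoint is reached through a plain `requests` session rather than `EdxRestApiClient`:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def session_with_retries():
    retry = Retry(
        total=3,
        backoff_factor=15,            # roughly 15s, 30s, 60s exponential backoff
        status_forcelist=[403, 429],  # retry only on these status codes
        allowed_methods=None,         # retry POSTs too (token requests are POSTs)
    )
    session = requests.Session()
    session.mount('https://', HTTPAdapter(max_retries=retry))
    return session
```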


return access_token
