-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingestion/bigquery): user exceeded quota for concurrent project.lists requests #10578
fix(ingestion/bigquery): user exceeded quota for concurrent project.lists requests #10578
Conversation
page_token: Optional[str] = None | ||
while True: | ||
projects_iterator = self.bq_client.list_projects( | ||
max_results=5, page_token=page_token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is max_results=5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this should be 50 as API returns max 50 projects per page. Refer link
BigqueryProject(id=p.project_id, name=p.friendly_name) | ||
for p in projects | ||
] | ||
projects: List[BigqueryProject] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment explaining why this was required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Let me add the comment.
The thing is we are not able to regenerate the error but from doc we got to know two project.lists request is allowed per second and assumed that bigquery client method list_projects is not adding any limit in calling request.
Hence we are trying to add limit in requests per second externally,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I understand that, but having it as a comment in the code will help future editors of this file
@@ -165,6 +170,9 @@ def get_projects(self) -> List[BigqueryProject]: | |||
page_token = projects_iterator.next_page_token | |||
if page_token is None: | |||
break | |||
# Sleep of 30 sec to make limit of two requests per second | |||
time.sleep(30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a RateLimiter helper class - please use that instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -157,21 +157,19 @@ def get_projects(self) -> List[BigqueryProject]: | |||
# Also project.list returns max 50 projects per page. | |||
# Assuming list_projects method internally not adding any limit in requests call, | |||
# externally we are adding limit in requests call per second. | |||
rate_limiter = RateLimiter(max_calls=2, period=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rate_limiter = RateLimiter(max_calls=2, period=1) | |
rate_limiter = RateLimiter(max_calls=2, period=60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quata for projects.list is 2 requests per second not per minute.
I know earlier I added the sleep of 30 sec but it was incorrect.
) | ||
with rate_limiter: | ||
projects_iterator = self.bq_client.list_projects( | ||
max_results=30, page_token=page_token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the max number of results in a single page? can you add a link to the docs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its 50 projects per page
https://cloud.google.com/bigquery/docs/reference/rest/v2/projects/list#query-parameters
@@ -149,6 +150,10 @@ def get_query_result(self, query: str) -> RowIterator: | |||
return resp.result() | |||
|
|||
def get_projects(self) -> List[BigqueryProject]: | |||
def _should_retry(exc: BaseException) -> bool: | |||
logger.debug(f"Exception reason: {str(exc)}. Type: {type(exc)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here explaining what we previously tried and how we arrived at this solution
projects: List[BigqueryProject] = [] | ||
page_token: Optional[str] = None | ||
# Bigquery API has limit in calling project.list request i.e. 2 request per second. | ||
# Also project.list returns max 50 projects per page. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a link to the bigquery docs
# Also project.list returns max 50 projects per page. | ||
# Assuming list_projects method internally not adding any limit in requests call, | ||
# externally we are adding limit in requests call per second. | ||
rate_limiter = RateLimiter(max_calls=1, period=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is 1 request per 2 seconds, not 2 requests per second?
page_token=page_token, | ||
retry=retry.Retry(predicate=_should_retry, timeout=20), | ||
) | ||
count = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this more descriptive
BigqueryProject(id=p.project_id, name=p.friendly_name) | ||
) | ||
count += 1 | ||
logger.debug(f"Projects count per request {str(count)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for str
…ists requests (datahub-project#10578) Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
…ists requests (#10578) Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
Checklist