-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass python model timeout to polling instead of retry #766
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking, LGTM
@@ -43,7 +45,7 @@ def __init__(self, parsed_model: Dict, credential: BigQueryCredentials) -> None: | |||
self.timeout = self.parsed_model["config"].get( | |||
"timeout", self.credential.job_execution_timeout_seconds or 60 * 60 * 24 | |||
) | |||
self.retry = retry.Retry(maximum=10.0, deadline=self.timeout) | |||
self.retry = retry.Retry(predicate=POLLING_PREDICATE, maximum=10.0, timeout=self.timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://googleapis.dev/python/google-api-core/latest/_modules/google/api_core/future/polling.html
predicate=POLLING_PREDICATE,
is already a default so you can skip it
self.retry = retry.Retry(predicate=POLLING_PREDICATE, maximum=10.0, timeout=self.timeout) | |
self.retry = retry.Retry(maximum=10.0, timeout=self.timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My read on this is that if we supply a custom Retry
to polling (i.e. that replaces the DEFAULT_POLLING
retry) it won't necessarily add the default polling predicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess you're right, the DEFAULT_POLLING
is not used anymore then and the retry constructor doesn't have a predicate by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!!
Completely out of scope for this issue but something slightly related just for providing context, when submitting python job using serverless(batch job instead of a pre-created cluster)
, we currently actually waiting for the serverless Dataproc cluster fully destroyed before reporting model finish. It would be helpful to have dbt continue on executing when the job itself is done. Related issue at #734
@@ -98,7 +100,7 @@ def _submit_dataproc_job(self) -> dataproc_v1.types.jobs.Job: | |||
"job": job, | |||
} | |||
) | |||
response = operation.result(retry=self.retry) | |||
response = operation.result(polling=self.retry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider rename self.retry
to some thing different like self.poll_policy
or self.get_result_policy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, I think result_polling_policy
is the most explicit
resolves #577
Description
GCP async interface offers two configurations that both accept the same type:
retry
andpolling
. The difference is thatretry
configures each individual polling request whilepolling
configures the long running operation (i.e. how long to poll for and how frequently to poll). We have been settingretry
and notpolling
resulting in unexpected behavior in job configuration.Checklist
changie new
to create a changelog entry