
BigQuery: 'QueryJob.result' raises 500, even with 'retry' passed when creating the job #6301

Closed
bencaine1 opened this issue Oct 25, 2018 · 8 comments

@bencaine1

OS: Linux dc32b7e8763a 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux
Python version: Python 2.7.6
google-cloud-bigquery: 1.5.0

My code:

from google.cloud import bigquery
from google.cloud.bigquery.retry import DEFAULT_RETRY

client = bigquery.Client(project=project_id)
query_job = client.query(query, job_config=config, retry=DEFAULT_RETRY)
query_job.result()

I'm getting stack traces like:

...
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2643, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 688, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
InternalServerError: 500 Error encountered during execution. Retrying may solve the problem.

Could it be because PollingFuture doesn't take a Retry object (per this TODO)?

@tseaver tseaver added type: question Request for information or clarification. Not an issue. api: bigquery Issues related to the BigQuery API. labels Oct 25, 2018
@tseaver tseaver changed the title Still getting 500s using Client.query method, even though Retry was passed in BigQuery: 'QueryJob.result' raises 500, even with 'retry' passed when creating the job Oct 25, 2018
@tseaver
Contributor

tseaver commented Oct 25, 2018

@bencaine1 That comment indicates that the caller cannot pass a retry argument to QueryJob.result. The _AsyncJob base class inherits from google.api_core.future.polling.PollingFuture, which takes a retry constructor argument we aren't passing; it therefore falls back to a default retry that only handles its private _OperationNotComplete exception.

As a workaround, you might be able to just replace the job._retry attribute with a more robust one. E.g.:

from google.api_core.future.polling import _OperationNotComplete
from google.api_core.retry import Retry
from google.cloud import bigquery
from google.cloud.bigquery.retry import DEFAULT_RETRY
from google.cloud.bigquery.retry import _should_retry

def _predicate(exc):
    return isinstance(exc, _OperationNotComplete) or _should_retry(exc)

result_retry = Retry(predicate=_predicate)

client = bigquery.Client(project=project_id)
query_job = client.query(query, job_config=config, retry=DEFAULT_RETRY)
query_job._retry = result_retry
query_job.result()

@bencaine1
Author

Thanks so much for the fix! I'm still confused as to why there are two places where you could specify a retry. What does the retry in client.query() do?

@tseaver
Contributor

tseaver commented Oct 25, 2018

client.query() creates the actual server-side job (i.e., makes the jobs.insert API call), and then returns it. The retry parameter passed there is used for that API call.

query_job.result() actually polls the job until it is done, and fetches the results. For the polling, it uses the _retry attribute defined in google.api_core.future.polling.PollingFuture. #6305 extends the set of errors used by that retry object.
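
To illustrate the distinction, here is a minimal sketch (project_id, query, and config are the placeholders from the snippets above; assigning _retry is the private-attribute workaround, not public API):

from google.cloud import bigquery
from google.cloud.bigquery.retry import DEFAULT_RETRY

client = bigquery.Client(project=project_id)

# This retry guards only the jobs.insert call that creates the job.
query_job = client.query(query, job_config=config, retry=DEFAULT_RETRY)

# Polling inside result() uses the future's own _retry attribute, so a custom
# retry (such as result_retry from the workaround above) has to be installed
# there to change which errors the polling loop will retry.
query_job._retry = result_retry
query_job.result()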

@bencaine1
Author

Thanks for the clarification! And thanks for working on the fix :)

@bencaine1
Author

We're starting to get the same flaky 500 errors again.

We've implemented the workaround described here as follows:

from google.api_core.future import polling
from google.cloud import bigquery
from google.cloud.bigquery import retry as bq_retry

bq_client = bigquery.Client('my_project')
query_job = bq_client.query(query, job_config=config, retry=bq_retry.DEFAULT_RETRY)
query_job._retry = polling.DEFAULT_RETRY
query_job.result()

If it were a timeout, it would have thrown a RetryError, so it's probably not a timeout.

Any ideas?

@tseaver
Contributor

tseaver commented Mar 11, 2019

@bencaine1 Can you show the traceback for the 500 you are seeing now?

@bencaine1
Author

bencaine1 commented Mar 12, 2019

...
  File "/opt/conda/lib/python2.7/site-packages/verily/bigquery_wrapper/bq.py", line 108, in _wait_for_job
    return query_job.result(timeout=max_wait_secs)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2762, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 703, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 127, in result
    raise self._exception
google.api_core.exceptions.InternalServerError: 500 Error encountered during execution. Retrying may solve the problem.

@tseaver
Contributor

tseaver commented Mar 12, 2019

I agree that the error is not propagating from the retry bits inside PollingFuture. Instead, it looks as though the job has already completed with the error before we ever start polling for it inside the result method: the errorResult field is set in the response returned during query_job._begin.
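
If that is what is happening, retrying the polling cannot help, and the whole query has to be re-issued. A minimal sketch of wrapping the create-and-wait cycle in its own retry (query and config are the placeholders from the snippets above; run_query is just an illustrative helper, not part of the library):

from google.api_core.exceptions import InternalServerError
from google.api_core.retry import Retry, if_exception_type
from google.cloud import bigquery

client = bigquery.Client(project='my_project')

def run_query():
    # Re-create the job on every attempt: a job that already finished with
    # errorResult will raise the same 500 each time result() is called.
    job = client.query(query, job_config=config)
    return job.result()

# Retry the whole create-and-wait cycle when the job itself fails with a 500.
retry_whole_query = Retry(predicate=if_exception_type(InternalServerError))
rows = retry_whole_query(run_query)()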
