Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default deadline for publish does not match expectations #303

Closed
kamalaboulhosn opened this issue Mar 1, 2021 · 7 comments · Fixed by googleapis/gapic-generator-python#814 or #345
Assignees
Labels
api: pubsub Issues related to the googleapis/python-pubsub API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@kamalaboulhosn
Copy link
Contributor

The default total deadline for publish operations is supposed to be 60 seconds. However, it actually seems to be 120 seconds. It is easy to test this by starting up python and running the following with internet disabled:

from google.cloud import pubsub_v1
publisher = pubsub_v1.PublisherClient()
project_id = INSERT PROJECT ID
topic_id = INSERT TOPIC ID
topic_path = publisher.topic_path(project_id, topic_id)
data="HI"
future = publisher.publish(topic_path, data.encode("utf-8"))
future.result()

This should exit after 60 seconds. However, it actually exits after 120 seconds with an error as follows:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1614602541.607973000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5397,"referenced_errors":[{"created":"@1614602516.771517000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 failed to connect to all addresses

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/cloud/pubsub_v1/publisher/futures.py", line 52, in result
    raise err
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 263, in _commit
Failed to publish 1 messages.
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1614602541.607973000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5397,"referenced_errors":[{"created":"@1614602516.771517000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 failed to connect to all addresses

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 263, in _commit
    response = self._client.api.publish(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/pubsub_v1/services/publisher/client.py", line 582, in publish
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 199, in retry_target
    six.raise_from(
  File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f8596277af0>, topic: "projects/<my project>/topics/<my topic>"
messages {
  data: "HI"
}
, metadata=[('x-goog-request-params', 'topic=projects/<my project>/topics/<my topic>'), ('x-goog-api-client', 'gl-python/3.9.1 grpc/1.36.0 gax/1.26.0 gccl/2.4.0')]), last exception: 503 failed to connect to all addresses
    response = self._client.api.publish(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/pubsub_v1/services/publisher/client.py", line 582, in publish
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/google/api_core/retry.py", line 199, in retry_target
    six.raise_from(
  File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f8596277af0>, topic: "projects/<my project>/topics/<my topic>"
messages {
  data: "HI"
}

Where are the default retry settings coming from and what are they? Want to confirm that we are trying requests more than once by default.

@product-auto-label product-auto-label bot added the api: pubsub Issues related to the googleapis/python-pubsub API. label Mar 1, 2021
@kamalaboulhosn kamalaboulhosn added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Mar 1, 2021
@plamut
Copy link
Contributor

plamut commented Mar 1, 2021

I'll check the logs to see how many requests are actually made, and where the total deadline comes from.

@plamut
Copy link
Contributor

plamut commented Mar 1, 2021

This is the current output with logging level set to DEBUG and the log format slightly adjusted from my machine, with internet accidentally enabled (it appears I was nevertheless lucky with a transient error):

INFO     [2021-03-01 15:32:26,902] MainThread                        [__main__] [issue_303.py:36][main] Publishing a message...
DEBUG    [2021-03-01 15:32:26,913] Thread-PubSubBatchCommitter       [google.cloud.pubsub_v1.publisher.client] [client.py:395][_wait_and_commit_sequencers] Commit thread is waking up
DEBUG    [2021-03-01 15:32:26,927] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.1s ...
DEBUG    [2021-03-01 15:32:27,023] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.1s ...
DEBUG    [2021-03-01 15:32:27,104] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.1s ...
DEBUG    [2021-03-01 15:32:27,172] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.1s ...
DEBUG    [2021-03-01 15:32:27,275] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.0s ...
DEBUG    [2021-03-01 15:32:27,291] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.7s ...
DEBUG    [2021-03-01 15:32:27,970] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.5s ...
DEBUG    [2021-03-01 15:32:28,500] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 1.0s ...
DEBUG    [2021-03-01 15:32:29,469] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.1s ...
DEBUG    [2021-03-01 15:32:29,588] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 1.0s ...
DEBUG    [2021-03-01 15:32:30,635] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 2.4s ...
DEBUG    [2021-03-01 15:32:33,044] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 2.5s ...
DEBUG    [2021-03-01 15:32:35,531] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 3.8s ...
DEBUG    [2021-03-01 15:32:39,333] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 0.6s ...
DEBUG    [2021-03-01 15:32:39,914] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 6.7s ...
DEBUG    [2021-03-01 15:32:46,599] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 2.9s ...
DEBUG    [2021-03-01 15:32:49,547] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 4.8s ...
DEBUG    [2021-03-01 15:32:54,368] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 16.3s ...
DEBUG    [2021-03-01 15:33:16,657] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 3.4s ...
DEBUG    [2021-03-01 15:33:20,065] Thread-CommitBatchPublisher       [google.api_core.retry] [retry.py:212][retry_target] Retrying due to 503 failed to connect to all addresses, sleeping 18.8s ...
...
Traceback (most recent call last):
 ...
  File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f454f8b5a60>, topic: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
messages {
  data: "HI"
}
, metadata=[('x-goog-request-params', 'topic=XXXXXXXXXXXXXXXXXXXXXXXXX'), ('x-goog-api-client', 'gl-python/3.8.0 grpc/1.33.1 gax/1.26.0 gccl/2.4.0')]), last exception: 503 failed to connect to all addresses

Retries work as expected, but the total deadline is indeed 120.0 seconds.

@plamut plamut added the priority: p2 Moderately-important priority. Fix may not be included in next release. label Mar 1, 2021
@plamut
Copy link
Contributor

plamut commented Mar 1, 2021

Retries are performed by Retry instances from google-api-core. The generated transport layer wraps API methods into these Retry instances (example for publish()), but does not provide an explicit deadline, thus a _DEFAULT_DEADLINE from core is used, which is 120.0 seconds.

It's true that in the code that wraps API methods, a default_timeout=60.0 argument is given, but it's the Retry's deadline that's ultimately used.

While we could patch this in the client library, it's cleaner to update the generator.

Update: Created a gapic generator issue to track this.

@kamalaboulhosn
Copy link
Contributor Author

Does this mean that none of the retry properties one passes in when publishing are taking effect for any publish calls and we are always just using the defaults?

@kamalaboulhosn
Copy link
Contributor Author

Nevermind, I ran an experiment and custom retry settings do take effect as expected.

@plamut
Copy link
Contributor

plamut commented Mar 12, 2021

Yeah, it's only the default timeout when nothing custom is specified. 👍

Since it seems to only require a template change, I will probably just submit the fix to the generator myself.

@kamalaboulhosn Presumably the same default timeout (60 seconds) should appliy to subscriber methods as well? I can check that, too, while at it.

Update: Generator PR, when merged, an autosynth PR will presumably follow in this repo.

@plamut
Copy link
Contributor

plamut commented Mar 20, 2021

Will keep this open until an autosynth PR re-generates the GAPIC library with the generator fix that has now been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the googleapis/python-pubsub API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
2 participants