Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanner: Ensure client backoff is implemented to respect client configuration. #2062

Closed
crwilcox opened this issue Jun 21, 2019 · 13 comments
Closed
Assignees
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@crwilcox
Copy link

There was an issue in the client library for python where requests to retry could occur without a proper retry object. In this case, rather than defaulting to an exponential backoff, we defaulted to immediate retry. Ensure that for this client library we respect the min, max, and multiplier as specified in the gapic configuration.

Original Bug: googleapis/google-cloud-python#7303

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Jun 22, 2019
@jdpedrie jdpedrie added api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Jun 26, 2019
@andrewinc
Copy link

andrewinc commented Sep 12, 2019

@crwilcox, @dwsupplee, I believe this feature is already implemented in here and testing here, so, I think, this issue could be closed

@skuruppu skuruppu assigned AVaksman and unassigned jdpedrie Jan 16, 2020
@ava12
Copy link
Contributor

ava12 commented Jan 22, 2020

Apparently all gRPC requests are handled by Google\Cloud\Core\GrpcRequestWrapper, which wraps them with Google\Cloud\Core\ExponentialBackoff. There are no min/max/multiplier options for backoff. The only configurable parameter is maximal number of retries (default is 3). Methods execute and read of Google\Cloud\Spanner\Operation always use default value, in other cases option retries passed to Google\Cloud\Spanner\SpannerClient constructor is respected.

@ava12
Copy link
Contributor

ava12 commented Jan 29, 2020

I found that requested feature is implemented (and gapic configuration is respected), but for some reason it is disabled by default.

All gRPC methods (except for ExecuteStreamingSql and StreamingRead) are guarded by Google\ApiCore\Middleware\RetryMiddleware, all min, max, and multiplier settings are respected, default timeout and retry settings are read from Spanner/src/V1/resources/spanner_client_config.json.

Requests are handled with Google\Cloud\Core\GrpcRequestWrapper#send() method. This method intentionally disables gapic backoffs (setting retriesEnabled option to false) if no retry settings are passed. It looks like a hack, and I can't understand why it is there - maybe it's a workaround?

This behaviour can be overridden by passing gRPC retry settings (may be an empty array) to SpannerClient constructor or client/instance/database/etc method calls. Example:

$options = [
    'grpcOptions' => [
        'retrySettings' => [],
    ],
];
$spanner = new SpannerClient($options);
// gapic backoffs will be used

Works fine, but looks messy. Better solution is using this trick in SpannerClient constructor (still looks hacky). This doesn't break anything (at least unit and system tests pass).

@skuruppu
Copy link
Contributor

I found that requested feature is implemented (and gapic configuration is respected), but for some reason it is disabled by default.

All gRPC methods (except for ExecuteStreamingSql and StreamingRead) are guarded by Google\ApiCore\Middleware\RetryMiddleware, all min, max, and multiplier settings are respected, default timeout and retry settings are read from Spanner/src/V1/resources/spanner_client_config.json.

Requests are handled with Google\Cloud\Core\GrpcRequestWrapper#send() method. This method intentionally disables gapic backoffs (setting retriesEnabled option to false) if no retry settings are passed. It looks like a hack, and I can't understand why it is there - maybe it's a workaround?

This behaviour can be overridden by passing gRPC retry settings (may be an empty array) to SpannerClient constructor or client/instance/database/etc method calls. Example:

$options = [
    'grpcOptions' => [
        'retrySettings' => [],
    ],
];
$spanner = new SpannerClient($options);
// gapic backoffs will be used

Works fine, but looks messy. Better solution is using this trick in SpannerClient constructor (still looks hacky). This doesn't break anything (at least unit and system tests pass).

@ava12 thanks for this interesting finding.

@dwsupplee do you know of the history of why reading the gapic configuration for client backoff settings is disabled by default in PHP? Is there another way to enable it than the way @ava12 suggested? Thanks for your help.

@dwsupplee
Copy link
Contributor

@skuruppu Yes, I have some context. Story time!

Originally when this repository started we were building custom clients around some of our biggest services, like Storage, BigQuery, and PubSub. We built these clients to use HTTP 1.1 and a generic retry strategy which was recommended at the time. As time went on, generated clients (GAPICs) started to come into the picture. These clients introduced a more granular approach to retry mechanisms as well as support for gRPC. Our goal was to help keep things simple for users and prevent the need for breaking changes. As such, we opted for keeping the original retry mechanisms we had in place for our original HTTP 1.1 clients and to abstract the concept of gRPC away under the hood. As found here, we allowed the ability to still have granular control over the gRPC retry mechanisms, but it is a bit hidden.

To answer your question about another way to enable the retry mechanisms, at the moment - no. But as we've discussed before - we are in the planning stages for a "2.0" version of this library which will help simplify concepts across the board and reduce the need for some of these legacy solutions we've built up over the years.

@skuruppu
Copy link
Contributor

@skuruppu Yes, I have some context. Story time!

Originally when this repository started we were building custom clients around some of our biggest services, like Storage, BigQuery, and PubSub. We built these clients to use HTTP 1.1 and a generic retry strategy which was recommended at the time. As time went on, generated clients (GAPICs) started to come into the picture. These clients introduced a more granular approach to retry mechanisms as well as support for gRPC. Our goal was to help keep things simple for users and prevent the need for breaking changes. As such, we opted for keeping the original retry mechanisms we had in place for our original HTTP 1.1 clients and to abstract the concept of gRPC away under the hood. As found here, we allowed the ability to still have granular control over the gRPC retry mechanisms, but it is a bit hidden.

To answer your question about another way to enable the retry mechanisms, at the moment - no. But as we've discussed before - we are in the planning stages for a "2.0" version of this library which will help simplify concepts across the board and reduce the need for some of these legacy solutions we've built up over the years.

Wow, cool, thanks for the context @dwsupplee.

I would personally be happy to go with the solution where the options are set inside the constructor (keeping in mind that users can override these) so that users don't have to apply the trick themselves. @dwsupplee would this be considered a breaking change? Would it be better for us to wait for "2.0" instead?

@dwsupplee
Copy link
Contributor

I would personally be happy to go with the solution where the options are set inside the constructor (keeping in mind that users can override these) so that users don't have to apply the trick themselves.

If we make this change in the SpannerClient constructor it would effectively change the retry strategy used for all the calls in the library. Unfortunately, this could have adverse affects on existing code bases.

I am in the camp that leans towards waiting for 2.0, but do understand the need for flexibility here. What if we kept defaulting to what I'll refer to as the "legacy" retry mechanism, but exposed a flag in the constructor which makes it easy to opt in to using the strategy exposed by the GAPICs? This could allow us to introduce a clean (and clearly documented) way for users to opt-in to using the more granular retry strategy without needing to force everyone in to the change. Let me know what you think.

@skuruppu
Copy link
Contributor

I am in the camp that leans towards waiting for 2.0, but do understand the need for flexibility here. What if we kept defaulting to what I'll refer to as the "legacy" retry mechanism, but exposed a flag in the constructor which makes it easy to opt in to using the strategy exposed by the GAPICs? This could allow us to introduce a clean (and clearly documented) way for users to opt-in to using the more granular retry strategy without needing to force everyone in to the change. Let me know what you think.

I think that's a reasonable compromise. Thanks for the suggestion.

@ava12, would you be able to implement @dwsupplee's suggestion?

@ava12
Copy link
Contributor

ava12 commented Jan 30, 2020

Adding a flag to $config array that makes SpannerClient constructor use described trick - yes, I'll do that.

ava12 added a commit to MaxxleLLC/google-cloud-php that referenced this issue Feb 12, 2020
dwsupplee pushed a commit that referenced this issue May 6, 2020
* feat(spanner): add option to use GAPIC backoff strategy (#2062)

* chore: GAPIC backoffs

refactor: get rid of Reflection
test: new flag does not interfere with custom gRPC options

* chore: code style fixes

* chore: skip gRPC tests

* refactor: GAPIC backoff option

- rename useGapicBackoffs -> useDiscreteBackoffs
- disable legacy retries when GAPIC backoffs used

* chore: add copyright notice
@ava12 ava12 closed this as completed May 28, 2020
@larkee
Copy link
Contributor

larkee commented Jul 14, 2020

@dwsupplee Is there a timeline for the "2.0" version of this library? Does "this library" mean the whole PHP library or just Cloud Spanner PHP? Is there anything the Spanner team can do? We're working on making all the Spanner client libraries consistent in regards to retries and timeouts.

@hengfengli
Copy link
Contributor

@dwsupplee I just would like to follow up with @larkee's questions in the previous message. If you get a chance, could you please reply to them? Thanks.

@dwsupplee
Copy link
Contributor

We would be looking to do a major bump across all our libraries, with a rough date of Q3/Q4. Part of this will be removing the portions of the connection layer that currently act as an intermediary between the GAPICs and veneer surface, meaning the GAPIC retries will become the default. Does that seem to suite you and the team?

@hengfengli
Copy link
Contributor

Yes, sounds good. Our team just wants to know the timeline. Thanks a lot for the update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

10 participants