Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kinesis LimitExceededException errors are not retried #1376

Closed
cbroglie opened this issue Jun 27, 2017 · 9 comments · Fixed by #2751 or #2759
Closed

Kinesis LimitExceededException errors are not retried #1376

cbroglie opened this issue Jun 27, 2017 · 9 comments · Fixed by #2751 or #2759
Labels
feature-request A feature should be added or improved.

Comments

@cbroglie
Copy link

cbroglie commented Jun 27, 2017

What issue did you see?

Calls to kinesis.DescribeStream can get throttled, returning LimitExceededException. However, since #1276, these calls are not retried.

The root of the problem appears to be inconsistent use of LimitExceededException by different AWS services. In the case of DynamoDB, this exception indicates that a hard limit was reached, and the request should not be retried (see #1271). But in the case of Kinesis, this is used for throttling describe calls, and it should definitely be retried (with back off).

I'd like to see a custom retryer be added to the kinesis package to retry these exceptions. If that sounds reasonable, I'm happy to submit a PR.

Steps to reproduce

See https://gist.github.com/cbroglie/f790f0df0df342de711402429f69d6ac

@jasdel jasdel added the service-api This issue is due to a problem in a service API, not the SDK implementation. label Jun 27, 2017
@jasdel
Copy link
Contributor

jasdel commented Jun 27, 2017

Hi @cbroglie Thanks for reaching out to us about this issue. You're correct the ambiguity of this exception prevented the SDK from retrying this error on a global bases. Also an additional ready this exception was removed is none of the other SDKs retry this exception for the reasons mentioned.

From reading the description of the DescribeStream documentation suggests that this exception is used for both too many resources used, and too many concurrent requests for a resource. I'll reach out to the Kinesis team also to get their feedback on the retryability of this exception in the context of the DescribeStream API call.

Alternatively the following request handler could be added to the Kinesis service client that will enabling retrying of LimitExceededException.

sess := session.Must(session.NewSession())

svc := kinesis.New(sess)
svc.Handler.Retry.PushBack(func(r *request.Request) {
    if r.Operation.Name != "DescribeStream" {
        return
    }
    if aerr, ok := r.Error.(awserr.Error); !ok || aerr == nil {
        return
    }
    if aerr.Code() == kinesis.ErrCodeLimitExceededException {
        r.Retryable = aws.Bool(true)
    }
})

// Use the Kinesis service client.

@cbroglie
Copy link
Author

Thanks @jasdel. I'm encountering this issue in the context of running Terraform, and I can add the custom retrier you suggested there if necessary. I just want to confirm whether or not it can be addressed by the SDK first, so please let me know when you hear back from the service team.

@cbroglie
Copy link
Author

cbroglie commented Jul 5, 2017

@jasdel Any updates on this?

cbroglie pushed a commit to cbroglie/terraform that referenced this issue Jul 5, 2017
Ideally the AWS SDK would automatically retry these transient throttling
exceptions, but it is currently unable to b/c the service is incorrectly
using LimitExceededException for throttling.

See aws/aws-sdk-go#1376 for additional
details.
@jasdel
Copy link
Contributor

jasdel commented Jul 6, 2017

Hi @cbroglie Thanks for getting back with us. I've talked about this with other SDKs and service team and the ambiguity of this exception makes it risking for the SDK to retry directly. I think performing this logic directly by the component using the SDK will have the most insight into the context of the request, and if it should be retryable.

cbroglie pushed a commit to cbroglie/terraform-provider-aws that referenced this issue Jul 7, 2017
Ideally the AWS SDK would automatically retry these transient throttling
exceptions, but it is currently unable to b/c the service is incorrectly
using LimitExceededException for throttling.

See aws/aws-sdk-go#1376 for additional
details.
@cbroglie
Copy link
Author

cbroglie commented Jul 7, 2017

@jasdel Thanks for following up. I understand the technical reasons why it's hard for the SDK to solve this generically, and as you can see from the linked PRs, it was fairly trivial to address in Terraform with your sample code.

That said, this isn't a fully satisfying outcome. As a user of the SDK I would like it to be authoritative about which errors should be retried, and encapsulate this centrally, rather than having to maintain a bunch of customizations that need to be ported to each each application. I feel like users would be better served by including these customizations in the SDK.

@jasdel
Copy link
Contributor

jasdel commented Jul 7, 2017

Thanks for the feedback @cbroglie. We're investigating how the SDKs can better determine in what circumstances a request can be retried. Currently this is mostly inconsistent tribal knowledge per SDK. We're working with the AWS service teams to provide a better way retries to be more definitive, explicit, and service driven.

@cbroglie
Copy link
Author

cbroglie commented Jul 7, 2017

Sounds good. Feel free to close out this issue if you'd like.

@jasdel
Copy link
Contributor

jasdel commented Jul 7, 2017

Thanks, I'll close this. Let us know if you run into additional issues, have feedback, or feature requests for the SDK.

@jasdel jasdel closed this as completed Jul 7, 2017
catsby pushed a commit to hashicorp/terraform-provider-aws that referenced this issue Jul 27, 2017
Ideally the AWS SDK would automatically retry these transient throttling
exceptions, but it is currently unable to b/c the service is incorrectly
using LimitExceededException for throttling.

See aws/aws-sdk-go#1376 for additional
details.
@jasdel
Copy link
Contributor

jasdel commented Mar 19, 2019

Opening this issue back up due to #2530. The SDK should be able to support applying retry behavior to specific AWS Services. Its possible that the best solution will be to implement this flexible behavior in the V2 SDK, but it would be good to investigate what options we have for implementing per service retry behavior in the v1 SDK.

@jasdel jasdel reopened this Mar 19, 2019
@jasdel jasdel added feature-request A feature should be added or improved. and removed service-api This issue is due to a problem in a service API, not the SDK implementation. labels Mar 19, 2019
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Aug 8, 2019
Reorganizes the SDK's retry utilities to follow a consistent code path.
Request.IsErrorRetryable is the primary entry pointer for determining if
a request error should be retryable instead of split between
Request.Send and DefaultRetryer calling IsErrorRetryable. This also
gives the implementation of the Retryer interface consistent control
over when a request will be retried.

Also adds support for request and service specific API error retry and
throttling to be enabled by specifying API error codes before a request
is sent. The "RetryCodes" and "ThrottleCodes" members were added to
request.Request struct. These are used by the Request's IsErrorRetryable
and IsErrorThrottle respectively to specify additional API error codes
that identify the failed request attempt as needing to be retried or
throttled. The DefaultRetryer uses both of these methods when
determining if a failed request attempt should be retried.

Related to aws#1376
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Aug 8, 2019
Reorganizes the SDK's retry utilities to follow a consistent code path.
Request.IsErrorRetryable is the primary entry pointer for determining if
a request error should be retryable instead of split between
Request.Send and DefaultRetryer calling IsErrorRetryable. This also
gives the implementation of the Retryer interface consistent control
over when a request will be retried.

Also adds support for request and service specific API error retry and
throttling to be enabled by specifying API error codes before a request
is sent. The "RetryCodes" and "ThrottleCodes" members were added to
request.Request struct. These are used by the Request's IsErrorRetryable
and IsErrorThrottle respectively to specify additional API error codes
that identify the failed request attempt as needing to be retried or
throttled. The DefaultRetryer uses both of these methods when
determining if a failed request attempt should be retried.

Related to aws#1376
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Aug 9, 2019
Reorganizes the SDK's retry utilities to follow a consistent code path.
Request.IsErrorRetryable is the primary entry pointer for determining if
a request error should be retryable instead of split between
Request.Send and DefaultRetryer calling IsErrorRetryable. This also
gives the implementation of the Retryer interface consistent control
over when a request will be retried.

Also adds support for request and service specific API error retry and
throttling to be enabled by specifying API error codes before a request
is sent. The "RetryCodes" and "ThrottleCodes" members were added to
request.Request struct. These are used by the Request's IsErrorRetryable
and IsErrorThrottle respectively to specify additional API error codes
that identify the failed request attempt as needing to be retried or
throttled. The DefaultRetryer uses both of these methods when
determining if a failed request attempt should be retried.

Related to aws#1376
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Aug 9, 2019
Reorganizes the SDK's retry utilities to follow a consistent code path.
Request.IsErrorRetryable is the primary entry pointer for determining if
a request error should be retryable instead of split between
Request.Send and DefaultRetryer calling IsErrorRetryable. This also
gives the implementation of the Retryer interface consistent control
over when a request will be retried.

Also adds support for request and service specific API error retry and
throttling to be enabled by specifying API error codes before a request
is sent. The "RetryCodes" and "ThrottleCodes" members were added to
request.Request struct. These are used by the Request's IsErrorRetryable
and IsErrorThrottle respectively to specify additional API error codes
that identify the failed request attempt as needing to be retried or
throttled. The DefaultRetryer uses both of these methods when
determining if a failed request attempt should be retried.

Related to aws#1376
jasdel added a commit that referenced this issue Aug 13, 2019
Reorganizes the SDK's retry utilities to follow a consistent code path.
Request.IsErrorRetryable is the primary entry pointer for determining if
a request error should be retryable instead of split between
Request.Send and DefaultRetryer calling IsErrorRetryable. This also
gives the implementation of the Retryer interface consistent control
over when a request will be retried.

Also adds support for request and service specific API error retry and
throttling to be enabled by specifying API error codes before a request
is sent. The "RetryErrorCodes" and "ThrottleErrorCodes" members were added to
request.Request struct. These are used by the Request's IsErrorRetryable
and IsErrorThrottle respectively to specify additional API error codes
that identify the failed request attempt as needing to be retried or
throttled. The DefaultRetryer uses both of these methods when
determining if a failed request attempt should be retried.

Related to #1376
jasdel added a commit to jasdel/aws-sdk-go that referenced this issue Aug 13, 2019
Adds support for retrying the Kinesis API error, "LimitExceededException".

Fix aws#1376
jasdel added a commit that referenced this issue Aug 14, 2019
…#2751)

Adds support for retrying the Kinesis API error, "LimitExceededException".

Fix #1376
bflad added a commit to hashicorp/terraform-provider-aws that referenced this issue Aug 30, 2019
The AWS Go SDK now automatically retries on this error.

References:

- aws/aws-sdk-go#1376
- #1085
- aws/aws-sdk-go#2751
- #9770

Output from acceptance testing:

```
--- PASS: TestAccAWSKinesisStream_createMultipleConcurrentStreams (205.83s)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved.
Projects
None yet
2 participants