
Credential expired during retry #3408

Open
f400810-freddiemac opened this issue Sep 6, 2022 · 4 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments


f400810-freddiemac commented Sep 6, 2022

Describe the bug

In the RetryableStage execute method, the AwsCredentials are not renewed if they have expired. Therefore, if a call is made with credentials that are about to expire, the number of retries actually performed is lower than intended, because the credentials expire while the request is still being retried.

Expected Behavior

When retrying with EqualJitterBackoffStrategy, an expired credential should be renewed before the next retry attempt.

Current Behavior

If a request (in our case S3Client.getObject) fails with a retryable exception and the credentials expire between two retries, we get an S3Exception before the retry limit is reached.

software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400, Request ID: 3YWKVBNJPNTXPJX2, Extended Request ID: GkR56xA0r/Ek7zqQdB2ZdP3wqMMhf49HH7hc5N2TAIu47J3HEk6yvSgVNbX7ADuHDy/Irhr2rPQ=)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:123)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:79)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:59)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:135)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:161)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:84)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:169)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:62)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:62)
        at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:4371)
        at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getObject(S3Bucket.java:131)
        at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getTaskOutput(S3Bucket.java:112)
        at com.freddiemac.fe.distributed.computing.grid.aws.Job$Ready.readTask(Job.java:70)
        at com.freddiemac.fe.distributed.computing.grid.aws.Job.readTask(Job.java:257)
        at com.freddiemac.fe.distributed.computing.grid.aws.AwsProcessor.lambda$completeResponse$3(AwsProcessor.java:135)
        at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.handle(CompletionHandler.java:223)
        at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.lambda$newTaskHandler$18(CompletionHandler.java:213)
        at com.freddiemac.fe.distributed.computing.grid.api.GridClient.lambda$convertToBiFunction$2(GridClient.java:169)
        at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Reproduction Steps

To reproduce the issue consistently, we created our own ResponseTransformer implementation that always throws a RetryableException, and configured a retry policy that keeps retrying past the credential's lifetime (in our case the credentials live for one hour, so the retry policy runs for more than an hour). We then call S3Client.getObject with our ResponseTransformer. Instead of failing because the retry limit was reached, we get the S3Exception saying the provided token has expired.
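A minimal sketch of this reproduction, assuming placeholder bucket/key values and an illustrative retry configuration (the exact values and names are not from the original report):

import java.time.Duration;
import software.amazon.awssdk.core.exception.RetryableException;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.core.retry.backoff.EqualJitterBackoffStrategy;
import software.amazon.awssdk.core.sync.ResponseTransformer;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class ExpiredCredentialRepro {
    public static void main(String[] args) {
        // Retry policy that keeps retrying well past the one-hour credential lifetime.
        RetryPolicy retryPolicy = RetryPolicy.builder()
                .numRetries(100)
                .backoffStrategy(EqualJitterBackoffStrategy.builder()
                        .baseDelay(Duration.ofMinutes(1))
                        .maxBackoffTime(Duration.ofMinutes(5))
                        .build())
                .build();

        // The reported setup uses the StsAssumeRoleWithSamlCredentialsProvider described
        // later in this issue; the default credentials provider chain is used here for brevity.
        S3Client s3 = S3Client.builder()
                .overrideConfiguration(o -> o.retryPolicy(retryPolicy))
                .build();

        // A ResponseTransformer that always fails with a retryable exception, forcing
        // the SDK to retry until the credentials expire.
        ResponseTransformer<GetObjectResponse, Void> alwaysFail = (response, inputStream) -> {
            throw RetryableException.builder().message("simulated retryable failure").build();
        };

        // Expected: failure once the retry limit is exhausted.
        // Observed: S3Exception "The provided token has expired" once the credentials lapse.
        s3.getObject(GetObjectRequest.builder()
                .bucket("example-bucket") // placeholder
                .key("example-key")       // placeholder
                .build(), alwaysFail);
    }
}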

Possible Solution

On every retry, the request could call AwsCredentialsProvider.resolveCredentials to ensure the credentials are still fresh.
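The proposal concerns the SDK's internal RetryableStage, which is not sketched here. As a purely illustrative caller-side approximation of the same idea (a hypothetical helper, not an SDK API), retrying the whole API call makes the credentials provider's resolveCredentials run again for each attempt:

import software.amazon.awssdk.core.exception.SdkException;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

final class FreshCredentialRetry {
    // Hypothetical caller-side workaround, not the proposed SDK change: each new
    // getObjectAsBytes call goes through the full request pipeline, so credentials
    // are resolved (and renewed if expired) before signing. Assumes maxAttempts >= 1.
    static byte[] getObjectRetryingWholeCall(S3Client s3, GetObjectRequest request, int maxAttempts)
            throws InterruptedException {
        SdkException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return s3.getObjectAsBytes(request).asByteArray();
            } catch (SdkException e) {
                last = e;
                Thread.sleep(1_000L * attempt); // simple linear backoff; real code would add jitter
            }
        }
        throw last;
    }
}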

Additional Information/Context

No response

AWS Java SDK version used

2.16.104

JDK version used

1.8.0_181

Operating System and version

Redhat 7.9

@f400810-freddiemac f400810-freddiemac added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 6, 2022
@yasminetalby yasminetalby self-assigned this Sep 7, 2022
@yasminetalby yasminetalby added needs-review This issue or PR needs review from the team. and removed needs-triage This issue or PR still needs to be triaged. labels Sep 7, 2022
@yasminetalby

Hello @f400810-freddiemac ,

Thank you very much for your submission.
Could you please provide your credentials configuration? What credential provider are you using while experiencing this behavior?

Best,

Yasmine

@yasminetalby yasminetalby added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-review This issue or PR needs review from the team. labels Sep 13, 2022
@f400810-freddiemac

Hi @yasminetalby,

We are using StsAssumeRoleWithSamlCredentialsProvider, with Ping Identity as the third party that provides the token. The Ping Identity token expires after an hour.

Basically, we build the StsAssumeRoleWithSamlCredentialsProvider with:
StsAssumeRoleWithSamlCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleWithSamlRequestSupplier).build();

where stsClient is built with awsStsRegionEndpoint in VPC endpoint format (https://[vpceid].sts.[region].vpce.amazonaws.com) and sdkHttpClientSupplier.get() returns a new UrlConnectionHttpClient:
StsClient.builder().region(region).httpClient(sdkHttpClientSupplier.get()).credentialsProvider(AnonymousCredentialsProvider.create()).endpointOverride(awsStsRegionEndpoint).build()

and assumeRoleWithSamlRequestSupplier is a Supplier<AssumeRoleWithSamlRequest> whose every get() call retrieves a new Ping Identity token; the overall setup is roughly as sketched below.
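A consolidated sketch of this configuration (the region, role ARNs, VPC endpoint URI, and the SAML-assertion helper are placeholders, not the reporter's actual values):

import java.net.URI;
import java.util.function.Supplier;
import software.amazon.awssdk.auth.credentials.AnonymousCredentialsProvider;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.auth.StsAssumeRoleWithSamlCredentialsProvider;
import software.amazon.awssdk.services.sts.model.AssumeRoleWithSamlRequest;

public class SamlCredentialsSetup {
    public static void main(String[] args) {
        Region region = Region.US_EAST_1; // placeholder
        URI awsStsRegionEndpoint =
                URI.create("https://vpce-id.sts.us-east-1.vpce.amazonaws.com"); // placeholder

        // Each get() call fetches a fresh SAML assertion (Ping Identity token).
        Supplier<AssumeRoleWithSamlRequest> assumeRoleWithSamlRequestSupplier = () ->
                AssumeRoleWithSamlRequest.builder()
                        .roleArn("arn:aws:iam::123456789012:role/example")               // placeholder
                        .principalArn("arn:aws:iam::123456789012:saml-provider/example") // placeholder
                        .samlAssertion(fetchPingIdentityAssertion())
                        .build();

        StsClient stsClient = StsClient.builder()
                .region(region)
                .httpClient(UrlConnectionHttpClient.builder().build())
                .credentialsProvider(AnonymousCredentialsProvider.create())
                .endpointOverride(awsStsRegionEndpoint)
                .build();

        StsAssumeRoleWithSamlCredentialsProvider credentialsProvider =
                StsAssumeRoleWithSamlCredentialsProvider.builder()
                        .stsClient(stsClient)
                        .refreshRequest(assumeRoleWithSamlRequestSupplier)
                        .build();

        S3Client s3 = S3Client.builder()
                .region(region)
                .credentialsProvider(credentialsProvider)
                .build();
    }

    private static String fetchPingIdentityAssertion() {
        // Hypothetical helper: obtain a base64-encoded SAML assertion from the IdP.
        throw new UnsupportedOperationException("IdP integration not shown");
    }
}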

Thanks
f400810-freddiemac

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Sep 13, 2022
@yasminetalby yasminetalby added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Sep 15, 2022
@yasminetalby

Hello @f400810-freddiemac ,

Thank you very much for providing this information.
The behavior you are experiencing is due to the SDK's current approach to resolving credentials.
In the specific case you describe, this approach limits the number of retry attempts that can actually succeed.
We have added this item to our current backlog.

Thank you very much for your feedback and submission!
I will post an update here once this has been resolved.

Sincerely,

Yasmine

@yasminetalby yasminetalby removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Sep 16, 2022
@yasminetalby yasminetalby removed their assignment Sep 26, 2022
@yasminetalby yasminetalby added the p2 This is a standard priority issue label Nov 12, 2022

steveloughran commented Nov 25, 2023

To confirm the issues:

  1. on a retry, AWS credentials are not resolved again
  2. the error returned by S3 doesn't include a specific error type we can look for in our own code and retry on, just the text "software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400)"

Point 2 I can cope with, as the s3a connector has effectively given up on AWS retries with the v2 move; it was too problematic because the SDK retries on things like UnknownHostException. But our own error handling needs to know which SDK failures are recoverable, and we assume that a 400 isn't.

Is there a specific, stable errorDetail we could use for this?

Created HADOOP-18990. S3A: retry on credential expiry
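A minimal sketch of such a caller-side check, assuming (not confirmed in this thread) that the service reports this failure with the "ExpiredToken" error code; the class and method names are hypothetical:

import software.amazon.awssdk.awscore.exception.AwsServiceException;

final class ExpiredTokenDetector {
    static boolean isExpiredTokenFailure(Throwable t) {
        if (!(t instanceof AwsServiceException)) {
            return false;
        }
        AwsServiceException e = (AwsServiceException) t;
        // awsErrorDetails() carries the service-reported error code and message.
        return e.awsErrorDetails() != null
                && "ExpiredToken".equals(e.awsErrorDetails().errorCode());
    }
}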
