-
Notifications
You must be signed in to change notification settings - Fork 964
Description
Describe the bug
We see this when the S3 client is trying to use S3Express CreateSession and is configured such that it is doing so many retries (10) that the call times out before the retry limit is reached. Rather than include the underlying exception triggering the retries, a simpler "call timeout out" exception is raised with the suppressed exception of "java.lang.RuntimeException: Task failed."
I believe this is a regression from v1 sdk.
Expected Behavior
ApiCallTimeoutException to include the exception triggering retries internally.
Current Behavior
See HADOOP-19000 for this surfacing connecting to S3Express buckets.
Stack trace on timeouts contains no root cause information.
Caused by: software.amazon.awssdk.core.exception.ApiCallTimeoutException: Client execution did not complete before the specified timeout configuration: 10000 millis
at software.amazon.awssdk.core.exception.ApiCallTimeoutException$BuilderImpl.build(ApiCallTimeoutException.java:97)
at software.amazon.awssdk.core.exception.ApiCallTimeoutException.create(ApiCallTimeoutException.java:38)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.generateApiCallTimeoutException(ApiCallTimeoutTrackingStage.java:151)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.handleInterruptedException(ApiCallTimeoutTrackingStage.java:139)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.translatePipelineException(ApiCallTimeoutTrackingStage.java:107)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.s3.DefaultS3Client.createSession(DefaultS3Client.java:1589)
at software.amazon.awssdk.services.s3.S3Client.createSession(S3Client.java:2505)
at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.getCredentials(S3ExpressIdentityCache.java:88)
at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.lambda$getCachedCredentials$0(S3ExpressIdentityCache.java:73)
at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.refreshResult(CachedS3ExpressCredentials.java:91)
at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.lambda$new$0(CachedS3ExpressCredentials.java:70)
at software.amazon.awssdk.utils.cache.CachedSupplier.lambda$jitteredPrefetchValueSupplier$8(CachedSupplier.java:300)
at software.amazon.awssdk.utils.cache.NonBlocking.fetch(NonBlocking.java:151)
at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:208)
at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:135)
at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.get(CachedS3ExpressCredentials.java:85)
at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.get(S3ExpressIdentityCache.java:61)
at software.amazon.awssdk.services.s3.internal.s3express.DefaultS3ExpressIdentityProvider.lambda$resolveIdentity$0(DefaultS3ExpressIdentityProvider.java:56)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
at software.amazon.awssdk.services.s3.internal.s3express.DefaultS3ExpressIdentityProvider.resolveIdentity(DefaultS3ExpressIdentityProvider.java:49)
at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.trySelectAuthScheme(S3AuthSchemeInterceptor.java:142)
at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.selectAuthScheme(S3AuthSchemeInterceptor.java:81)
at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.beforeExecution(S3AuthSchemeInterceptor.java:61)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$beforeExecution$1(ExecutionInterceptorChain.java:62)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.beforeExecution(ExecutionInterceptorChain.java:62)
at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.runInitialInterceptors(AwsExecutionContextBuilder.java:239)
at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:130)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:67)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:76)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:7323)
at software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$listObjectsV2$63(DelegatingS3Client.java:5856)
at software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:73)
at software.amazon.awssdk.services.s3.DelegatingS3Client.listObjectsV2(DelegatingS3Client.java:5856)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$13(S3AFileSystem.java:2963)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
... 8 more
Suppressed: java.lang.RuntimeException: Task failed.
at software.amazon.awssdk.utils.CompletableFutureUtils.joinLikeSync(CompletableFutureUtils.java:254)
at software.amazon.awssdk.auth.signer.AwsSignerExecutionAttribute.awsCredentialsReadMapping(AwsSignerExecutionAttribute.java:201)
at software.amazon.awssdk.core.interceptor.ExecutionAttribute$DerivationValueStorage.get(ExecutionAttribute.java:260)
at software.amazon.awssdk.core.interceptor.ExecutionAttributes.getAttribute(ExecutionAttributes.java:53)
at software.amazon.awssdk.core.interceptor.ExecutionAttributes.getOptionalAttribute(ExecutionAttributes.java:68)
at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:144)
... 22 more
Reproduction Steps
- set retries to longer than api timeout
- try to an operation which fail with a retryable exception (connectivity etc)
- wait for failure
Possible Solution
We really need that underlying exception for our own decision-making about what to do next. I fear we are going to have to change the S3A retry policies so that we have special handling for the first failure of any S3 operation on the basis that this is a configuration problem that retries will not recover from. Thanks will add the overhead of a needless S3 call.
Would it be possible to do something like save the innermost exception and add it as the root cause when throwing an ApiCallTimeoutException?
Additional Information/Context
No response
AWS Java SDK version used
2.21.33
JDK version used
openjdk version "1.8.0_362" OpenJDK Runtime Environment (Zulu 8.68.0.21-CA-macos-aarch64) (build 1.8.0_362-b09) OpenJDK 64-Bit Server VM (Zulu 8.68.0.21-CA-macos-aarch64) (build 25.362-b09, mixed mode)
Operating System and version
macos 13.4.1