ApiCallTimeoutException fails to include exception of last failure; loses root cause and so breaks recovery logic of applications. #4738

@steveloughran

Describe the bug

We see this when the S3 client uses S3Express CreateSession and is configured with so many retries (10) that the call times out before the retry limit is reached. Rather than including the underlying exception that triggered the retries, the SDK raises a simpler "call timed out" exception whose only suppressed exception is "java.lang.RuntimeException: Task failed."

I believe this is a regression from the v1 SDK.

Expected Behavior

ApiCallTimeoutException should include the exception that triggered the retries internally.
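
To illustrate why the cause matters to callers, here is a minimal sketch of the kind of recovery decision an application might make. It uses only the JDK: `RootCauseDemo`, `Decision`, `innermost` and `classify` are hypothetical names invented for this example, a plain `RuntimeException` stands in for the SDK's timeout exception, and the policy itself is an assumption rather than what S3A actually does.

```java
import java.io.IOException;
import java.net.UnknownHostException;

class RootCauseDemo {

    /** Hypothetical outcomes of an application-side recovery policy. */
    enum Decision { FAIL_FAST, RETRY, UNKNOWN }

    /** Walk getCause() links until the innermost exception is reached. */
    static Throwable innermost(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referencing cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Assumed policy: classify a failure by its innermost cause. */
    static Decision classify(Throwable failure) {
        Throwable root = innermost(failure);
        if (root == failure) {
            // No cause attached: the timeout is all we have, so we cannot tell.
            return Decision.UNKNOWN;
        }
        if (root instanceof UnknownHostException) {
            // Likely a configuration problem; retrying will not help.
            return Decision.FAIL_FAST;
        }
        if (root instanceof IOException) {
            // Other I/O failures are treated as transient in this sketch.
            return Decision.RETRY;
        }
        return Decision.UNKNOWN;
    }

    public static void main(String[] args) {
        // Stand-in for a timeout that carries the triggering exception as its cause.
        RuntimeException withCause = new RuntimeException("call timed out",
                new UnknownHostException("bucket.example.invalid"));
        // Stand-in for the behavior reported here: no cause attached.
        RuntimeException withoutCause = new RuntimeException("call timed out");

        System.out.println(classify(withCause));    // prints FAIL_FAST
        System.out.println(classify(withoutCause)); // prints UNKNOWN
    }
}
```

With no cause attached, the caller is left in the `UNKNOWN` branch and cannot distinguish a misconfiguration from a transient network failure.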

Current Behavior

See HADOOP-19000 for this surfacing when connecting to S3Express buckets.

Stack trace on timeouts contains no root cause information.

Caused by: software.amazon.awssdk.core.exception.ApiCallTimeoutException: Client execution did not complete before the specified timeout configuration: 10000 millis
        at software.amazon.awssdk.core.exception.ApiCallTimeoutException$BuilderImpl.build(ApiCallTimeoutException.java:97)
        at software.amazon.awssdk.core.exception.ApiCallTimeoutException.create(ApiCallTimeoutException.java:38)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.generateApiCallTimeoutException(ApiCallTimeoutTrackingStage.java:151)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.handleInterruptedException(ApiCallTimeoutTrackingStage.java:139)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.translatePipelineException(ApiCallTimeoutTrackingStage.java:107)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
        at software.amazon.awssdk.services.s3.DefaultS3Client.createSession(DefaultS3Client.java:1589)
        at software.amazon.awssdk.services.s3.S3Client.createSession(S3Client.java:2505)
        at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.getCredentials(S3ExpressIdentityCache.java:88)
        at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.lambda$getCachedCredentials$0(S3ExpressIdentityCache.java:73)
        at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.refreshResult(CachedS3ExpressCredentials.java:91)
        at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.lambda$new$0(CachedS3ExpressCredentials.java:70)
        at software.amazon.awssdk.utils.cache.CachedSupplier.lambda$jitteredPrefetchValueSupplier$8(CachedSupplier.java:300)
        at software.amazon.awssdk.utils.cache.NonBlocking.fetch(NonBlocking.java:151)
        at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:208)
        at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:135)
        at software.amazon.awssdk.services.s3.internal.s3express.CachedS3ExpressCredentials.get(CachedS3ExpressCredentials.java:85)
        at software.amazon.awssdk.services.s3.internal.s3express.S3ExpressIdentityCache.get(S3ExpressIdentityCache.java:61)
        at software.amazon.awssdk.services.s3.internal.s3express.DefaultS3ExpressIdentityProvider.lambda$resolveIdentity$0(DefaultS3ExpressIdentityProvider.java:56)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
        at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
        at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
        at software.amazon.awssdk.services.s3.internal.s3express.DefaultS3ExpressIdentityProvider.resolveIdentity(DefaultS3ExpressIdentityProvider.java:49)
        at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.trySelectAuthScheme(S3AuthSchemeInterceptor.java:142)
        at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.selectAuthScheme(S3AuthSchemeInterceptor.java:81)
        at software.amazon.awssdk.services.s3.auth.scheme.internal.S3AuthSchemeInterceptor.beforeExecution(S3AuthSchemeInterceptor.java:61)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$beforeExecution$1(ExecutionInterceptorChain.java:62)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.beforeExecution(ExecutionInterceptorChain.java:62)
        at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.runInitialInterceptors(AwsExecutionContextBuilder.java:239)
        at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:130)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsSyncClientHandler.java:67)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:76)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
        at software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:7323)
        at software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$listObjectsV2$63(DelegatingS3Client.java:5856)
        at software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:73)
        at software.amazon.awssdk.services.s3.DelegatingS3Client.listObjectsV2(DelegatingS3Client.java:5856)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$13(S3AFileSystem.java:2963)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
        at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
        ... 8 more
        Suppressed: java.lang.RuntimeException: Task failed.
                at software.amazon.awssdk.utils.CompletableFutureUtils.joinLikeSync(CompletableFutureUtils.java:254)
                at software.amazon.awssdk.auth.signer.AwsSignerExecutionAttribute.awsCredentialsReadMapping(AwsSignerExecutionAttribute.java:201)
                at software.amazon.awssdk.core.interceptor.ExecutionAttribute$DerivationValueStorage.get(ExecutionAttribute.java:260)
                at software.amazon.awssdk.core.interceptor.ExecutionAttributes.getAttribute(ExecutionAttributes.java:53)
                at software.amazon.awssdk.core.interceptor.ExecutionAttributes.getOptionalAttribute(ExecutionAttributes.java:68)
                at software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:144)
                ... 22 more

Reproduction Steps

  • set the retry count high enough that retrying takes longer than the API call timeout
  • try an operation which fails with a retryable exception (connectivity etc.)
  • wait for failure

Possible Solution

We really need that underlying exception for our own decision-making about what to do next. I fear we are going to have to change the S3A retry policies so that we have special handling for the first failure of any S3 operation, on the basis that this is a configuration problem that retries will not recover from. This will add the overhead of a needless S3 call.
Would it be possible to do something like save the innermost exception and add it as the root cause when throwing an ApiCallTimeoutException?
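
A minimal sketch of that idea, using only the JDK and not the SDK's real request pipeline: `TimeoutWithCauseDemo`, `CallTimeoutException` and `callWithRetries` are hypothetical names for this example, and the fixed backoff is an assumption. The retry loop remembers the most recent failure and attaches it as the cause when the overall deadline expires.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

class TimeoutWithCauseDemo {

    /** Hypothetical stand-in for ApiCallTimeoutException; not the SDK class. */
    static class CallTimeoutException extends RuntimeException {
        CallTimeoutException(String message, Throwable lastFailure) {
            super(message, lastFailure);
        }
    }

    /**
     * Retry an attempt up to maxAttempts times with a fixed backoff, giving up
     * once timeoutMillis has elapsed. The last failure seen is kept and becomes
     * the cause of whichever exception is finally thrown.
     */
    static <T> T callWithRetries(Callable<T> attempt, int maxAttempts,
                                 long timeoutMillis, long backoffMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Exception lastFailure = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                lastFailure = e; // remember what triggered the retry
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new CallTimeoutException(
                        "Call did not complete within " + timeoutMillis + " millis",
                        lastFailure);
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
                // Restore the interrupt flag and still report the last failure.
                Thread.currentThread().interrupt();
                throw new CallTimeoutException("Interrupted while retrying", lastFailure);
            }
        }
        throw new RuntimeException("Retries exhausted", lastFailure);
    }

    public static void main(String[] args) {
        try {
            // Ten attempts with a 20 ms backoff cannot finish within 50 ms.
            callWithRetries(() -> {
                throw new ConnectException("connection refused");
            }, 10, 50L, 20L);
        } catch (CallTimeoutException e) {
            System.out.println(e.getMessage());
            System.out.println("caused by: " + e.getCause());
        }
    }
}
```

In this sketch the caller sees both the timeout and the `ConnectException` that kept triggering retries, which is the information the recovery logic needs.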

Additional Information/Context

No response

AWS Java SDK version used

2.21.33

JDK version used

openjdk version "1.8.0_362" OpenJDK Runtime Environment (Zulu 8.68.0.21-CA-macos-aarch64) (build 1.8.0_362-b09) OpenJDK 64-Bit Server VM (Zulu 8.68.0.21-CA-macos-aarch64) (build 25.362-b09, mixed mode)

Operating System and version

macos 13.4.1

Labels

bug (This issue is a bug.), p2 (This is a standard priority issue.)
