SQS Timeouts not firing when using async ReceiveMessageAsync #1275

danielmarbach · 2019-05-03T09:25:07Z

This is a reopening of #609

As part of the above issue, the documentation of the timeout property was amended to say that it is the caller's responsibility to implement a timeout by passing a cancellation token. I think this behavior is hard to understand, for the following reasons.

When you compare the .NET Core version with the full framework version, you can see that a timeout is honored on .NET Core because that version uses HttpClient, which throws an OperationCanceledException after its default timeout of 100 seconds. The full framework version behaves differently: unless you pass a cancellation token that implements the timeout (and potentially the shutdown SLA) to EVERY async call, SDK calls can hang indefinitely.

So as a consumer of the API I'm essentially forced to do the following

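CancellationToken shutdownToken; // signalled by the host on shutdown
using (var linkedTokenSource = CancellationTokenSource.CreateLinkedTokenSource(shutdownToken)) {
    // Cancel when the configured client timeout elapses or shutdown is requested.
    linkedTokenSource.CancelAfter(clientConfig.Timeout);

    // FooBarAsync stands for any async SDK operation; `request` is a placeholder for its request object.
    await sdk.FooBarAsync(request, linkedTokenSource.Token);
}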

This is very cumbersome, and the code is framework specific: it assumes that someone using the SDK understands that the full framework implementation happens to use HttpWebRequest while the .NET Core version doesn't.
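
To keep that boilerplate in one place, the linked-token pattern above could be wrapped in a small helper. This is only a sketch: `TimeoutHelper` and `WithTimeoutAsync` are hypothetical names, not part of the AWS SDK, and the timeout is passed in explicitly rather than read from the client config.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the AWS SDK.
public static class TimeoutHelper
{
    // Runs `call` with a token that is cancelled when `shutdownToken` fires
    // or when `timeout` elapses, whichever happens first.
    public static async Task<T> WithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> call,
        TimeSpan timeout,
        CancellationToken shutdownToken)
    {
        using (var linked = CancellationTokenSource.CreateLinkedTokenSource(shutdownToken))
        {
            linked.CancelAfter(timeout);
            // The callee must observe the token for the timeout to take effect.
            return await call(linked.Token).ConfigureAwait(false);
        }
    }
}
```

A call site would then look like `await TimeoutHelper.WithTimeoutAsync(token => sdk.FooBarAsync(request, token), timeout, shutdownToken)`. It still has to be applied to every async call, which is exactly the burden described above.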

Expected Behavior

The full framework version should also honor the client config timeout: calls should time out after the specified period and throw an OperationCanceledException.

Current Behavior

Calls may hang indefinitely

Possible Solution

See PR

Steps to Reproduce (for bugs)

For example, receive from SQS with sqsClient.ReceiveMessageAsync() and a server-side wait timeout of 20 seconds. Pull the cable. Wait a few minutes. Reconnect the cable. Send a new message to the queue and observe that the message is never received because the sqsClient.ReceiveMessageAsync() call hangs.

Context

Particular/NServiceBus.AmazonSQS#296

Your Environment

AWSSDK.Core version used:
Service assembly and version used: AWSSDK.SQS.dll: 3.3.3.11, AWSSDK.Core.dll: 3.3.24.3
Operating System and version:
Visual Studio version:
Targeted .NET platform: .NET 4.5.2

.NET Core Info

Affects full framework


klaytaybai · 2019-05-06T22:11:49Z

@danielmarbach, thanks for raising this again and submitting the PR. It seems reasonable to me to add this into the SDK, but I'll get the team's opinion.

staylr · 2019-05-30T01:59:16Z

Hi @klaytaybai. We're the downstream user who reported this. Any update?

danielmarbach · 2019-06-12T14:24:17Z

Any progress on this one?

staylr · 2019-07-12T05:04:53Z

@klaytaybai ?

danielmarbach · 2019-08-05T06:22:59Z

I don't want to sound impatient here but I think this issue requires a bit of attention after almost 3 months? // @klaytaybai @normj

danielmarbach · 2019-08-16T05:12:53Z

Thanks!

normj · 2019-08-16T06:16:17Z

No, thank you!

slang25 · 2019-08-16T13:05:18Z

A little late to the party, apologies. This once took down the whole Just Eat platform in a region when the NAT restarted and all SQS listeners were indefinitely awaiting, so this is a big deal, nice one @danielmarbach!

indy-singh · 2020-10-20T18:25:00Z

Is there any reason why OperationCanceledException is thrown here and not a TimeoutException?

Background:

Yesterday we upgraded our AWS libs across the board, and today we incurred a platform-wide outage: the SQS client has a timeout of 5 seconds, but the queue is long polling at 20 seconds. I get that a client timeout shorter than the queue poll time is an invalid configuration (and always should be), but I strongly believe this should have thrown a TimeoutException.

Thanks,
Indy
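
A startup check along the following lines could catch the mismatch described above before it causes an outage. This is a sketch under assumptions: `SqsTimeoutCheck`, `LeavesRoomForLongPoll` and the `margin` parameter are hypothetical and not part of the AWS SDK; the two values would come from the client config timeout and the long-poll wait time used on the receive request.

```csharp
using System;

// Hypothetical sanity check, not part of the AWS SDK.
public static class SqsTimeoutCheck
{
    // Returns true when the client-side timeout is long enough for a long poll
    // of `waitTimeSeconds` to complete, plus a safety margin for network latency.
    public static bool LeavesRoomForLongPoll(TimeSpan clientTimeout, int waitTimeSeconds, TimeSpan margin)
    {
        return clientTimeout >= TimeSpan.FromSeconds(waitTimeSeconds) + margin;
    }
}
```

With the values from this comment (a 5 second client timeout against a 20 second long poll) the check returns false for any non-negative margin.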

danielmarbach · 2020-10-20T21:12:04Z

As far as I'm concerned, when I fixed this I wanted the behaviour to be aligned across the TFMs, so I aimed for the same design as HttpClient. Even though some people find it suboptimal, I felt it is at least consistent with the out-of-the-box experience of the framework.

dotnet/runtime#21965

The team has decided to set the InnerException of the OperationCanceledException to a TimeoutException to differentiate user cancellation from timeouts. This seems to be a reasonable tradeoff that could be made here as well. See

dotnet/runtime#2281
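
If that convention were adopted, a caller could tell the two cases apart roughly as sketched below. This assumes the exception actually carries a TimeoutException as its InnerException, which nothing in this thread confirms for the AWS SDK; `CancellationClassifier` is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of the AWS SDK or the BCL.
public static class CancellationClassifier
{
    // True when the cancellation carries a TimeoutException as its inner
    // exception, i.e. it was caused by a timeout rather than by the caller.
    public static bool IsTimeout(OperationCanceledException ex)
    {
        return ex.InnerException is TimeoutException;
    }

    // Short label for logging.
    public static string Describe(OperationCanceledException ex)
    {
        return IsTimeout(ex) ? "timeout" : "cancelled by caller";
    }
}
```

Inside a `catch (OperationCanceledException ex)` block, the caller could then retry or alert on `IsTimeout(ex)` and treat everything else as a regular shutdown.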

FYI, I'm just providing context as the original author of the fix that got approved and merged; I do not work for AWS.

This was referenced May 3, 2019

Make sure the full framework path that uses HttpWebRequest honors the request timeout out of the box #1276

Merged

SQS Transport stops receiving messages after AWS connectivity failure Particular/NServiceBus.AmazonSQS#296

Closed

klaytaybai added the feature-request label May 6, 2019

danielmarbach closed this as completed Aug 16, 2019
