-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQS Timeouts not firing when using async ReceiveMessageAsync #1275
Comments
@danielmarbach, thanks for raising this again and submitting the PR. It seems reasonable to me to add this into the SDK, but I'll get the team's opinion. |
Hi @klaytaybai. We're the downstream user who reported this. Any update? |
Any progress on this one? |
I don't want to sound impatient here but I think this issue requires a bit of attention after almost 3 months? // @klaytaybai @normj |
Thanks! |
No, thank you! |
A little late to the party, apologies. This once took down the whole Just Eat platform in a region when the NAT restarted and all SQS listeners were indefinitely awaiting, so this is a big deal, nice one @danielmarbach! |
Is there any reason why Background: Yesterday we upgraded our AWS libs across the board and today incurred a platform-wide outage; the sqs client has a timeout of 5 seconds, but the queue is long polling at 20 seconds. I get that the scenario where the client timeout is less than queue poll time is invalid (and should always be). But I strongly believe this should have thrown a Thanks, |
As far as I'm concerned when I fixed this I wanted the behaviour to be aligned across the TFMs so I aimed for the same design as the HttpClient even though some people find it suboptimal I felt it is at least consistent with the out of the box experience of the framework. The team has decided to set the InnerException of the OperationCanceledException to a Timeoutexception to differentiate user cancellation vs timeouts. This seams to be a reasonable tradeoff that could be taken here as well. See FYI I'm just providing context as I'm the original author of the fix that got approved and merged but I do not work for AWS |
This is a reopening of #609
As part of the above issue, the timeout property has been augmented to indicate that it is the responsibility of the caller to implement a timeout by passing a cancellation token. I do think this behavior is hard to understand due to the following reasons.
When you compare the .NET Core Version with the full framework version then you can see that the timeout property is honored due to the fact that the .NET Core version uses the
HttpClient
which throws anOperationCanceledException
after the default timeout of 100 seconds. For the full framework version, the behavior is different. If you don't pass in a cancellation token that implements the timeout as well as potentially the shutdown SLA for EVERY async call to the SDK calls can indefinitely hang.So as a consumer of the API I'm essentially forced to do the following
This is very cumbersome plus code that is framework specific and assumes someone that is using the SDK actually has to understand that the full framework implementation happens to use HttpWebRequest while the .NET Core version doesn't.
Expected Behavior
Full framework version should also honor the client config timeout and time out calls after the specified period and throw an
OperationCanceledException
Current Behavior
Calls may hang indefinitely
Possible Solution
See PR
Steps to Reproduce (for bugs)
For example receive from SQS with
sqsClient.ReceiveMessagesAsync()
with a server timeout of 20 seconds. Pull the cable. Wait a few minutes. Reconnect the cable. Send a new message to the queue and observe how the message is never received because of thesqsClient.ReceiveMessagesAsync()
hangsContext
Particular/NServiceBus.AmazonSQS#296
Your Environment
.NET Core Info
Affects full framework
The text was updated successfully, but these errors were encountered: