[azservicebus] infrequently called SendMessage is very slow #17182
Comments
Hi @ryepup, thank you for submitting this. I'll take a look. The trick here is mostly in getting the azservicebus code to understand that the error might be "old", and thus that an immediate reconnect would be okay. I'll be brainstorming some ideas on this.
That PR looks pretty good to me. I got some more logs and I'm seeing one other error that probably doesn't need a sleep:
I think #17205 handles this nil error too.
Fixing the 'stale' sender by immediately attempting to recover the link. Note that this changes the semantics of ResetAttempts(): it now resets us all the way back to the 0th attempt, but the old behavior wasn't used anywhere that matters. I've added a stress test to demo the scenario and to validate that our recovery afterwards is fast and doesn't have a sleep in it. Fixes #17182
Yes, those should be fine too. All detach errors are treated as immediately retryable (for a single iteration), just to guard against a potentially stale link. There are still some scenarios where a SendMessage() can be slow: if the connection has to be recreated, or on the first call (since the link is lazily initialized). Still, it should work better for your scenario, at the very least, and hopefully helps everyone get more even performance when there are large idle periods between calls. Once again, thanks for reporting this, we really appreciate it. This scenario has also been added to our internal stress tests: https://github.com/Azure/azure-sdk-for-go/blob/main/sdk/messaging/azservicebus/internal/stress/tests/idle_fast_reconnect.go
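The "immediately retryable for a single iteration" behavior described above can be illustrated with a minimal sketch. This is not the azservicebus implementation: `errLinkDetached`, `retryQuick` and `simulateSend` are hypothetical names made up for this example, and the real retrier has more moving parts (backoff, link vs. connection recovery). In this sketch the quick retry also counts as one of the attempts, which is an assumption of the example only.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLinkDetached is a hypothetical stand-in for the AMQP "link detached" error.
var errLinkDetached = errors.New("link detached")

// retryQuick calls op up to maxAttempts times and returns how many times it
// slept, plus the last error. The first detach error is retried immediately,
// because the link may simply be stale after an idle period. Any later failure
// sleeps for delay before the next attempt.
func retryQuick(maxAttempts int, delay time.Duration, sleep func(time.Duration), op func() error) (int, error) {
	sleeps := 0
	quickRetryUsed := false
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = op(); err == nil {
			return sleeps, nil
		}
		if attempt == maxAttempts-1 {
			break // out of attempts, so there is no point in sleeping
		}
		if errors.Is(err, errLinkDetached) && !quickRetryUsed {
			quickRetryUsed = true
			continue // skip the sleep exactly once
		}
		sleep(delay)
		sleeps++
	}
	return sleeps, err
}

// simulateSend runs retryQuick against a fake send that fails with a detach
// error for the first detachFailures calls and then succeeds. The sleep
// function is stubbed out so the demo finishes instantly.
func simulateSend(detachFailures int) (sleeps int, err error) {
	calls := 0
	return retryQuick(3, 4*time.Second, func(time.Duration) {}, func() error {
		calls++
		if calls <= detachFailures {
			return errLinkDetached
		}
		return nil
	})
}

func main() {
	for _, n := range []int{0, 1, 2} {
		sleeps, err := simulateSend(n)
		fmt.Printf("detach failures=%d sleeps=%d err=%v\n", n, sleeps, err)
	}
}
```

With a single stale-link failure the send succeeds on the second call without sleeping; only a second consecutive failure pays the delay. That matches the intent stated in the comment above, where the quick path exists purely to guard against a link that went stale while idle.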
Thanks! I re-ran my test from above with v0.3.6, and the numbers are looking better:
$ go run .
[14:12.53974]: send took 3.218310974s
[14:12.54000]: message sent, waiting for idle timeout
[25:12.56155]: waited, sending again
[25:12.56175]: azsb.Conn Link was previously detached. Attempting quick reconnect to recover from error: link detached, reason: *Error{Condition: amqp:link:detach-forced, Description: The link 'G7:37173493:Yzsn8jmXsb1qM_brSLavATtsg3_HY_fNAqPNh745u3H8eDuNS-JdnQ' is force detached. Code: publisher(link95224). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00., Info: map[]}
[25:12.56176]: azsb.Retry (SendMessage) Resetting attempts
[25:12.56177]: azsb.Retry (SendMessage) Attempt -1 returned retryable error: link detached, reason: *Error{Condition: amqp:link:detach-forced, Description: The link 'G7:37173493:Yzsn8jmXsb1qM_brSLavATtsg3_HY_fNAqPNh745u3H8eDuNS-JdnQ' is force detached. Code: publisher(link95224). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00., Info: map[]}
[25:12.56178]: azsb.Conn Recovering link for error link detached, reason: *Error{Condition: amqp:link:detach-forced, Description: The link 'G7:37173493:Yzsn8jmXsb1qM_brSLavATtsg3_HY_fNAqPNh745u3H8eDuNS-JdnQ' is force detached. Code: publisher(link95224). Details: AmqpMessagePublisher.IdleTimerExpired: Idle timeout: 00:10:00., Info: map[]}
[25:12.56327]: azsb.Conn Recovering link only
[25:14.57436]: azsb.Conn Recovered links
[25:14.66447]: send took 2.102870198s
Bug Report
github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus
v0.3.5
What happened?
My application pauses for 3-5s when calling Sender.SendMessage after a period of idle time.
I enabled azsb logs and saw this sequence in my logs:
The sleep varies, but is consistently between 3-5s.
What did you expect or want to happen?
I want "idle timeout" errors to skip this sleep:
azure-sdk-for-go/sdk/messaging/azservicebus/internal/utils/retrier.go
Line 52 in 4e61713
The sleep will help with transient network errors, but is ineffective for idle timeouts.
How can we reproduce it?
Here's a main.go to show the timings. Update the consts at the top and go run main.go
On my machine this produced the following output:
Anything we should know about your environment.
I have low traffic, so I frequently have long idle periods. I call SendMessage from HTTP handlers. I have one Client and one Sender for my application, shared by all HTTP handlers.
I recently switched to azservicebus, and my request durations went from 0-1s to 5-6s, with this sleep accounting for most of that time:
This extra time is causing further timeouts in my API consumers. I'm pursuing two workarounds: getting SendMessage out of my HTTP handlers, and sending a "heartbeat" message every 9m to keep the connection open.
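As a rough illustration of the "heartbeat" workaround, here is a minimal sketch of a keep-alive loop. It assumes a hypothetical `heartbeatSender` interface that a real application would implement by wrapping its shared Sender. `keepAlive`, `countingSender` and `runDemo` are names invented for this example and are not part of azservicebus. The demo uses millisecond intervals so it finishes quickly; the issue's scenario would use an interval of about 9 minutes, just under the 10 minute idle timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// heartbeatSender is a hypothetical interface. A real implementation would
// call SendMessage on the application's shared Sender.
type heartbeatSender interface {
	SendHeartbeat(ctx context.Context) error
}

// keepAlive sends a heartbeat every interval until ctx is cancelled, so the
// link never sits idle long enough to be force detached.
func keepAlive(ctx context.Context, s heartbeatSender, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := s.SendHeartbeat(ctx); err != nil {
				// A failed heartbeat is only logged; the next tick tries again.
				fmt.Println("heartbeat failed:", err)
			}
		}
	}
}

// countingSender is a fake sender that only counts heartbeats.
type countingSender struct{ sent int }

func (c *countingSender) SendHeartbeat(ctx context.Context) error {
	c.sent++
	return nil
}

// runDemo runs keepAlive synchronously for totalMs milliseconds and returns
// the number of heartbeats that were sent.
func runDemo(intervalMs, totalMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(totalMs)*time.Millisecond)
	defer cancel()
	s := &countingSender{}
	keepAlive(ctx, s, time.Duration(intervalMs)*time.Millisecond) // blocks until ctx expires
	return s.sent
}

func main() {
	fmt.Println("heartbeats sent:", runDemo(10, 105))
}
```

In an application, `keepAlive` would typically run in its own goroutine with a context tied to the process lifetime. Note that heartbeat messages land in the same queue or topic as real messages, so consumers would need a way to recognize and discard them.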