Availability: Fixes region failover logic on control plane hot path when gateway hangs #2279
Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyControlPlaneRetriableHotPath.cs
Let's please wait on merging this change.
Latency is equally important.
Maybe we need to revisit basing all fail-over on just RequestException.
@kirankumarkolli I don't see any reason to block this PR. If it is decided that we want to revisit the fail-over design, it should be a separate PR after the current model is fixed.
I don't see how latency is involved here. We are already retrying 3 times, with 0.5 and 5 seconds before the 65 seconds. If Gateway has a latency spike, it will be clear from the diagnostics. If Gateway had a 20-second latency spike, previously we were throwing a RequestTimeout, so the operation failed (not helping latency). The 3 retries are there to work around Gateway upgrades and try to reach other instances; they are for availability, not latency.
…hen gateway hangs (#2279) * timeout value * tests * Replacing 10 with 65 * comma
If Gateway is in a hung state where the HttpRequestException only happens after 60 seconds, the current Hot Path policy was only waiting up to 10 seconds and surfacing a RequestTimeout, so we could not apply the failover mechanics.
This only affected the hot path (address resolution or query plans) and not other Gateway calls.
Introduced in 3.16.0 in PR #1954
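To illustrate the behavior described above, here is a minimal simulation in Python (not SDK code). The timeout tuples are assumptions taken from the numbers quoted in this thread (0.5, 5, and 10 seconds before the change; 0.5, 5, and 65 seconds after), and `simulate_hot_path` is a hypothetical helper, not part of Microsoft.Azure.Cosmos. It only models which outcome the caller would see when the gateway takes a given number of seconds to raise a transport error.

```python
from typing import Sequence

# Assumed per-attempt timeouts in seconds, based on the values quoted in this thread.
OLD_HOT_PATH_TIMEOUTS = (0.5, 5.0, 10.0)
NEW_HOT_PATH_TIMEOUTS = (0.5, 5.0, 65.0)


def simulate_hot_path(timeouts: Sequence[float], seconds_until_gateway_error: float) -> str:
    """Return the outcome a caller would observe against a hung gateway.

    Each attempt either hits its own timeout first (and the next attempt runs),
    or waits long enough to see the transport error raised by the gateway.
    """
    for timeout in timeouts:
        if seconds_until_gateway_error <= timeout:
            # The transport error surfaces, so failover mechanics can be applied.
            return "HttpRequestException"
    # Every attempt timed out before the transport error could surface.
    return "RequestTimeout"


if __name__ == "__main__":
    # Gateway hangs and only fails the connection after 60 seconds.
    print(simulate_hot_path(OLD_HOT_PATH_TIMEOUTS, 60.0))
    print(simulate_hot_path(NEW_HOT_PATH_TIMEOUTS, 60.0))
```

Under these assumptions, the old schedule never waits long enough to observe the 60-second failure and reports a timeout, while the schedule with the final 65-second attempt does observe it.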
Type of change