Availability: Fixes region failover logic on control plane hot path when gateway hangs #2279

ealsur · 2021-03-03T20:33:38Z

If the Gateway is in a hung state where the HttpRequestException only happens after 60 seconds, the current hot path policy waited only up to 10 seconds and surfaced a RequestTimeout, so we could not apply the failover mechanics.

This only affected the hot path (address resolution or query plans) and not other Gateway calls.
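To make the failure mode concrete, here is a minimal sketch in Python (not the SDK's C# code). It assumes a gateway that hangs for 60 seconds before the transport error appears, and it assumes that only the transport error can drive region failover. The names `Outcome`, `final_attempt_outcome`, and `can_fail_over` are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Outcome(Enum):
    REQUEST_TIMEOUT = "RequestTimeout"               # the client gave up first
    HTTP_REQUEST_EXCEPTION = "HttpRequestException"  # the transport error was observed


def final_attempt_outcome(seconds_until_error: float, attempt_timeout: float) -> Outcome:
    """Outcome of one attempt against a gateway that hangs and then errors."""
    if seconds_until_error > attempt_timeout:
        # The attempt timeout fires before the error is ever seen.
        return Outcome.REQUEST_TIMEOUT
    return Outcome.HTTP_REQUEST_EXCEPTION


def can_fail_over(outcome: Outcome) -> bool:
    # Assumption for this sketch: only the transport error triggers failover.
    return outcome is Outcome.HTTP_REQUEST_EXCEPTION


HANG_SECONDS = 60.0  # from the description: the exception appears only after 60 s

before = final_attempt_outcome(HANG_SECONDS, attempt_timeout=10.0)
after = final_attempt_outcome(HANG_SECONDS, attempt_timeout=65.0)

print(before.value, can_fail_over(before))
print(after.value, can_fail_over(after))
```

With a 10-second last attempt the hang is reported as a RequestTimeout and the failover path is never reached. With 65 seconds the client waits long enough to see the exception.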

Introduced in 3.16.0 in PR #1954

Type of change

Bug fix (non-breaking change which fixes an issue)

Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyControlPlaneRetriableHotPath.cs

kirankumarkolli

Let's please wait on merging this change.

Latency is equally important.
Maybe we need to revisit basing all failover on just RequestException.

j82w · 2021-03-04T14:07:33Z

> Let's please wait on merging this change.
>
> Latency is equally important.
> Maybe we need to revisit basing all failover on just RequestException.

@kirankumarkolli I don't see any reason to block this PR. If it is decided that we want to revisit fail-over design it should be a separate PR after the current model is fixed.

ealsur · 2021-03-04T22:14:46Z

I don't see how latency is involved here. We are already retrying 3 times, with 0.5 and 5 seconds before the 65 seconds. If the Gateway has a latency spike, it will be clear from the diagnostics. If the Gateway had a 20-second latency spike, previously we were throwing a RequestTimeout, so the operation failed (not helping latency).

The 3 retries are there to work around Gateway upgrades and to try to reach other instances; they are about availability, not latency.
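The 20-second latency spike case can be walked through with a small simulation. This is a sketch in Python, not the SDK implementation. It assumes the pre-fix schedule was 0.5, 5, and 10 seconds (inferred from "Replacing 10 with 65"), that every attempt sees the same gateway latency, and that there is no delay between attempts. `run_attempts` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def run_attempts(timeouts: Sequence[float], gateway_latency: float) -> Tuple[Optional[int], float]:
    """Walk per-attempt timeouts against a gateway that answers after gateway_latency seconds.

    Returns (1-based index of the attempt that succeeded, or None, total seconds spent).
    """
    elapsed = 0.0
    for attempt, timeout in enumerate(timeouts, start=1):
        if gateway_latency <= timeout:
            # The response arrives before this attempt's timeout fires.
            return attempt, elapsed + gateway_latency
        elapsed += timeout  # the attempt is abandoned when its timeout fires
    return None, elapsed


OLD_TIMEOUTS = (0.5, 5.0, 10.0)  # assumed schedule before this change
NEW_TIMEOUTS = (0.5, 5.0, 65.0)  # schedule described in this comment

print(run_attempts(OLD_TIMEOUTS, 20.0))  # (None, 15.5): every attempt times out
print(run_attempts(NEW_TIMEOUTS, 20.0))  # (3, 25.5): third attempt completes
print(run_attempts(NEW_TIMEOUTS, 0.2))   # (1, 0.2): a healthy gateway is unaffected
```

Under these assumptions the longer last attempt costs nothing when the Gateway is healthy, and it turns a guaranteed failure during a 20-second spike into a slow success.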

ealsur added 2 commits March 3, 2021 11:46

timeout value

1c31ffd

tests

02ce6d2

ealsur self-assigned this Mar 3, 2021

ealsur requested review from bchong95, FabianMeiswinkel, j82w, khdang, kirankumarkolli, kirillg and sboshra as code owners March 3, 2021 20:33

j82w reviewed Mar 3, 2021

Microsoft.Azure.Cosmos/src/HttpClient/HttpTimeoutPolicyControlPlaneRetriableHotPath.cs (outdated, resolved)

ealsur and others added 4 commits March 3, 2021 13:42

Replacing 10 with 65

c766cfb

Merge branch 'master' into users/ealsur/retrychange

048b0c9

comma

d542fa9

Merge branch 'users/ealsur/retrychange' of https://github.com/Azure/a…

f0af576

…zure-cosmos-dotnet-v3 into users/ealsur/retrychange

j82w approved these changes Mar 3, 2021

j82w changed the title ~~Availability: Fixes detection for hot path during gateway hang~~ Availability: Fixes region failover logic on control plane hot path when gateway hangs Mar 3, 2021

kirankumarkolli requested changes Mar 4, 2021

Merge branch 'master' into users/ealsur/retrychange

d3f67d9

Merge branch 'master' into users/ealsur/retrychange

620683d

kirankumarkolli approved these changes Mar 9, 2021

ealsur merged commit e2016ed into master Mar 9, 2021

ealsur deleted the users/ealsur/retrychange branch March 9, 2021 17:03

simplynaveen20 mentioned this pull request Mar 9, 2021

Availability: Fixes region failover logic on control plane hot path when gateway hangs Azure/azure-sdk-for-java#19722

Closed

ealsur added a commit that referenced this pull request Mar 18, 2021

Availability: Fixes region failover logic on control plane hot path w…

4a230ab

…hen gateway hangs (#2279) * timeout value * tests * Replacing 10 with 65 * comma

ealsur mentioned this pull request Jun 27, 2022

CancellationToken: User provided CT not flowing for initialization related requests #3279

Closed
