[Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts #4181

kundadebdatta · 2023-11-09T22:38:20Z

Background:

The Cosmos .NET V3 SDK should attempt to retry on to another region for fetching the collection information (Read Collection call) or partition key ranges information (Get PkRanges call), if the master partition of the primary region is in complete quorum loss. However, this is not happening today reason being the request to the routing gateway takes more than 65 seconds to respond back, thus timing out the SDK request. The SDK makes 3 retries, each of which times out within 65 seconds. Today, our .NET v3 SDK doesn't retry on gateway timeouts (on TaskCancelled exceptions), thus if the metadata information is not retrieved, then the SDK is stuck to get initialized.

Account Setup For 3 regions : Create a cosmos account with 3 regions, P1 (Write), P2 (Read) and P3 (Read). The PPAF configuration from the BE is to failover to P2, in case P1 is unavailable.

Scenario: While creating the cosmos client, in the application preferred region, provide P1, P2 and P3 as preferred regions.

            CosmosClientOptions clientOptions = new CosmosClientOptions()
            {
                ApplicationPreferredRegions = new List<string>()
                {
                    Regions.P1,
                    Regions.P2,
                    Regions.P3
                },
                EnablePartitionLevelFailover = true,

Before initializing the cosmos client, use the service fabric commands to trigger a "full quorum loss" on the master partition.

Current Behavior:

The SDK keeps retrying on the region P1 for reading the collection information and times out eventually. To understand this better, take a look at the below diagnostics:

Diagnostics Snippet - Scenario: Master Partitions are in Complete Quorum Loss.

{
                  "name": "Read Collection",
                  "duration in milliseconds": 196034.3648,
                  "data": {
                      "Client Side Request Stats": {
                          "Id": "AggregatedClientSideRequestStatistics",
                          "ContactedReplicas": [],
                          "RegionsContacted": [],
                          "FailedReplicas": [],
                          "AddressResolutionStatistics": [],
                          "StoreResponseStatistics": [],
                          "HttpResponseStats": [
                              {
                                  "StartTimeUTC": "2023-10-20T21:57:22.9028944Z",
                                  "DurationInMs": 65009.841,
                                  "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2023-10-20T21:58:27.9128502Z",
                                  "DurationInMs": 65008.3658,
                                  "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2023-10-20T21:59:33.9271150Z",
                                  "DurationInMs": 65010.0114,
                                  "RequestUri": "https://dkundatest-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "e0029295-d533-4e5f-8d1d-9f0957abe01f",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              }
                          ]
                      }
                  }
              }

Expected Behavior/ Acceptance Criteria:

Ideally, the above setup should have worked and the SDK should have retried to the region P2 to get the collection information from the gateway. Note that this behavior is expected irrespective to the fact that per partition automatic failover is enabled or not.

Scope:

The changes discussed, are applicable for a CosmosClient set in Direct mode.
These changes are applicable to cold start a CosmosClient, when all the metadata information are needed to be fetched and cached.

High Level Changes:

Update the client retry policy to retry on gateway timeouts. Timeouts often throw TaskCancelledException which is eventually wrapped in a CosmosException from the CosmosHttpClient. The idea is to extend the retry policy to retry on CosmosExceptions.
Create a new retry policy called MetadataRequestThrottleRetryPolicy, which is a wrapper around the RequestThrottleRetryPolicy and particularly handles all of the metadata requests. The purpose is to mark an endpoint unavailable for read, when a gateway timeout occurs, so that the next retry could happen on another region.

Design Approach:

sequenceDiagram
    participant A as ContainerCore.Item <br> [v3 Code]
    participant B as ClientContextCore <br> [v3 Code]
    participant C as RequestInvokerHandler <br> [v3 Code]
    participant D as RetryHandler <br> [v3 Code]
    participant E as TransportHandler <br> [v3 Code]
    participant F as ServerStoreModel <br> [Direct Code]
    participant G as StoreClient <br> [Direct Code]
    participant H as ReplicatedResourceClient <br> [Direct Code]
    participant I as ConsistencyWriter <br> [Direct Code]
    participant J as AddressSelector <br> [Direct Code]
    participant K as GlobalAddressResolver <br> [v3 Code]
    participant L as AddressResolver <br> [v3 Code]
    participant M as PartitionKeyRangeCache <br> [v3 Code]
    participant N as GatewayStoreModel <br> [v3 Code]
    participant O as GatewayAddressCache <br> [v3 Code]
    participant R as GatewayStoreClient <br> [v3 Code]
    participant P as CosmosHttpClient <br> [v3 Code]

    A->>B: 1. ProcessResourceOperationStreamAsync()
    B->>C: 2. SendAsync()
    C->>D: 3. SendAsync()
    critical ClientRetryPolicy OperationType: Create, ResourceType: Document
        D->>E: 4. ProcessMessageAsync()
        E->>F: 5. ProcessMessageAsync()
        F->>G: 6. ProcessMessageAsync()
        G->>H: 7. InvokeAsync()
        H->>I: 8. WriteAsync()
        I->>J: 9. ResolveAddressesAsync()
        J->>K: 10. ResolveAsync()
        K->>L: 11. ResolveAsync()
        L->>L: 12. ResolveAddressesAndIdentityAsync()
        L->>M: 13. TryLookupAsync()      
        M->>M: 14. GetRoutingMapFor<br>CollectionAsync()
        critical Get Pk Ranges using ResourceThrottleRetryPolicy OperationType: ReadFeed, ResourceType: PartitionKeyRange          
            M->>M: 15. ExecutePartitionKey<br>RangeReadChangeFeedAsync()
            M->>N: 16. ProcessMessageAsync()
            N->>R: 17. InvokeAsync()
            R->>P: 18. SendHttpAsync()
        end
        critical Get Server Addresses
            L->>O: 19. TryGetAddressesAsync()
            O->>P: 20. GetAsync()
        end
    end

Current Flow:

sequenceDiagram
    participant J as ClientRetryPolicy <br> [v3 Code]
    participant A as ConsistencyWriter <br> [Direct Code]
    participant B as AddressSelector <br> [Direct Code]
    participant C as GlobalAddressResolver <br> [v3 Code]
    participant D as AddressResolver <br> [v3 Code]
    participant E as PartitionKeyRangeCache <br> [v3 Code]
    participant K as ResourceThrottleRetryPolicy <br> [v3 Code]
    participant F as GatewayStoreModel <br> [v3 Code]
    participant G as GatewayAddressCache <br> [v3 Code]
    participant H as GatewayStoreClient <br> [v3 Code]
    participant I as CosmosHttpClient <br> [v3 Code]

    J-->>A: 1. DocumentService<br>Request
    loop Retry Iterations with force-refresh flag = false/ true
        A->>B: 2. ResolveAddressesAsync()
        B->>C: 3. ResolveAsync()
        C->>D: 4. ResolveAsync()
        D->>D: 5. ResolveAddressesAndIdentityAsync()
        D->>E: 6. TryLookupAsync()      
        E->>E: 7. GetRoutingMapFor<br>CollectionAsync()
        E->>E: 8. ExecutePartitionKey<br>RangeReadChangeFeedAsync()
        E->>F: 9. ProcessMessageAsync()
        F->>H: 10. InvokeAsync()
        H->>I: 11. SendHttpAsync()
        Note over K: PartitionKeyRangeCache uses the <br> ResourceThrottleRetryPolicy to <br> retry the GET PkRanges call
        I-->>K: 12. CosmosException with 503 on timeout
        K->>K: 13. ShouldRetryAsync() <br> = RetryResult.NoRetry()
    end
    K-->>J: 14. CosmosException with 503 on timeout
    J->>J: 15. RetryAfter <br> (TimeSpan.Zero)
    D->>G: 16. TryGetAddressesAsync()
    G->>I: 17. GetAsync()

Sample Diagnostics with Current Flow:

Scenario: Master Partitions are in Complete Quorum Loss.

{
  "Summary": {
      "GatewayCalls": {
          "(0, 0)": 9,
          "(200, 0)": 1
      }
  },
  "name": "CreateItemAsync",
  "start datetime": "2023-11-09T23:51:23.579Z",
  "duration in milliseconds": 168399.8519,
  "data": {
      "Client Configuration": {
          "Client Created Time Utc": "2023-11-09T23:51:23.5574825Z",
          "MachineId": "vmId:9e487c80-ca5c-4cca-9076-1210bb14fa05",
          "VM Region": "eastus",
          "NumberOfClientsCreated": 1,
          "NumberOfActiveClients": 1,
          "ConnectionMode": "Direct",
          "User Agent": "cosmos-netstandard-sdk/3.36.0|1|X64|Microsoft Windows 10.0.22631|.NET 6.0.24|L|",
          "ConnectionConfig": {
              "gw": "(cps:50, urto:5, p:False, httpf: False)",
              "rntbd": "(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)",
              "other": "(ed:False, be:False)"
          },
          "ConsistencyConfig": "(consistency: NotSet, prgns:[North Central US, East Asia], apprgn: )",
          "ProcessorCount": 16
      }
  },
  "children": [
      {
          "name": "ItemSerialize",
          "duration in milliseconds": 3.7082
      },
      {
          "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
          "duration in milliseconds": 168385.8862,
          "children": [
              {
                  "name": "Waiting for Initialization of client to complete",
                  "duration in milliseconds": 5895.6745
              },
              {
                  "name": "Get Collection Cache",
                  "duration in milliseconds": 0.006
              },
              {
                  "name": "Read Collection",
                  "duration in milliseconds": 19066.7017,
                  "data": {
                      "Client Side Request Stats": {
                          "Id": "AggregatedClientSideRequestStatistics",
                          "ContactedReplicas": [],
                          "RegionsContacted": [],
                          "FailedReplicas": [],
                          "AddressResolutionStatistics": [],
                          "StoreResponseStatistics": [],
                          "HttpResponseStats": [
                              {
                                  "StartTimeUTC": "2023-11-09T23:51:29.5225663Z",
                                  "DurationInMs": 3007.5937,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2023-11-09T23:51:32.5303753Z",
                                  "DurationInMs": 5025.083,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2023-11-09T23:51:38.5623660Z",
                                  "DurationInMs": 10015.7329,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "f1f6707f-a56f-4489-8831-df39c8c2ec8d",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              }
                          ]
                      }
                  }
              },
              {
                  "name": "Read Collection",
                  "duration in milliseconds": 233.1066,
                  "data": {
                      "Client Side Request Stats": {
                          "Id": "AggregatedClientSideRequestStatistics",
                          "ContactedReplicas": [],
                          "RegionsContacted": [],
                          "FailedReplicas": [],
                          "AddressResolutionStatistics": [],
                          "StoreResponseStatistics": [],
                          "HttpResponseStats": [
                              {
                                  "StartTimeUTC": "2023-11-09T23:51:48.5879667Z",
                                  "DurationInMs": 211.8006,
                                  "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "63fe7148-3895-44b1-9f51-ed0c97b5b9a8",
                                  "StatusCode": "OK"
                              }
                          ]
                      }
                  }
              },
              {
                  "name": "Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler",
                  "duration in milliseconds": 143152.8688,
                  "data": {
                      "System Info": {
                          "systemHistory": [
                              {
                                  "dateUtc": "2023-11-09T23:53:18.6722453Z",
                                  "cpu": 2.282,
                                  "memory": 47082876,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.0768,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2023-11-09T23:53:28.6792048Z",
                                  "cpu": 2.108,
                                  "memory": 47083920,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.0628,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2023-11-09T23:53:38.6920165Z",
                                  "cpu": 2.264,
                                  "memory": 47087508,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.2903,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2023-11-09T23:53:48.7054952Z",
                                  "cpu": 3.256,
                                  "memory": 47122948,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.062,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2023-11-09T23:53:58.7185744Z",
                                  "cpu": 3.471,
                                  "memory": 47119588,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.0841,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2023-11-09T23:54:08.7322754Z",
                                  "cpu": 2.887,
                                  "memory": 47115352,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.055,
                                      "availableThreads": 32765,
                                      "minThreads": 16,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              }
                          ]
                      }
                  },
                  "children": [
                      {
                          "name": "Microsoft.Azure.Cosmos.Handlers.TelemetryHandler",
                          "duration in milliseconds": 143152.268,
                          "children": [
                              {
                                  "name": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
                                  "duration in milliseconds": 143151.2067,
                                  "children": [
                                      {
                                          "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
                                          "duration in milliseconds": 71588.955,
                                          "children": [
                                              {
                                                  "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
                                                  "duration in milliseconds": 71588.4398,
                                                  "children": [
                                                      {
                                                          "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                          "duration in milliseconds": 71585.6993,
                                                          "data": {
                                                              "Client Side Request Stats": {
                                                                  "Id": "AggregatedClientSideRequestStatistics",
                                                                  "ContactedReplicas": [],
                                                                  "RegionsContacted": [],
                                                                  "FailedReplicas": [],
                                                                  "AddressResolutionStatistics": [],
                                                                  "StoreResponseStatistics": [],
                                                                  "HttpResponseStats": [
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:51:48.8523547Z",
                                                                          "DurationInMs": 513.076,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:51:49.3655972Z",
                                                                          "DurationInMs": 5005.4728,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:51:55.3768757Z",
                                                                          "DurationInMs": 65019.4496,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "3f88c75b-1699-4df7-84fb-c32f18295fc8",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      }
                                                                  ]
                                                              }
                                                          }
                                                      }
                                                  ]
                                              }
                                          ]
                                      },
                                      {
                                          "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
                                          "duration in milliseconds": 71558.0089,
                                          "children": [
                                              {
                                                  "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
                                                  "duration in milliseconds": 71558.0034,
                                                  "children": [
                                                      {
                                                          "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                          "duration in milliseconds": 71557.9089,
                                                          "data": {
                                                              "Client Side Request Stats": {
                                                                  "Id": "AggregatedClientSideRequestStatistics",
                                                                  "ContactedReplicas": [],
                                                                  "RegionsContacted": [],
                                                                  "FailedReplicas": [],
                                                                  "AddressResolutionStatistics": [],
                                                                  "StoreResponseStatistics": [],
                                                                  "HttpResponseStats": [
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:53:00.4191476Z",
                                                                          "DurationInMs": 520.8107,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:53:00.9400252Z",
                                                                          "DurationInMs": 5014.2824,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2023-11-09T23:53:06.9615907Z",
                                                                          "DurationInMs": 65012.6398,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "39c6f277-ca6e-4bf4-962d-d74d3662f832",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      }
                                                                  ]
                                                              }
                                                          }
                                                      }
                                                  ]
                                              }
                                          ]
                                      }
                                  ]
                              }
                          ]
                      }
                  ]
              }
          ]
      }
  ]
}

Proposed Flow:

sequenceDiagram
    participant J as ClientRetryPolicy <br> [v3 Code]
    participant A as ConsistencyWriter <br> [Direct Code]
    participant B as AddressSelector <br> [Direct Code]
    participant C as GlobalAddressResolver <br> [v3 Code]
    participant D as AddressResolver <br> [v3 Code]
    participant E as PartitionKeyRangeCache <br> [v3 Code]
    participant K as MetadataRequestThrottleRetryPolicy <br> [v3 Code]
    participant F as GatewayStoreModel <br> [v3 Code]
    participant G as GatewayAddressCache <br> [v3 Code]
    participant H as GatewayStoreClient <br> [v3 Code]
    participant I as CosmosHttpClient <br> [v3 Code]
    participant L as GlobalEndpointManager <br> [v3 Code]
    participant M as TransportClient <br> [Direct Code]

    J-->>A: 1. DocumentService<br>Request
    loop Retry Iterations with force-refresh flag = false/ true
        A->>B: 2. ResolveAddressesAsync()
        B->>C: 3. ResolveAsync()
        C->>D: 4. ResolveAsync()
        D->>D: 5. ResolveAddressesAndIdentityAsync()
        D->>E: 6. TryLookupAsync()      
        E->>E: 7. GetRoutingMapFor<br>CollectionAsync()
        E->>E: 8. ExecutePartitionKey<br>RangeReadChangeFeedAsync()
        E->>K: 9. OnBeforeSendRequest()
        K->>K: 10. Sets requestContext.RouteToLocation() <br> using current location index
        K->>L: 11. ResolveServiceEndpoint()
        L-->>K: 12. Resolve and save current endpoint
        E->>F: 13. ProcessMessageAsync(requestContext)
        F->>H: 14. InvokeAsync()
        H->>I: 15. SendHttpAsync()
        Note over K: PartitionKeyRangeCache uses the <br> MetadataRequestThrottleRetryPolicy <br> to retry the GET PkRanges call. <br> OperationType = ReadFeed. <br> ResourceType = PartitionKeyRange  
        I-->>K: 16. CosmosException with 503
        I-->>D: 22. Successful Response in second attempt.
        K->>K: 17. ShouldRetryAsync() kicks in.
        K->>K: 18. IncrementRetryIndexOnService<br>UnavailableForMetadataRead().<br> Increments the next location index. 
        K->>K: 19. Retry the request <br> on the next location index.
    end
    K-->>J: 20. CosmosException with 503
    J->>J: 21. RetryAfter <br> (TimeSpan.Zero)
    D->>G: 23. TryGetAddressesAsync()
    G->>I: 24. GetAsync()   
    I-->>G: 25. Gets Address Information
    G-->>D: 26. Gets Address Information
    D-->>A: 27. Gets PerProtocolPartitionAddressInformation
    A->>M: 28. InvokeResourceOperationAsync(primaryUri)

Complete Diagnostics After the Above Design Changes:

Scenario: Master Partitions are in Complete Quorum Loss.

{
  "Summary": {
      "DirectCalls": {
          "(201, 0)": 1,
          "(204, 0)": 2
      },
      "GatewayCalls": {
          "(0, 0)": 7,
          "(200, 0)": 3,
          "(304, 0)": 1
      }
  },
  "name": "CreateItemAsync",
  "start datetime": "2024-01-09T00:07:55.674Z",
  "duration in milliseconds": 52766.2797,
  "data": {
      "Client Configuration": {
          "Client Created Time Utc": "2024-01-09T00:07:55.6480544Z",
          "MachineId": "hashedMachineName:e04049ca-e421-555f-21ff-492417ecaf52",
          "NumberOfClientsCreated": 1,
          "NumberOfActiveClients": 1,
          "ConnectionMode": "Direct",
          "User Agent": "cosmos-netstandard-sdk/3.37.1|1|X64|Microsoft Windows 10.0.22635|.NET 6.0.25|L|",
          "ConnectionConfig": {
              "gw": "(cps:50, urto:500, p:False, httpf: False)",
              "rntbd": "(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)",
              "other": "(ed:False, be:False)"
          },
          "ConsistencyConfig": "(consistency: NotSet, prgns:[North Central US, East Asia], apprgn: )",
          "ProcessorCount": 8
      }
  },
  "children": [
      {
          "name": "ItemSerialize",
          "duration in milliseconds": 4.4378
      },
      {
          "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
          "duration in milliseconds": 52747.9847,
          "children": [
              {
                  "name": "Waiting for Initialization of client to complete",
                  "duration in milliseconds": 5756.9445
              },
              {
                  "name": "Get Collection Cache",
                  "duration in milliseconds": 0.0043
              },
              {
                  "name": "Read Collection",
                  "duration in milliseconds": 15549.0829,
                  "data": {
                      "Client Side Request Stats": {
                          "Id": "AggregatedClientSideRequestStatistics",
                          "ContactedReplicas": [],
                          "RegionsContacted": [],
                          "FailedReplicas": [],
                          "AddressResolutionStatistics": [],
                          "StoreResponseStatistics": [],
                          "HttpResponseStats": [
                              {
                                  "StartTimeUTC": "2024-01-09T00:08:01.4894987Z",
                                  "DurationInMs": 503.7343,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2024-01-09T00:08:01.9935045Z",
                                  "DurationInMs": 5021.4947,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91",
                                  "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                  "ExceptionMessage": "The operation was canceled."
                              },
                              {
                                  "StartTimeUTC": "2024-01-09T00:08:08.0141416Z",
                                  "DurationInMs": 8980.8771,
                                  "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/comments/colls/comments",
                                  "ResourceType": "Collection",
                                  "HttpMethod": "GET",
                                  "ActivityId": "465bce3f-a83e-4ba2-9f2a-8d21ae3bcc91",
                                  "StatusCode": "OK"
                              }
                          ]
                      }
                  }
              },
              {
                  "name": "Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler",
                  "duration in milliseconds": 31401.0146,
                  "data": {
                      "System Info": {
                          "systemHistory": [
                              {
                                  "dateUtc": "2024-01-09T00:08:00.6690003Z",
                                  "cpu": 3.483,
                                  "memory": 26931268,
                                  "threadInfo": {
                                      "isThreadStarving": "no info",
                                      "availableThreads": 32765,
                                      "minThreads": 8,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2024-01-09T00:08:10.6887491Z",
                                  "cpu": 3.466,
                                  "memory": 26936596,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.2319,
                                      "availableThreads": 32765,
                                      "minThreads": 8,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2024-01-09T00:08:20.7009190Z",
                                  "cpu": 2.91,
                                  "memory": 26936124,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.0865,
                                      "availableThreads": 32765,
                                      "minThreads": 8,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2024-01-09T00:08:30.7059013Z",
                                  "cpu": 5.189,
                                  "memory": 27004888,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.2459,
                                      "availableThreads": 32765,
                                      "minThreads": 8,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              },
                              {
                                  "dateUtc": "2024-01-09T00:08:40.7178070Z",
                                  "cpu": 3.105,
                                  "memory": 26952512,
                                  "threadInfo": {
                                      "isThreadStarving": "False",
                                      "threadWaitIntervalInMs": 0.1425,
                                      "availableThreads": 32765,
                                      "minThreads": 8,
                                      "maxThreads": 32767
                                  },
                                  "numberOfOpenTcpConnection": 0
                              }
                          ]
                      }
                  },
                  "children": [
                      {
                          "name": "Microsoft.Azure.Cosmos.Handlers.TelemetryHandler",
                          "duration in milliseconds": 31400.1408,
                          "children": [
                              {
                                  "name": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
                                  "duration in milliseconds": 31398.8421,
                                  "children": [
                                      {
                                          "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
                                          "duration in milliseconds": 31396.233,
                                          "children": [
                                              {
                                                  "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
                                                  "duration in milliseconds": 31395.5602,
                                                  "children": [
                                                      {
                                                          "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                          "duration in milliseconds": 31392.7365,
                                                          "data": {
                                                              "Client Side Request Stats": {
                                                                  "Id": "AggregatedClientSideRequestStatistics",
                                                                  "ContactedReplicas": [
                                                                      {
                                                                          "Count": 3,
                                                                          "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/"
                                                                      },
                                                                      {
                                                                          "Count": 1,
                                                                          "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14001/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859692s/"
                                                                      },
                                                                      {
                                                                          "Count": 1,
                                                                          "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14003/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533713419672s/"
                                                                      },
                                                                      {
                                                                          "Count": 1,
                                                                          "Uri": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14302/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859691s/"
                                                                      }
                                                                  ],
                                                                  "RegionsContacted": [],
                                                                  "FailedReplicas": [],
                                                                  "AddressResolutionStatistics": [
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:34.0155575Z",
                                                                          "EndTimeUTC": "2024-01-09T00:08:47.6666311Z",
                                                                          "TargetEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0"
                                                                      }
                                                                  ],
                                                                  "StoreResponseStatistics": [
                                                                      {
                                                                          "ResponseTimeUTC": "2024-01-09T00:08:48.2157795Z",
                                                                          "ResourceType": "Document",
                                                                          "OperationType": "Create",
                                                                          "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/",
                                                                          "StoreResult": {
                                                                              "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                              "StatusCode": "Created",
                                                                              "SubStatusCode": "Unknown",
                                                                              "LSN": 11465,
                                                                              "PartitionKeyRangeId": "0",
                                                                              "GlobalCommittedLSN": 11464,
                                                                              "ItemLSN": -1,
                                                                              "UsingLocalLSN": false,
                                                                              "QuorumAckedLSN": 11464,
                                                                              "SessionToken": "-1#11465",
                                                                              "CurrentWriteQuorum": 3,
                                                                              "CurrentReplicaSetSize": 4,
                                                                              "NumberOfReadRegions": 2,
                                                                              "IsValid": true,
                                                                              "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/",
                                                                              "RequestCharge": 7.05,
                                                                              "RetryAfterInMs": null,
                                                                              "BELatencyInMs": "5.163",
                                                                              "ReplicaHealthStatuses": [
                                                                                  "(port: 14300 | status: Connected | lkt: 1/9/2024 12:08:48 AM)"
                                                                              ],
                                                                              "transportRequestTimeline": {
                                                                                  "requestTimeline": [
                                                                                      {
                                                                                          "event": "Created",
                                                                                          "startTimeUtc": "2024-01-09T00:08:47.6949569Z",
                                                                                          "durationInMs": 7.3881
                                                                                      },
                                                                                      {
                                                                                          "event": "ChannelAcquisitionStarted",
                                                                                          "startTimeUtc": "2024-01-09T00:08:47.7023450Z",
                                                                                          "durationInMs": 374.6118
                                                                                      },
                                                                                      {
                                                                                          "event": "Pipelined",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.0769568Z",
                                                                                          "durationInMs": 26.7
                                                                                      },
                                                                                      {
                                                                                          "event": "Transit Time",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.1036568Z",
                                                                                          "durationInMs": 91.3586
                                                                                      },
                                                                                      {
                                                                                          "event": "Received",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.1950154Z",
                                                                                          "durationInMs": 12.1271
                                                                                      },
                                                                                      {
                                                                                          "event": "Completed",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.2071425Z",
                                                                                          "durationInMs": 0
                                                                                      }
                                                                                  ],
                                                                                  "serviceEndpointStats": {
                                                                                      "inflightRequests": 1,
                                                                                      "openConnections": 1
                                                                                  },
                                                                                  "connectionStats": {
                                                                                      "waitforConnectionInit": "True",
                                                                                      "callsPendingReceive": 0,
                                                                                      "lastSendAttempt": "2024-01-09T00:08:47.9935715Z",
                                                                                      "lastSend": "2024-01-09T00:08:47.9945759Z",
                                                                                      "lastReceive": "2024-01-09T00:08:48.0668450Z"
                                                                                  },
                                                                                  "requestSizeInBytes": 1176,
                                                                                  "requestBodySizeInBytes": 727,
                                                                                  "responseMetadataSizeInBytes": 450,
                                                                                  "responseBodySizeInBytes": 935
                                                                              },
                                                                              "TransportException": null
                                                                          }
                                                                      },
                                                                      {
                                                                          "ResponseTimeUTC": "2024-01-09T00:08:48.3154370Z",
                                                                          "ResourceType": "Collection",
                                                                          "OperationType": "Head",
                                                                          "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/",
                                                                          "StoreResult": {
                                                                              "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                              "StatusCode": "NoContent",
                                                                              "SubStatusCode": "Unknown",
                                                                              "LSN": 11465,
                                                                              "PartitionKeyRangeId": "0",
                                                                              "GlobalCommittedLSN": 11464,
                                                                              "ItemLSN": 1,
                                                                              "UsingLocalLSN": false,
                                                                              "QuorumAckedLSN": 11465,
                                                                              "SessionToken": "-1#11465",
                                                                              "CurrentWriteQuorum": 3,
                                                                              "CurrentReplicaSetSize": 4,
                                                                              "NumberOfReadRegions": 2,
                                                                              "IsValid": true,
                                                                              "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/",
                                                                              "RequestCharge": 0,
                                                                              "RetryAfterInMs": null,
                                                                              "BELatencyInMs": "0.403",
                                                                              "ReplicaHealthStatuses": [
                                                                                  "(port: 14300 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14001 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14003 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14302 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)"
                                                                              ],
                                                                              "transportRequestTimeline": {
                                                                                  "requestTimeline": [
                                                                                      {
                                                                                          "event": "Created",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.2418862Z",
                                                                                          "durationInMs": 0.0984
                                                                                      },
                                                                                      {
                                                                                          "event": "ChannelAcquisitionStarted",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.2419846Z",
                                                                                          "durationInMs": 0.3812
                                                                                      },
                                                                                      {
                                                                                          "event": "Pipelined",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.2423658Z",
                                                                                          "durationInMs": 4.8247
                                                                                      },
                                                                                      {
                                                                                          "event": "Transit Time",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.2471905Z",
                                                                                          "durationInMs": 67.4608
                                                                                      },
                                                                                      {
                                                                                          "event": "Received",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3146513Z",
                                                                                          "durationInMs": 0.1506
                                                                                      },
                                                                                      {
                                                                                          "event": "Completed",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3148019Z",
                                                                                          "durationInMs": 0
                                                                                      }
                                                                                  ],
                                                                                  "serviceEndpointStats": {
                                                                                      "inflightRequests": 1,
                                                                                      "openConnections": 1
                                                                                  },
                                                                                  "connectionStats": {
                                                                                      "waitforConnectionInit": "False",
                                                                                      "callsPendingReceive": 0,
                                                                                      "lastSendAttempt": "2024-01-09T00:08:48.1032923Z",
                                                                                      "lastSend": "2024-01-09T00:08:48.1036567Z",
                                                                                      "lastReceive": "2024-01-09T00:08:48.1913953Z"
                                                                                  },
                                                                                  "requestSizeInBytes": 424,
                                                                                  "responseMetadataSizeInBytes": 295
                                                                              },
                                                                              "TransportException": null
                                                                          }
                                                                      },
                                                                      {
                                                                          "ResponseTimeUTC": "2024-01-09T00:08:48.4046844Z",
                                                                          "ResourceType": "Collection",
                                                                          "OperationType": "Head",
                                                                          "LocationEndpoint": "https://dkppaf-northcentralus.documents-test.windows-int.net/",
                                                                          "StoreResult": {
                                                                              "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                              "StatusCode": "NoContent",
                                                                              "SubStatusCode": "Unknown",
                                                                              "LSN": 11465,
                                                                              "PartitionKeyRangeId": "0",
                                                                              "GlobalCommittedLSN": 11465,
                                                                              "ItemLSN": 1,
                                                                              "UsingLocalLSN": false,
                                                                              "QuorumAckedLSN": 11465,
                                                                              "SessionToken": "-1#11465",
                                                                              "CurrentWriteQuorum": 3,
                                                                              "CurrentReplicaSetSize": 4,
                                                                              "NumberOfReadRegions": 2,
                                                                              "IsValid": true,
                                                                              "StorePhysicalAddress": "rntbd://cdb-ms-test14-northcentralus1-fd1.documents-test.windows-int.net:14300/apps/9fa82d84-46e2-49bf-91e8-ea2a4c416297/services/8d579ef7-1ec8-4ea2-b1e9-874893eb9fd6/partitions/1096a52e-dc9d-497d-a8d5-f8aff61b3b57/replicas/133425533821859690p/",
                                                                              "RequestCharge": 0,
                                                                              "RetryAfterInMs": null,
                                                                              "BELatencyInMs": "0.177",
                                                                              "ReplicaHealthStatuses": [
                                                                                  "(port: 14300 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14001 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14003 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)",
                                                                                  "(port: 14302 | status: Unknown | lkt: 1/9/2024 12:08:47 AM)"
                                                                              ],
                                                                              "transportRequestTimeline": {
                                                                                  "requestTimeline": [
                                                                                      {
                                                                                          "event": "Created",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3379270Z",
                                                                                          "durationInMs": 0.0068
                                                                                      },
                                                                                      {
                                                                                          "event": "ChannelAcquisitionStarted",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3379338Z",
                                                                                          "durationInMs": 0.0075
                                                                                      },
                                                                                      {
                                                                                          "event": "Pipelined",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3379413Z",
                                                                                          "durationInMs": 0.2067
                                                                                      },
                                                                                      {
                                                                                          "event": "Transit Time",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.3381480Z",
                                                                                          "durationInMs": 66.3662
                                                                                      },
                                                                                      {
                                                                                          "event": "Received",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.4045142Z",
                                                                                          "durationInMs": 0.1483
                                                                                      },
                                                                                      {
                                                                                          "event": "Completed",
                                                                                          "startTimeUtc": "2024-01-09T00:08:48.4046625Z",
                                                                                          "durationInMs": 0
                                                                                      }
                                                                                  ],
                                                                                  "serviceEndpointStats": {
                                                                                      "inflightRequests": 1,
                                                                                      "openConnections": 1
                                                                                  },
                                                                                  "connectionStats": {
                                                                                      "waitforConnectionInit": "False",
                                                                                      "callsPendingReceive": 0,
                                                                                      "lastSendAttempt": "2024-01-09T00:08:48.2470408Z",
                                                                                      "lastSend": "2024-01-09T00:08:48.2471906Z",
                                                                                      "lastReceive": "2024-01-09T00:08:48.3146214Z"
                                                                                  },
                                                                                  "requestSizeInBytes": 424,
                                                                                  "responseMetadataSizeInBytes": 295
                                                                              },
                                                                              "TransportException": null
                                                                          }
                                                                      }
                                                                  ],
                                                                  "HttpResponseStats": [
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:17.0721705Z",
                                                                          "DurationInMs": 519.3992,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:17.5916268Z",
                                                                          "DurationInMs": 5015.3219,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:23.6230150Z",
                                                                          "DurationInMs": 10013.8951,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "7f8bcf09-0acc-41b9-aa31-2c3c6193041b",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:33.6579613Z",
                                                                          "DurationInMs": 165.3261,
                                                                          "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "e9ad8c72-40b6-4528-bd30-732f45ca474e",
                                                                          "StatusCode": "OK"
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:33.8354862Z",
                                                                          "DurationInMs": 157.063,
                                                                          "RequestUri": "https://dkppaf-eastasia.documents-test.windows-int.net/dbs/UD9beQ==/colls/UD9beckK-kw=/pkranges",
                                                                          "ResourceType": "PartitionKeyRange",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "bda47b29-44e7-4340-b4a2-670868a72096",
                                                                          "StatusCode": "NotModified",
                                                                          "ReasonPhrase": "Not Modified"
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:34.0172521Z",
                                                                          "DurationInMs": 503.2223,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0",
                                                                          "ResourceType": "Document",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:34.5205260Z",
                                                                          "DurationInMs": 5012.3242,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0",
                                                                          "ResourceType": "Document",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                          "ExceptionType": "System.Threading.Tasks.TaskCanceledException",
                                                                          "ExceptionMessage": "The operation was canceled."
                                                                      },
                                                                      {
                                                                          "StartTimeUTC": "2024-01-09T00:08:40.5337065Z",
                                                                          "DurationInMs": 7131.9352,
                                                                          "RequestUri": "https://dkppaf-northcentralus.documents-test.windows-int.net//addresses/?$resolveFor=dbs%2fUD9beQ%3d%3d%2fcolls%2fUD9beckK-kw%3d%2fdocs&$filter=protocol eq rntbd&$partitionKeyRangeIds=0",
                                                                          "ResourceType": "Document",
                                                                          "HttpMethod": "GET",
                                                                          "ActivityId": "c8467c84-d875-4e3b-9bf0-7cb1b2fe18a4",
                                                                          "StatusCode": "OK"
                                                                      }
                                                                  ]
                                                              }
                                                          }
                                                      }
                                                  ]
                                              }
                                          ]
                                      }
                                  ]
                              }
                          ]
                      }
                  ]
              }
          ]
      },
      {
          "name": "Response Serialization",
          "duration in milliseconds": 1.1941
      }
  ]
}

Known Issues:

Currently we have observed that the GET Document calls to fetch the address intermittently fails, when the master partition is in complete quorum loss. This is a known issue, and the Routing Gateway team is currently investigating this.

The text was updated successfully, but these errors were encountered:

kundadebdatta self-assigned this Nov 9, 2023

microsoft-github-policy-service bot added the needs-investigation label Nov 9, 2023

kundadebdatta added PerPartitionAutomaticFailover and removed needs-investigation labels Nov 9, 2023

kundadebdatta added this to Azure Cosmos SDKs Nov 9, 2023

kundadebdatta moved this to In Progress in Azure Cosmos SDKs Nov 9, 2023

kundadebdatta mentioned this issue Nov 9, 2023

Gallium Semester Tasks #3850

Open

27 tasks

kundadebdatta changed the title ~~[Per Partition Automatic Failover] Retry on Next Preferred Region For Gateway Reads If SDK is Timing Out While Connecting to Primary Region~~ [Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts Nov 9, 2023

kundadebdatta mentioned this issue Dec 8, 2023

[Internal] Per Partition Automatic Failover: Fixes Metadata Requests Retry Policy #4205

Merged

1 task

microsoft-github-policy-service bot closed this as completed in #4205 Dec 30, 2023

github-project-automation bot moved this from In Progress to Done in Azure Cosmos SDKs Dec 30, 2023

kundadebdatta mentioned this issue Jan 9, 2024

Germanium Semester Tasks #4246

Open

16 tasks

kundadebdatta mentioned this issue Jan 25, 2024

[Internal] Researching capabilities to support PPAF for .NET, Java SDK and Compute Gateway #3499

Closed

10 tasks

kundadebdatta mentioned this issue Jun 28, 2024

[Per Partition Automatic Failover] Failover Partition on Next region when RG Fails with HttpRequestException #4564

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts #4181

[Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts #4181

kundadebdatta commented Nov 9, 2023 •

edited

Loading

[Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts #4181

[Per Partition Automatic Failover] Retry on Next Preferred Region For Metadata Reads Gateway Timeouts #4181

Comments

kundadebdatta commented Nov 9, 2023 • edited Loading

Background:

Current Behavior:

Expected Behavior/ Acceptance Criteria:

Scope:

High Level Changes:

Design Approach:

Known Issues:

kundadebdatta commented Nov 9, 2023 •

edited

Loading