Add the continuation token (CT) of the pkRanges request to the diagnostics, to help investigate exceptions and scenarios like the following.
Request failed at:
{""Id"":""PointOperationStatistics"",""ActivityId"":""e035d1b0-7646-4a09-97cc-002534d7b4c4"",""ResponseTimeUtc"":""2022-04-28T17:18:49.2581143Z"",""StatusCode"":404,""SubStatusCode"":0,""RequestCharge"":0,""RequestUri"":""dbs/usersettings/colls/usersettings"",""ErrorMessage"":""Microsoft.Azure.Documents.NotFoundException: Entity with the specified id does not exist in the system. More info: https://aka.ms/cosmosdb-tsg-not-found\r\nActivityId: e035d1b0-7646-4a09-97cc-002534d7b4c4, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Windows/10.0.17763 cosmos-netstandard-sdk/3.19.3\r\n at Microsoft.Azure.Cosmos.AddressResolver.<ResolveAddressesAndIdentityAsync>d__12.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Cosmos.AddressResolver.<ResolveAsync>d__9.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Cosmos.Routing.GlobalAddressResolver.<ResolveAsync>d__14.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Documents.AddressSelector.<ResolveAddressesAsync>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at 
Microsoft.Azure.Documents.AddressSelector.<ResolveAllTransportAddressUriAsync>d__3.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Documents.StoreReader.<ReadMultipleReplicasInternalAsync>d__12.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Documents.StoreReader.<ReadMultipleReplicaAsync>d__10.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Documents.ConsistencyReader.<ReadSessionAsync>d__13.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetryAsync>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()\r\n at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\r\n at Microsoft.Azure.Documents.BackoffRetryUtility`1.<ExecuteRetryAsync>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n at
Request flow is:
We will use pkRangeId 125 as an example. When a customer uses direct mode, the SDK needs a few critical pieces of information from the gateway to find which server to send a request to: the pkRanges list, and the addresses for a given partition.
1. At some point, the SDK gets the pkRanges list from the gateway, which includes 125.
2. A split happens for pkRange 125.
3. A request comes in. The SDK uses the existing pkRanges from its cache to resolve which partition the request should be routed to, which results in 125.
4. The SDK tries to get the address list for pkRange 125 from the gateway. The gateway encounters a ServiceFabricNotFoundException because the service has been deleted as part of the split process, so it returns an empty list.
5. Because the list is empty, the SDK tries to refresh its internal state, including fetching the latest pkRanges changes from the gateway. However, the SDK gets a NotModified result back from the gateway.
6. Steps #3 and #4 are repeated, and then a NotFoundException is returned.
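The flow above can be reproduced in a toy model. The sketch below is a minimal Python simulation, not SDK code: `ToyGateway`, `ToyClient`, the child range ids 126 and 127, and the token value `ct-before-split` are all hypothetical. In particular, the toy gateway is hard-wired to answer NotModified whenever a continuation token is sent, because the real reason for that response is exactly what the diagnostics cannot currently explain.

```python
from dataclasses import dataclass, field

OK = 200
NOT_MODIFIED = 304


class NotFoundError(Exception):
    """Stands in for the NotFoundException surfaced to the caller."""


@dataclass
class ToyGateway:
    current_ranges: list   # ranges that exist after the split (hypothetical ids)
    live_ranges: set       # ranges whose backing service still exists
    log: list = field(default_factory=list)

    def read_pk_ranges(self, continuation_token):
        # Assumption: any request that carries a token gets NotModified.
        if continuation_token is not None:
            self.log.append(("pkRanges", continuation_token, NOT_MODIFIED))
            return NOT_MODIFIED, None, continuation_token
        self.log.append(("pkRanges", None, OK))
        return OK, list(self.current_ranges), "ct-after-split"

    def read_addresses(self, pk_range_id):
        # A deleted service is reported as an empty address list.
        found = pk_range_id in self.live_ranges
        addresses = [f"replica-of-{pk_range_id}"] if found else []
        self.log.append(("addresses", pk_range_id, len(addresses)))
        return addresses


@dataclass
class ToyClient:
    gateway: ToyGateway
    cached_ranges: list
    continuation_token: str

    def resolve(self, max_attempts=2):
        for _ in range(max_attempts):
            # Routing is simplified: the request always maps to the first cached range.
            pk_range_id = self.cached_ranges[0]
            addresses = self.gateway.read_addresses(pk_range_id)
            if addresses:
                return addresses
            # Empty list: refresh the pkRanges cache using the stored token.
            status, ranges, token = self.gateway.read_pk_ranges(self.continuation_token)
            if status == OK:
                self.cached_ranges, self.continuation_token = ranges, token
        raise NotFoundError(f"no addresses for pkRange {pk_range_id}")


def simulate():
    gateway = ToyGateway(current_ranges=["126", "127"], live_ranges={"126", "127"})
    client = ToyClient(gateway, cached_ranges=["125"], continuation_token="ct-before-split")
    try:
        client.resolve()
        failed = False
    except NotFoundError:
        failed = True
    return failed, gateway.log
```

Calling `simulate()` produces two identical address and pkRanges round trips for range 125 before the failure. The cache never changes because every refresh comes back NotModified, which is why seeing the continuation token sent on each pkRanges request would help.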
Because some information is missing from the current diagnostics, we cannot reason about what updates were made to the client-side pkRanges cache, why we are getting NotModified from the gateway, or why the SDK tried to get addresses for pkRange 125 again (instead of the new child ranges).
Based on the investigation above, two pieces of information will be helpful for future investigations.
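As a rough illustration of the kind of entry that would answer these questions, the sketch below records one pkRanges request together with the continuation token it sent and the one it received. This is a hypothetical shape in Python, not the SDK's diagnostics schema: `PkRangesRequestRecord`, its field names, and the `PkRangesRequests` key are assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class PkRangesRequestRecord:
    """Hypothetical diagnostics entry for one pkRanges read (not an SDK type)."""
    request_continuation_token: Optional[str]   # token the client sent
    status_code: int                            # e.g. 200 or 304 (NotModified)
    response_continuation_token: Optional[str]  # token the gateway returned
    returned_range_ids: List[str]               # ids in the response, if any


def to_diagnostics_json(records):
    # Serialize the records under a single, made-up diagnostics key.
    return json.dumps({"PkRangesRequests": [asdict(r) for r in records]})
```

With entries like this next to the address resolution attempts, a reader could tell whether the client kept sending the same token and whether the cache was ever updated between retries.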
xinlian12 changed the title from "Add ContinuationToken for pkRanges request in Diagnostics" to "[Diagnostics]Add ContinuationToken for pkRanges request in Diagnostics" on May 2, 2022.