Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Fixes TaskCanceledException being converted to InternalServerError and losing diagnostics #2359

Closed
wants to merge 8 commits into from

Conversation

bchong95
Copy link
Contributor

@bchong95 bchong95 commented Apr 1, 2021

Tracing: Adds support for CancellationTokens on Feed paths.

Description

The SDK has a soft contract where CancellationTokens throw a CosmosOperationCancelledException which is an OperationCancellationTokenException with CosmosDiagnostics. This PR adds baseline tests, so we can assert the behavior. An example of the tracing on an cancellation path looks likes so:

└── Root Trace(fa9634c1-23f0-4f4f-9385-562254d5882b)  Unknown-Component  GetRootTrace@Trace.cs:97  03:24:17:442  138.15 milliseconds  
    │   (
    │       [Operation Cancelled Exception]
    │       System.OperationCanceledException: The operation was canceled.
    │          at System.Threading.CancellationToken.ThrowOperationCanceledException()
    │          at Microsoft.Azure.Cosmos.Common.CollectionCache.ResolveByNameAsync(String apiVersion, String resourceAddress, Boolean forceRefesh, CancellationToken cancellationToken, ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\CollectionCache.cs:line 257
    │          at Microsoft.Azure.Cosmos.Common.CollectionCache.ResolveCollectionAsync(DocumentServiceRequest request, CancellationToken cancellationToken, ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Routing\CollectionCache.cs:line 145
    │          at Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler.SendAsync(String resourceUriString, ResourceType resourceType, OperationType operationType, RequestOptions requestOptions, ContainerInternal cosmosContainerCore, FeedRange feedRange, Stream streamPayload, Action`1 requestEnricher, ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Handler\RequestInvokerHandler.cs:line 201
    │          at Microsoft.Azure.Cosmos.Pagination.NetworkAttachedDocumentContainer.MonadicReadFeedAsync(FeedRangeState`1 feedRangeState, ReadFeedPaginationOptions readFeedPaginationOptions, ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\NetworkAttachedDocumentContainer.cs:line 171
    │          at Microsoft.Azure.Cosmos.Pagination.PartitionRangePageAsyncEnumerator`2.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\PartitionRangePageAsyncEnumerator.cs:line 56
    │          at Microsoft.Azure.Cosmos.Pagination.CrossPartitionRangePageAsyncEnumerator`2.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Pagination\CrossPartitionRangePageAsyncEnumerator.cs:line 126
    │          at Microsoft.Azure.Cosmos.ReadFeed.Pagination.CrossPartitionReadFeedAsyncEnumerator.MoveNextAsync(ITrace trace) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\ReadFeed\Pagination\CrossPartitionReadFeedAsyncEnumerator.cs:line 43
    │          at Microsoft.Azure.Cosmos.ReadFeed.ReadFeedIteratorCore.ReadNextAsync(ITrace trace, CancellationToken cancellationToken) in C:\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\ReadFeed\ReadFeedIteratorCore.cs:line 216
    │   )
    ├── MoveNextAsync(cb8de160-b5a8-4279-abab-a84ea2786f01)  ReadFeed-Component  MoveNextAsync@CrossPartitionReadFeedAsyncEnumerator.cs:41  03:24:17:447  121.39 milliseconds  
    │   └── MoveNextAsync(ecee72d2-9e81-43c6-93a0-a73e1403ce62)  Pagination-Component  MoveNextAsync@CrossPartitionRangePageAsyncEnumerator.cs:112  03:24:17:451  117.27 milliseconds  
    │       ├── [,FF) move next(58796559-1c67-4dcc-bec7-e2f35773ec7c)  Pagination-Component  MoveNextAsync@PartitionRangePageAsyncEnumerator.cs:49  03:24:17:464  17.16 milliseconds  
    │       │   └── Prefetch(ccee5e65-7572-4132-8cdd-7536b4e0b36f)  Pagination-Component  PrefetchAsync@BufferedPartitionRangePageAsyncEnumerator.cs:50  03:24:17:466  14.39 milliseconds  
    │       │       └── [,FF) move next(7442d072-4aa8-4b34-a56f-0a8bbc4abc53)  Pagination-Component  MoveNextAsync@PartitionRangePageAsyncEnumerator.cs:49  03:24:17:467  13.08 milliseconds  
    │       │           └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(907cf401-8eec-4b6a-b49c-0ff6c488d8c3)  RequestHandler-Component  SendAsync@RequestInvokerHandler.cs:148  03:24:17:468  6.76 milliseconds  
    │       │               ├── Get Partition Key Range Cache(a604247c-d18c-4708-a59e-fc0703741263)  Routing-Component  GetPartitionKeyRangeCacheAsync@DocumentClient.cs:551  03:24:17:469  0.00 milliseconds  
    │       │               ├── Get Collection Cache(c95a73d9-e782-4ce2-90b5-51fa48c80afe)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:469  0.00 milliseconds  
    │       │               └── Try Get Overlapping Ranges(7a5626ab-76da-4080-b303-31741fa81756)  Routing-Component  TryGetOverlappingRangesAsync@PartitionKeyRangeCache.cs:49  03:24:17:470  3.50 milliseconds  
    │       ├── Get Container Properties(e2efd643-219d-4b2b-bfa4-55fd52dc7696)  Transport-Component  GetCachedContainerPropertiesAsync@ClientContextCore.cs:363  03:24:17:484  0.03 milliseconds  
    │       │   └── Get Collection Cache(7b5e4d45-18bf-4214-9916-8fd4a0cb73a2)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:484  0.00 milliseconds  
    │       ├── Get RID(4b20bdb4-2849-4062-a278-1f72d434d637)  Routing-Component  GetCachedRIDAsync@ContainerCore.cs:436  03:24:17:485  0.01 milliseconds  
    │       ├── Get Collection Cache(bfe65c11-f10f-41d4-9978-9b0ea0f298ef)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:485  0.00 milliseconds  
    │       ├── Get Overlapping Feed Ranges(55861794-61f0-444b-8feb-3ebc0c794216)  Routing-Component  GetTargetPartitionKeyRangeByFeedRangeAsync@CosmosQueryClientCore.cs:207  03:24:17:486  6.44 milliseconds  
    │       │   └── Get Partition Key Ranges(45fe734f-f63c-4b00-94a9-04139dd24fd3)  Routing-Component  GetTargetPartitionKeyRangesAsync@CosmosQueryClientCore.cs:240  03:24:17:488  4.19 milliseconds  
    │       │       └── Try Get Overlapping Ranges(ef1d629e-c7e9-484d-9723-a962999e7030)  Routing-Component  TryGetOverlappingRangesAsync@PartitionKeyRangeCache.cs:49  03:24:17:492  0.06 milliseconds  
    │       └── MoveNextAsync(c6b69b92-bb33-43a6-b4bd-2c644d5e64f8)  Pagination-Component  MoveNextAsync@CrossPartitionRangePageAsyncEnumerator.cs:112  03:24:17:497  71.50 milliseconds  
    │           └── [,05C1CFFFFFFFF8) move next(058e3329-e35d-4e1a-95de-c027e30bdac4)  Pagination-Component  MoveNextAsync@PartitionRangePageAsyncEnumerator.cs:49  03:24:17:497  67.11 milliseconds  
    │               └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(7835b22d-d7ab-4820-9680-dc5bb84be01b)  RequestHandler-Component  SendAsync@RequestInvokerHandler.cs:148  03:24:17:497  38.81 milliseconds  
    │                   ├── Get Partition Key Range Cache(a2e0986f-f092-432a-ab42-281fe189f622)  Routing-Component  GetPartitionKeyRangeCacheAsync@DocumentClient.cs:551  03:24:17:497  0.00 milliseconds  
    │                   ├── Get Collection Cache(f528680c-90ae-40d9-8b73-71c13ccb907d)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:497  0.00 milliseconds  
    │                   ├── Try Get Overlapping Ranges(08926340-ab2e-4ef7-a870-571f4cc68dd9)  Routing-Component  TryGetOverlappingRangesAsync@PartitionKeyRangeCache.cs:49  03:24:17:497  0.03 milliseconds  
    │                   └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(5a405249-73d4-4e23-9253-74577f8888f1)  RequestHandler-Component  SendAsync@RequestInvokerHandler.cs:51  03:24:17:501  35.02 milliseconds  
    │                       ├── Get Collection Cache(2f927fe2-3e7a-46b2-bc20-63c1e7fabe3e)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:503  0.00 milliseconds  
    │                       └── Send Async(351e657d-7f2b-41b2-9446-d813b8cd3027)  RequestHandler-Component  SendAsync@RequestHandler.cs:50  03:24:17:504  31.84 milliseconds  
    │                           │   (
    │                           │       [CPU Load History]
    │                           │       (2021-04-01T15:24:14.3548650Z 62.500)
    │                           │   )
    │                           └── Send Async(694070e3-073e-47c8-995f-cdfc34022e77)  RequestHandler-Component  SendAsync@RequestHandler.cs:50  03:24:17:504  31.84 milliseconds  
    │                               └── Send Async(f085971a-f8f8-4617-916b-453426dd1c6d)  RequestHandler-Component  SendAsync@AbstractRetryHandler.cs:23  03:24:17:504  31.84 milliseconds  
    │                                   └── Send Async(4dff0480-bc34-41dd-a63a-0b56161cbf94)  RequestHandler-Component  SendAsync@RequestHandler.cs:50  03:24:17:504  31.83 milliseconds  
    │                                       └── Microsoft.Azure.Documents.ServerStoreModel Transport Request(6cf8550a-74e4-481d-bb3e-f9024baa380f)  Transport-Component  ProcessMessageAsync@TransportHandler.cs:87  03:24:17:504  31.79 milliseconds  
    │                                               (
    │                                                   [User Agent]
    │                                                   cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Microsoft Windows 10.0.19042 |.NET Core 4.6.29812.02|
    │                                                   [Client Side Request Stats]
    │                                                   Start Time: 03:24:17:504
    │                                                   End Time: 03:24:17:534
    │                                                   Contacted Replicas
    │                                                     rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/: 1
    │                                                   Failed to Contact Replicas
    │                                                   Regions Contacted
    │                                                     rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/
    │                                                   Address Resolution Statistics
    │                                                   ┌────────────────┬──────────────┬──────────────────────────────────────────────────┐
    │                                                   │Start Time (utc)│End Time (utc)│Endpoint                                          │
    │                                                   ├────────────────┼──────────────┼──────────────────────────────────────────────────┤
    │                                                   └────────────────┴──────────────┴──────────────────────────────────────────────────┘
    │                                                   Store Response Statistics
    │                                                     Start Time: 03:24:17:508
    │                                                     End Time: 03:24:17:534
    │                                                     Resource Type: Document
    │                                                     Operation Type: ReadFeed
    │                                                     Store Result
    │                                                       Activity Id: 15522c24-f97f-4962-be9c-1823ddba1fa2
    │                                                       Store Physical Address: rntbd://127.0.0.1:10253/apps/DocDbApp/services/DocDbServer0/partitions/a4cb494c-38c8-11e6-8106-8cdcd42c33be/replicas/1p/
    │                                                       Status Code: Ok/Unknown
    │                                                       Is Valid: True
    │                                                       LSN Info
    │                                                         LSN: 51
    │                                                         Item LSN: -1
    │                                                         Global LSN: -1
    │                                                         Quorum Acked LSN: 51
    │                                                         Using LSN: True
    │                                                       Session Token: -1#51
    │                                                       Quorum Info
    │                                                         Current Replica Set Size: 1
    │                                                         Current Write Quorum: 1
    │                                                       Is Client CPU Overloaded: False
    │                                                       Exception
    │                                               )
    └── MoveNextAsync(856686f7-3041-4451-9ec6-ffbd86986811)  ReadFeed-Component  MoveNextAsync@CrossPartitionReadFeedAsyncEnumerator.cs:41  03:24:17:579  1.35 milliseconds  
        └── MoveNextAsync(061d861e-57e5-4068-b8a8-fa3f371cc8d1)  Pagination-Component  MoveNextAsync@CrossPartitionRangePageAsyncEnumerator.cs:112  03:24:17:579  1.32 milliseconds  
            └── [,05C1CFFFFFFFF8) move next(2eee26bf-dde7-42cd-a351-dab1b9230536)  Pagination-Component  MoveNextAsync@PartitionRangePageAsyncEnumerator.cs:49  03:24:17:579  1.15 milliseconds  
                └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler(9872645f-a19d-4763-b993-6b76cf6d1234)  RequestHandler-Component  SendAsync@RequestInvokerHandler.cs:148  03:24:17:579  0.24 milliseconds  
                    ├── Get Partition Key Range Cache(84783cd1-fa67-4ad2-932e-0c53c7649d6c)  Routing-Component  GetPartitionKeyRangeCacheAsync@DocumentClient.cs:551  03:24:17:579  0.00 milliseconds  
                    └── Get Collection Cache(0866c559-849d-4ab0-8acf-8d78b6609f97)  Routing-Component  GetCollectionCacheAsync@DocumentClient.cs:542  03:24:17:579  0.00 milliseconds  

Which shows the OperationCancelledException and the last methods the request made it to before getting cancelled.

closes ##2065 Introduced in 3.14.0 in #1812

sboshra
sboshra previously approved these changes Apr 1, 2021
Copy link
Contributor

@sboshra sboshra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

neildsh
neildsh previously approved these changes Apr 22, 2021
@j82w
Copy link
Contributor

j82w commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@neildsh
Copy link
Contributor

neildsh commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w
Copy link
Contributor

j82w commented Apr 22, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w j82w changed the title Tracing: Adds support for CancellationTokens on Feed paths. Query: Fixes exception losing diagnostics and incorrectly converting operation cancelled exceptions to internal server errors Apr 23, 2021
@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@j82w j82w changed the title Query: Fixes exception losing diagnostics and incorrectly converting operation cancelled exceptions to internal server errors Query: Fixes TaskCanceledException being converted to InternalServerError and losing diagnostics Apr 23, 2021
@j82w
Copy link
Contributor

j82w commented Apr 23, 2021

This PR is moved to #2424

@j82w j82w closed this Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants