Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query performance issues between 3.9.0 and 3.17.0 #2270

Closed
dpiessens opened this issue Mar 1, 2021 · 25 comments
Closed

Query performance issues between 3.9.0 and 3.17.0 #2270

dpiessens opened this issue Mar 1, 2021 · 25 comments

Comments

@dpiessens
Copy link
Contributor

dpiessens commented Mar 1, 2021

Describe the bug
We're upgrading from 3.9.0 to 3.17.0-preview on a performance intensive product. We've noticed a 23 ms per call increase in query times between the 2 SDKs.

To Reproduce
Upgraded from 3.9.0 to 3.17.0-preview. Read the release notes on all releases to follow the upgrade notes. Added the following client performance items:
`
.WithConnectionModeDirect()

options.PortReuseMode = PortReuseMode.PrivatePortPool;
options.IdleTcpConnectionTimeout = TimeSpan.FromHours(1);
options.AllowBulkExecution = false;
`

Also converted query code from:

`
query.ToStreamIterator();

to
container.GetItemQueryStreamIterator(query.ToQueryDefinition(), null, new QueryRequestOptions { MaxConcurrency = -1, MaxBufferedItemCount = -1 }))
`

Expected behavior
Performance difference between the two versions to be nominal.

Actual behavior
Ran load tests and observed query durations take 20 ms longer with newer APIs and changes.

Environment summary
SDK Version: 3.17.0-preview
OS Version Linux containers

Additional context
Below is a graph of performance comparisons:

image

Yellow box is the 3.17.0-preview library upgrade with no code changes,
Green box is 3.9.0 pre-upgrade
Red box is 3.17.0-preview with performance related code changes made.

@j82w
Copy link
Contributor

j82w commented Mar 1, 2021

@sboshra and/or @bchong95 can you please take a look?

@timsander1
Copy link
Contributor

Thanks for creating this issue @dpiessens! Could you include the diagnostics for a slow query for both 3.8 and 3.17-preview?

@j82w
Copy link
Contributor

j82w commented Mar 2, 2021

@dpiessens if you can also provide a way to reproduce the issue it would be very helpful to root cause the issue.

@dpiessens dpiessens changed the title Query performance issues between 3.8.0 and 3.17.0 Query performance issues between 3.9.0 and 3.17.0 Mar 2, 2021
@dpiessens
Copy link
Contributor Author

dpiessens commented Mar 2, 2021

Apologies it was 3.9.0 not 3.8.0. Here's an example of the baseline diagnostics, I did redact URLs and container names.

Here's a trace from 3.9.0

{
    "DiagnosticVersion": "2",
    "Summary": {
        "StartUtc": "2021-03-02T14:06:20.4003215Z",
        "TotalElapsedTimeInMs": 11.7576,
        "UserAgent": "cosmos-netstandard-sdk/3.9.0|3.9.0|30681|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
        "TotalRequestCount": 2,
        "FailedRequestCount": 0
    },
    "Context": [{
        "Id": "CreateQueryPipeline",
        "ElapsedTimeInMs": 2.0086
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
        "HandlerElapsedTimeInMs": 1.8598
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
        "HandlerElapsedTimeInMs": 1.8553
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
        "HandlerElapsedTimeInMs": 0.2206
    }, {
        "Id": "AggregatedClientSideRequestStatistics",
        "ContactedReplicas": [],
        "RegionsContacted": [],
        "FailedReplicas": []
    }, {
        "Id": "Microsoft.Azure.Cosmos.GatewayStoreModel",
        "ElapsedTimeInMs": 1.7965
    }, {
        "Id": "GatewayRequestTime",
        "ElapsedTimeInMs": 1.6982
    }, {
        "Id": "PointOperationStatistics",
        "ActivityId": null,
        "ResponseTimeUtc": "2021-03-02T14:06:20.4022321Z",
        "StatusCode": 200,
        "SubStatusCode": 0,
        "RequestCharge": 0.0,
        "RequestUri": "db/reveal/colls/*******",
        "RequestSessionToken": null,
        "ResponseSessionToken": null
    }, {
        "PKRangeId": "94",
        "StartUtc": "2021-03-02T14:06:20.4023608Z",
        "QueryMetric": "totalExecutionTimeInMs=3.33;queryCompileTimeInMs=0.39;queryLogicalPlanBuildTimeInMs=0.11;queryPhysicalPlanBuildTimeInMs=0.37;queryOptimizationTimeInMs=0.02;VMExecutionTimeInMs=2.09;indexLookupTimeInMs=0.89;documentLoadTimeInMs=0.65;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=31;retrievedDocumentSize=245648;outputDocumentCount=31;outputDocumentSize=15401;writeOutputTimeInMs=0.10;indexUtilizationRatio=1.00",
        "IndexUtilization": "",
        "ClientCorrelationId": "02b10690-70c9-4485-b222-81b5c570298c",
        "Context": [{
            "Id": "AggregatedClientSideRequestStatistics",
            "ContactedReplicas": [],
            "RegionsContacted": ["https://**********.documents.azure.com:443/"],
            "FailedReplicas": []
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
            "HandlerElapsedTimeInMs": 9.1108
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
            "HandlerElapsedTimeInMs": 9.1067
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
            "HandlerElapsedTimeInMs": 0.201
        }, {
            "Id": "Microsoft.Azure.Documents.ServerStoreModel",
            "ElapsedTimeInMs": 9.0283
        }, {
            "Id": "StoreResponseStatistics",
            "StartTimeUtc": "2021-03-02T14:06:20.4024462Z",
            "ResponseTimeUtc": "2021-03-02T14:06:20.4113962Z",
            "ElapsedTimeInMs": 8.95,
            "ResourceType": "Document",
            "OperationType": "Query",
            "LocationEndpoint": "https://**********.documents.azure.com:443/",
            "StoreResult": "StorePhysicalAddress: rntbd:/***********.documents.azure.com:14071/apps/1b020ad4-35b5-405c-817f-54701bf5cd52/services/acd83627-ccde-40f9-8ca4-261837813e6c/partitions/3c3faf19-dcc9-4f08-b65f-41625e86bc59/replicas/132587311508741895s/, LSN: 3711341, GlobalCommittedLsn: 3711340, PartitionKeyRangeId: 94, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 7.41, ItemLSN: -1, SessionToken: -1#3711341, UsingLocalLSN: True, TransportException: null"
        }]
    }]
}

Dumping the trace in 3.17.0-preview gives me this error:

System.ArgumentNullException: Value cannot be null. (Parameter 'chars')
at System.Text.UTF8Encoding.GetByteCount(String chars)
at Microsoft.Azure.Cosmos.Json.JsonWriter.WriteStringValue(String value)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceDatumJsonWriter.Visit(PointOperationStatisticsTraceDatum pointOperationStatisticsTraceDatum)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.WriteTraceDatum(IJsonWriter writer, Object value)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Tracing.TraceWriter.TraceJsonWriter.WriteTrace(IJsonWriter writer, ITrace trace)
at Microsoft.Azure.Cosmos.Diagnostics.CosmosTraceDiagnostics.WriteTraceToJsonWriter(JsonSerializationFormat jsonSerializationFormat)
at Microsoft.Azure.Cosmos.Diagnostics.CosmosTraceDiagnostics.ToJsonString()

@dpiessens
Copy link
Contributor Author

I was still able to get overall client elapsed time and it's between 22 and 27 ms for 3.17.0

@dpiessens
Copy link
Contributor Author

@dpiessens if you can also provide a way to reproduce the issue it would be very helpful to root cause the issue.

I'd like to try to get traces from both first. This code is at the core of a lot of moving parts and would be a multi-day project to create an example that replicates what it does. If the traces don't point to the issue then I'll be happy to create one.

@j82w
Copy link
Contributor

j82w commented Mar 2, 2021

I'm working on a 3.17 release #2247 which should be done today. This new releases will contain the fix for the ArgumentNullException.

@j82w
Copy link
Contributor

j82w commented Mar 2, 2021

@dpiessens 3.17.0-preview1 is released.

@dpiessens
Copy link
Contributor Author

Thanks @j82w, I updated to preview 1 and got a trace. This trace executed the same query, but on a different ID so it hit a different server side partition. Let me know if anything obvious stands out that I should tweak:

{
    "name": "FeedIterator Read Next Async",
    "id": "ced01b61-0762-47b2-9f96-dc715c0e6ee0",
    "component": "Unknown",
    "caller info": {
        "member": "ReadNextWithRootTraceAsync",
        "file": "FeedIteratorInlineCore.cs",
        "line": 48
    },
    "start time": "09:46:22:240",
    "duration in milliseconds": 21.0619,
    "data": {},
    "children": [{
        "name": "Create Query Pipeline",
        "id": "a129d9ed-a372-457f-8b70-4e88b4aa1e68",
        "component": "Query",
        "caller info": {
            "member": "TryCreateCoreContextAsync",
            "file": "CosmosQueryExecutionContextFactory.cs",
            "line": 85
        },
        "start time": "09:46:22:240",
        "duration in milliseconds": 2.5604,
        "data": {},
        "children": [{
            "name": "Get Container Properties",
            "id": "efc67f67-3c40-4b18-8df0-35b591d6545e",
            "component": "Transport",
            "caller info": {
                "member": "GetCachedContainerPropertiesAsync",
                "file": "ClientContextCore.cs",
                "line": 349
            },
            "start time": "09:46:22:240",
            "duration in milliseconds": 0.0119,
            "data": {},
            "children": [{
                "name": "Get Collection Cache",
                "id": "4a5fc555-d8c6-4ea5-8da8-0f0f7d24f3ad",
                "component": "Routing",
                "caller info": {
                    "member": "GetCollectionCacheAsync",
                    "file": "DocumentClient.cs",
                    "line": 542
                },
                "start time": "09:46:22:240",
                "duration in milliseconds": 0.0016,
                "data": {},
                "children": []
            }]
        }, {
            "name": "Gateway QueryPlan",
            "id": "e128ce0f-004f-4377-9530-f0a9771f402f",
            "component": "Query",
            "caller info": {
                "member": "GetQueryPlanThroughGatewayAsync",
                "file": "QueryPlanRetriever.cs",
                "line": 112
            },
            "start time": "09:46:22:240",
            "duration in milliseconds": 0.4017,
            "data": {},
            "children": []
        }, {
            "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
            "id": "4671a334-115e-4af8-9c9c-410717d163cc",
            "component": "RequestHandler",
            "caller info": {
                "member": "SendAsync",
                "file": "RequestInvokerHandler.cs",
                "line": 148
            },
            "start time": "09:46:22:240",
            "duration in milliseconds": 2.3294,
            "data": {},
            "children": [{
                "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                "id": "88de0897-25ec-4b45-b0a3-91905abd84e6",
                "component": "RequestHandler",
                "caller info": {
                    "member": "SendAsync",
                    "file": "RequestInvokerHandler.cs",
                    "line": 51
                },
                "start time": "09:46:22:240",
                "duration in milliseconds": 2.3144,
                "data": {},
                "children": [{
                    "name": "Send Async",
                    "id": "60ae3d73-4d90-4902-92f1-3d99dda58a0f",
                    "component": "RequestHandler",
                    "caller info": {
                        "member": "SendAsync",
                        "file": "RequestHandler.cs",
                        "line": 50
                    },
                    "start time": "09:46:22:240",
                    "duration in milliseconds": 2.2835,
                    "data": {
                        "CPU Load History": {
                            "CPU History": "(2021-03-02T21:45:05.7947927Z 100.000), (2021-03-02T21:45:15.6891312Z 100.000), (2021-03-02T21:45:25.6716779Z 97.755), (2021-03-02T21:45:45.6840788Z 100.000), (2021-03-02T21:45:55.7350774Z 95.455), (2021-03-02T21:46:15.7287876Z 100.000)"
                        }
                    },
                    "children": [{
                        "name": "Send Async",
                        "id": "f67e1e06-c66e-436d-9a28-9ecca557415b",
                        "component": "RequestHandler",
                        "caller info": {
                            "member": "SendAsync",
                            "file": "RequestHandler.cs",
                            "line": 50
                        },
                        "start time": "09:46:22:240",
                        "duration in milliseconds": 2.2773,
                        "data": {},
                        "children": [{
                            "name": "Send Async",
                            "id": "43c0f323-f2fd-42d1-9116-981b905c4b43",
                            "component": "RequestHandler",
                            "caller info": {
                                "member": "SendAsync",
                                "file": "AbstractRetryHandler.cs",
                                "line": 23
                            },
                            "start time": "09:46:22:240",
                            "duration in milliseconds": 2.2742,
                            "data": {},
                            "children": [{
                                "name": "Send Async",
                                "id": "5784f620-b9b3-44f5-a925-874f6eb87d27",
                                "component": "RequestHandler",
                                "caller info": {
                                    "member": "SendAsync",
                                    "file": "RequestHandler.cs",
                                    "line": 50
                                },
                                "start time": "09:46:22:240",
                                "duration in milliseconds": 2.2653,
                                "data": {},
                                "children": [{
                                    "name": "Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request",
                                    "id": "cc8582f6-21b3-426d-8717-204ba853c431",
                                    "component": "Transport",
                                    "caller info": {
                                        "member": "ProcessMessageAsync",
                                        "file": "TransportHandler.cs",
                                        "line": 87
                                    },
                                    "start time": "09:46:22:240",
                                    "duration in milliseconds": 2.2059,
                                    "data": {
                                        "User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
                                        "Client Side Request Stats": {
                                            "Id": "AggregatedClientSideRequestStatistics",
                                            "ContactedReplicas": [],
                                            "RegionsContacted": [],
                                            "FailedReplicas": [],
                                            "AddressResolutionStatistics": [],
                                            "StoreResponseStatistics": []
                                        },
                                        "PointOperationStatisticsTraceDatum": {
                                            "Id": "PointOperationStatistics",
                                            "ActivityId": null,
                                            "ResponseTimeUtc": "2021-03-02T21:46:22.2427593Z",
                                            "StatusCode": 200,
                                            "SubStatusCode": 0,
                                            "RequestCharge": 0,
                                            "RequestUri": "dbs/reveal/colls/*******",
                                            "ErrorMessage": null,
                                            "RequestSessionToken": null,
                                            "ResponseSessionToken": null
                                        }
                                    },
                                    "children": []
                                }]
                            }]
                        }]
                    }]
                }]
            }]
        }, {
            "name": "Get Partition Key Ranges",
            "id": "561ec0c8-122f-48df-a3dc-591f5c3ad5ad",
            "component": "Routing",
            "caller info": {
                "member": "GetTargetPartitionKeyRangesAsync",
                "file": "CosmosQueryClientCore.cs",
                "line": 240
            },
            "start time": "09:46:22:242",
            "duration in milliseconds": 0.032,
            "data": {},
            "children": []
        }]
    }, {
        "name": "MoveNextAsync",
        "id": "03a8f150-9a88-47bc-9a75-f16f1b6c5d64",
        "component": "Pagination",
        "caller info": {
            "member": "MoveNextAsync",
            "file": "CrossPartitionRangePageAsyncEnumerator.cs",
            "line": 114
        },
        "start time": "09:46:22:242",
        "duration in milliseconds": 18.4254,
        "data": {},
        "children": [{
            "name": "Prefetching",
            "id": "b9a9499a-3234-43ac-a053-fe0714e4f64b",
            "component": "Pagination",
            "caller info": {
                "member": "PrefetchInParallelAsync",
                "file": "ParallelPrefetch.cs",
                "line": 31
            },
            "start time": "09:46:22:242",
            "duration in milliseconds": 18.3498,
            "data": {},
            "children": [{
                "name": "Prefetch",
                "id": "1660d23a-ab1f-4c64-88aa-704c97b92904",
                "component": "Pagination",
                "caller info": {
                    "member": "PrefetchAsync",
                    "file": "BufferedPartitionRangePageAsyncEnumerator.cs",
                    "line": 54
                },
                "start time": "09:46:22:242",
                "duration in milliseconds": 18.3004,
                "data": {},
                "children": [{
                    "name": "[05C1E7FFFFFFFE,05C1E93F5D0DD0) move next",
                    "id": "39837de7-a964-47d5-b13e-872b81afb6b4",
                    "component": "Pagination",
                    "caller info": {
                        "member": "MoveNextAsync",
                        "file": "PartitionRangePageAsyncEnumerator.cs",
                        "line": 49
                    },
                    "start time": "09:46:22:242",
                    "duration in milliseconds": 18.2933,
                    "data": {
                        "Query Metrics": "Retrieved Document Count : 9 Retrieved Document Size : 72,156 bytes Output Document Count : 9 Output Document Size : 5,767 bytes Index Utilization : 100.00 % Total Query Execution Time : 11.24 milliseconds Query Preparation Time : 1.03 milliseconds Index Lookup Time : 1.90 milliseconds Document Load Time : 7.65 milliseconds Runtime Execution Times : 0.00 milliseconds Document Write Time : 0.05 milliseconds  Index Utilization Information  Utilized Single Indexes  Potential Single Indexes  Utilized Composite Indexes  Potential Composite Indexes"
                    },
                    "children": [{
                        "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                        "id": "440227a4-d376-4da0-b8b4-aec6900df583",
                        "component": "RequestHandler",
                        "caller info": {
                            "member": "SendAsync",
                            "file": "RequestInvokerHandler.cs",
                            "line": 148
                        },
                        "start time": "09:46:22:243",
                        "duration in milliseconds": 17.9901,
                        "data": {},
                        "children": [{
                            "name": "Get Collection Cache",
                            "id": "077d3811-e1b1-4ae3-aab3-e393d6517fb2",
                            "component": "Routing",
                            "caller info": {
                                "member": "GetCollectionCacheAsync",
                                "file": "DocumentClient.cs",
                                "line": 542
                            },
                            "start time": "09:46:22:243",
                            "duration in milliseconds": 0.0009,
                            "data": {},
                            "children": []
                        }, {
                            "name": "Try Get Overlapping Ranges",
                            "id": "fcc9baac-7078-49fa-9cde-2af2c3e23ffb",
                            "component": "Routing",
                            "caller info": {
                                "member": "TryGetOverlappingRangesAsync",
                                "file": "PartitionKeyRangeCache.cs",
                                "line": 49
                            },
                            "start time": "09:46:22:243",
                            "duration in milliseconds": 0.0098,
                            "data": {},
                            "children": []
                        }, {
                            "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                            "id": "5d2d935a-a3e0-49e0-b52f-fb114cd80322",
                            "component": "RequestHandler",
                            "caller info": {
                                "member": "SendAsync",
                                "file": "RequestInvokerHandler.cs",
                                "line": 51
                            },
                            "start time": "09:46:22:243",
                            "duration in milliseconds": 17.9389,
                            "data": {},
                            "children": [{
                                "name": "Send Async",
                                "id": "b3fedb6d-806e-407f-9d13-169fec787e28",
                                "component": "RequestHandler",
                                "caller info": {
                                    "member": "SendAsync",
                                    "file": "RequestHandler.cs",
                                    "line": 50
                                },
                                "start time": "09:46:22:243",
                                "duration in milliseconds": 17.9299,
                                "data": {
                                    "CPU Load History": {
                                        "CPU History": "(2021-03-02T21:45:05.7947927Z 100.000), (2021-03-02T21:45:15.6891312Z 100.000), (2021-03-02T21:45:25.6716779Z 97.755), (2021-03-02T21:45:45.6840788Z 100.000), (2021-03-02T21:45:55.7350774Z 95.455), (2021-03-02T21:46:15.7287876Z 100.000)"
                                    }
                                },
                                "children": [{
                                    "name": "Send Async",
                                    "id": "6e2b26e0-8576-476f-8f94-8871bd7e2d8d",
                                    "component": "RequestHandler",
                                    "caller info": {
                                        "member": "SendAsync",
                                        "file": "RequestHandler.cs",
                                        "line": 50
                                    },
                                    "start time": "09:46:22:243",
                                    "duration in milliseconds": 17.926,
                                    "data": {},
                                    "children": [{
                                        "name": "Send Async",
                                        "id": "3f7b1211-c41d-466b-8056-068630dcb85a",
                                        "component": "RequestHandler",
                                        "caller info": {
                                            "member": "SendAsync",
                                            "file": "AbstractRetryHandler.cs",
                                            "line": 23
                                        },
                                        "start time": "09:46:22:243",
                                        "duration in milliseconds": 17.9233,
                                        "data": {},
                                        "children": [{
                                            "name": "Send Async",
                                            "id": "9ef57489-b1cd-4a7f-b3cc-13c2534891a0",
                                            "component": "RequestHandler",
                                            "caller info": {
                                                "member": "SendAsync",
                                                "file": "RequestHandler.cs",
                                                "line": 50
                                            },
                                            "start time": "09:46:22:243",
                                            "duration in milliseconds": 17.9173,
                                            "data": {},
                                            "children": [{
                                                "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                "id": "decc7268-c234-46f0-b0da-54f59aa38ff7",
                                                "component": "Transport",
                                                "caller info": {
                                                    "member": "ProcessMessageAsync",
                                                    "file": "TransportHandler.cs",
                                                    "line": 87
                                                },
                                                "start time": "09:46:22:243",
                                                "duration in milliseconds": 17.8771,
                                                "data": {
                                                    "User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
                                                    "Client Side Request Stats": {
                                                        "Id": "AggregatedClientSideRequestStatistics",
                                                        "ContactedReplicas": [],
                                                        "RegionsContacted": ["https://**********.documents.azure.com/"],
                                                        "FailedReplicas": [],
                                                        "AddressResolutionStatistics": [],
                                                        "StoreResponseStatistics": [{
                                                            "ResponseTimeUTC": "2021-03-02T21:46:22.2609143Z",
                                                            "ResourceType": "Document",
                                                            "OperationType": "Query",
                                                            "LocationEndpoint": "https://**********.documents.azure.com/",
                                                            "StoreResult": "StorePhysicalAddress: rntbd://**********.documents.azure.com:14198/apps/1072af28-7265-4a2c-9208-a3ae5332dec2/services/fa4dd569-7f60-4935-9ade-412d6ff2de42/partitions/01123852-8264-4de5-8c62-07fa08e93cb4/replicas/132460929699733530s/, LSN: 3697338, GlobalCommittedLsn: 3697337, PartitionKeyRangeId: 117, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 5.76, ItemLSN: -1, SessionToken: -1#3697338, UsingLocalLSN: True, TransportException: null"
                                                        }]
                                                    }
                                                },
                                                "children": []
                                            }]
                                        }]
                                    }]
                                }]
                            }]
                        }]
                    }, {
                        "name": "Get Cosmos Element Response",
                        "id": "62a0895c-100c-4bc1-ad5e-dd91c8b9051a",
                        "component": "Json",
                        "caller info": {
                            "member": "GetCosmosElementResponse",
                            "file": "CosmosQueryClientCore.cs",
                            "line": 282
                        },
                        "start time": "09:46:22:261",
                        "duration in milliseconds": 0.2346,
                        "data": {},
                        "children": []
                    }]
                }]
            }]
        }, {
            "name": "[05C1E7FFFFFFFE,05C1E93F5D0DD0) move next",
            "id": "c9eb1392-ee93-4211-8b36-93fa4cfa50c3",
            "component": "Pagination",
            "caller info": {
                "member": "MoveNextAsync",
                "file": "PartitionRangePageAsyncEnumerator.cs",
                "line": 49
            },
            "start time": "09:46:22:261",
            "duration in milliseconds": 0.007,
            "data": {},
            "children": [{
                "name": "Prefetch",
                "id": "4c087252-3717-4a2a-adcf-aa17f6aedd59",
                "component": "Pagination",
                "caller info": {
                    "member": "PrefetchAsync",
                    "file": "BufferedPartitionRangePageAsyncEnumerator.cs",
                    "line": 54
                },
                "start time": "09:46:22:261",
                "duration in milliseconds": 0.0008,
                "data": {},
                "children": []
            }]
        }]
    }]
}

@j82w
Copy link
Contributor

j82w commented Mar 3, 2021

@timsander1 can you please take a look? Just taking a quick look it seems like 3.17 is doing 2 page read requests while 3.9 is only doing 1 page read request. After reviewing the diagnostics more closely the issue is that it is comparing 2 very different queries.

@dpiessens are you doing a paging API? Can you share the code on how you are draining the query.

@dpiessens
Copy link
Contributor Author

Sure here's the code we're using:

public virtual async Task<(IEnumerable<T> results, string sessionToken)> ExecuteQueryAsync<T>(IQueryable<T> query, CancellationToken cancellationToken = default(CancellationToken))
{
    using (LogContext.PushProperty("operationName", "Query"))
    using (LogContext.PushProperty("query", query.ToString()))
    {
        var results = new List<T>();
        string sessionToken = null;
        var options = new QueryRequestOptions { MaxConcurrency = -1, MaxBufferedItemCount = -1 };
        using (var iterator = this.container.GetItemQueryStreamIterator(query.ToQueryDefinition(), null, options))
        {
            while (iterator.HasMoreResults)
            {
                Func<Task<ResponseMessage>> action = () => iterator.ReadNextAsync(cancellationToken);
                using var response = await Execute(action, "Query");
                using var jsondoc = await JsonDocument.ParseAsync(response.Content, cancellationToken: cancellationToken);
                var queryResults = JsonConverter.Deserialize<T>(jsondoc.RootElement.GetProperty("Documents").EnumerateArray());

                results.AddRange(queryResults);

                sessionToken = response.Headers.Session;
            }
        }

        return (results, sessionToken);
    }
}

One note on this the await Execute line simply allows us to wrap the call in our metrics and extract your client metadata consistently for all operations. Also, we're using a direct serializer to date since we always have lists of results for queries and to date serialize methods have done objects one at a time which is slower.

We didn't change that part of the code yet so there to limit perf change areas, but we're happy to move that over if you feel the newer query methods have better performance.

@timsander1
Copy link
Contributor

Hi @dpiessens

Would it be possible to send two traces for the same query with the same filter? The query metrics are different enough that it's hard to rule this out as a possible reason for the difference.

Thanks!

@dpiessens
Copy link
Contributor Author

dpiessens commented Mar 4, 2021 via email

@dpiessens
Copy link
Contributor Author

OK so I ran two full runs and wrote something to compare items down to the query. The result is that query calls on average increased 15 ms. This query was at the average:

Baseline
Elapsed Time: 29.3056 ms
Client Elapsed Time: 29.2932 ms
Diagnostics:

{
    "DiagnosticVersion": "2",
    "Summary": {
        "StartUtc": "2021-03-05T03:33:01.2618255Z",
        "TotalElapsedTimeInMs": 29.2932,
        "UserAgent": "cosmos-netstandard-sdk|3.9.0|3.9.0|61579|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
        "TotalRequestCount": 2,
        "FailedRequestCount": 0
    },
    "Context": [{
        "Id": "CreateQueryPipeline",
        "ElapsedTimeInMs": 2.095
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
        "HandlerElapsedTimeInMs": 1.8682
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
        "HandlerElapsedTimeInMs": 1.8624
    }, {
        "Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
        "HandlerElapsedTimeInMs": 0.2576
    }, {
        "Id": "AggregatedClientSideRequestStatistics",
        "ContactedReplicas": [],
        "RegionsContacted": [],
        "FailedReplicas": []
    }, {
        "Id": "Microsoft.Azure.Cosmos.GatewayStoreModel",
        "ElapsedTimeInMs": 1.7582
    }, {
        "Id": "GatewayRequestTime",
        "ElapsedTimeInMs": 1.6436
    }, {
        "Id": "PointOperationStatistics",
        "ActivityId": null,
        "ResponseTimeUtc": "2021-03-05T03:33:01.2637739Z",
        "StatusCode": 200,
        "SubStatusCode": 0,
        "RequestCharge": 0.0,
        "RequestUri": "dbs/*****/colls/****",
        "RequestSessionToken": null,
        "ResponseSessionToken": null
    }, {
        "PKRangeId": "126",
        "StartUtc": "2021-03-05T03:33:01.2640741Z",
        "QueryMetric": "totalExecutionTimeInMs=20.91;queryCompileTimeInMs=0.43;queryLogicalPlanBuildTimeInMs=0.13;queryPhysicalPlanBuildTimeInMs=0.45;queryOptimizationTimeInMs=0.03;VMExecutionTimeInMs=19.56;indexLookupTimeInMs=1.57;documentLoadTimeInMs=17.29;systemFunctionExecuteTimeInMs=0.00;userFunctionExecuteTimeInMs=0.00;retrievedDocumentCount=18;retrievedDocumentSize=151748;outputDocumentCount=18;outputDocumentSize=11511;writeOutputTimeInMs=0.24;indexUtilizationRatio=1.00",
        "IndexUtilization": "",
        "ClientCorrelationId": "2e835ccf-4585-4006-9298-1f453b6b1129",
        "Context": [{
            "Id": "AggregatedClientSideRequestStatistics",
            "ContactedReplicas": [],
            "RegionsContacted": ["https://**********.documents.azure.com:443/"],
            "FailedReplicas": []
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
            "HandlerElapsedTimeInMs": 26.7368
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
            "HandlerElapsedTimeInMs": 26.7322
        }, {
            "Id": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
            "HandlerElapsedTimeInMs": 0.2232
        }, {
            "Id": "Microsoft.Azure.Documents.ServerStoreModel",
            "ElapsedTimeInMs": 26.6797
        }, {
            "Id": "StoreResponseStatistics",
            "StartTimeUtc": "2021-03-05T03:33:01.2642095Z",
            "ResponseTimeUtc": "2021-03-05T03:33:01.2908277Z",
            "ElapsedTimeInMs": 26.6182,
            "ResourceType": "Document",
            "OperationType": "Query",
            "LocationEndpoint": "https://**********.documents.azure.com:443/",
            "StoreResult": "StorePhysicalAddress: rntbd://cdb-ms-prod-eastus2-***.documents.azure.com:14440/apps/796f6a08-ae9e-4a5f-8014-6b8972ffefdc/services/b0c7544f-49fc-49d3-ae79-ef9df71474e3/partitions/5b8fb755-4144-4fda-8245-3e6f180ed766/replicas/132588272726853466s/, LSN: 3777274, GlobalCommittedLsn: 3777273, PartitionKeyRangeId: 126, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 6.56, ItemLSN: -1, SessionToken: -1#3777274, UsingLocalLSN: True, TransportException: null"
        }]
    }]
}

Latest
Elapsed Time: 44.9036 ms
Client Elapsed Time: 44.3216 ms
Diagnostics:

{
    "name": "FeedIterator Read Next Async",
    "id": "4baf1d60-acee-4beb-aa3c-6fd37f277eab",
    "component": "Unknown",
    "caller info": {
        "member": "ReadNextWithRootTraceAsync",
        "file": "FeedIteratorInlineCore.cs",
        "line": 48
    },
    "start time": "10:38:54:058",
    "duration in milliseconds": 44.3216,
    "data": {},
    "children": [{
        "name": "Create Query Pipeline",
        "id": "0279b6ab-cd23-455a-b7d1-837047ed52ed",
        "component": "Query",
        "caller info": {
            "member": "TryCreateCoreContextAsync",
            "file": "CosmosQueryExecutionContextFactory.cs",
            "line": 85
        },
        "start time": "10:38:54:058",
        "duration in milliseconds": 3.1787,
        "data": {},
        "children": [{
            "name": "Get Container Properties",
            "id": "7cc3492a-f7eb-4029-88fa-329be4981076",
            "component": "Transport",
            "caller info": {
                "member": "GetCachedContainerPropertiesAsync",
                "file": "ClientContextCore.cs",
                "line": 349
            },
            "start time": "10:38:54:058",
            "duration in milliseconds": 0.0092,
            "data": {},
            "children": [{
                "name": "Get Collection Cache",
                "id": "c07c8bd1-bcc0-41cd-899d-fdd9e6cfd63d",
                "component": "Routing",
                "caller info": {
                    "member": "GetCollectionCacheAsync",
                    "file": "DocumentClient.cs",
                    "line": 542
                },
                "start time": "10:38:54:058",
                "duration in milliseconds": 0.001,
                "data": {},
                "children": []
            }]
        }, {
            "name": "Gateway QueryPlan",
            "id": "bce69c10-abc3-44ed-be52-6733d9446317",
            "component": "Query",
            "caller info": {
                "member": "GetQueryPlanThroughGatewayAsync",
                "file": "QueryPlanRetriever.cs",
                "line": 112
            },
            "start time": "10:38:54:058",
            "duration in milliseconds": 0.2977,
            "data": {},
            "children": []
        }, {
            "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
            "id": "f414fc24-a0c2-4f9b-8ab2-8a8594951503",
            "component": "RequestHandler",
            "caller info": {
                "member": "SendAsync",
                "file": "RequestInvokerHandler.cs",
                "line": 148
            },
            "start time": "10:38:54:058",
            "duration in milliseconds": 3.004,
            "data": {},
            "children": [{
                "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                "id": "8ec8c65c-f356-4337-a84b-26bd0c252023",
                "component": "RequestHandler",
                "caller info": {
                    "member": "SendAsync",
                    "file": "RequestInvokerHandler.cs",
                    "line": 51
                },
                "start time": "10:38:54:058",
                "duration in milliseconds": 2.9855,
                "data": {},
                "children": [{
                    "name": "Send Async",
                    "id": "1fc064a3-614d-4b71-afd4-350d32648bc4",
                    "component": "RequestHandler",
                    "caller info": {
                        "member": "SendAsync",
                        "file": "RequestHandler.cs",
                        "line": 50
                    },
                    "start time": "10:38:54:058",
                    "duration in milliseconds": 2.9796,
                    "data": {
                        "CPU Load History": {
                            "CPU History": "(2021-03-03T11:12:09.7821310Z 0.000), (2021-03-03T11:12:19.7807023Z 100.000), (2021-03-03T11:12:29.7818508Z 33.333), (2021-03-03T11:12:39.7828943Z 33.333), (2021-03-03T11:12:49.7805599Z 37.220), (2021-03-03T11:12:59.7806116Z 50.319)"
                        }
                    },
                    "children": [{
                        "name": "Send Async",
                        "id": "fc912e4a-fd49-418f-b283-9268b9b5c45c",
                        "component": "RequestHandler",
                        "caller info": {
                            "member": "SendAsync",
                            "file": "RequestHandler.cs",
                            "line": 50
                        },
                        "start time": "10:38:54:058",
                        "duration in milliseconds": 2.9748,
                        "data": {},
                        "children": [{
                            "name": "Send Async",
                            "id": "94cb9950-5127-47f8-afc6-184ed4f8b093",
                            "component": "RequestHandler",
                            "caller info": {
                                "member": "SendAsync",
                                "file": "AbstractRetryHandler.cs",
                                "line": 23
                            },
                            "start time": "10:38:54:058",
                            "duration in milliseconds": 2.9717,
                            "data": {},
                            "children": [{
                                "name": "Send Async",
                                "id": "40709019-a2b2-4bce-981c-393ce763a50b",
                                "component": "RequestHandler",
                                "caller info": {
                                    "member": "SendAsync",
                                    "file": "RequestHandler.cs",
                                    "line": 50
                                },
                                "start time": "10:38:54:058",
                                "duration in milliseconds": 2.9651,
                                "data": {},
                                "children": [{
                                    "name": "Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request",
                                    "id": "09e1b5b5-adc9-4533-97e2-99d597ea49b4",
                                    "component": "Transport",
                                    "caller info": {
                                        "member": "ProcessMessageAsync",
                                        "file": "TransportHandler.cs",
                                        "line": 87
                                    },
                                    "start time": "10:38:54:058",
                                    "duration in milliseconds": 2.9294,
                                    "data": {
                                        "User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
                                        "Client Side Request Stats": {
                                            "Id": "AggregatedClientSideRequestStatistics",
                                            "ContactedReplicas": [],
                                            "RegionsContacted": [],
                                            "FailedReplicas": [],
                                            "AddressResolutionStatistics": [],
                                            "StoreResponseStatistics": []
                                        },
                                        "PointOperationStatisticsTraceDatum": {
                                            "Id": "PointOperationStatistics",
                                            "ActivityId": null,
                                            "ResponseTimeUtc": "2021-03-04T22:38:54.0611779Z",
                                            "StatusCode": 200,
                                            "SubStatusCode": 0,
                                            "RequestCharge": 0,
                                            "RequestUri": "dbs/*****/colls/*****",
                                            "ErrorMessage": null,
                                            "RequestSessionToken": null,
                                            "ResponseSessionToken": null
                                        }
                                    },
                                    "children": []
                                }]
                            }]
                        }]
                    }]
                }]
            }]
        }, {
            "name": "Get Partition Key Ranges",
            "id": "c4b31db9-7ca4-4489-839c-309187ec6c16",
            "component": "Routing",
            "caller info": {
                "member": "GetTargetPartitionKeyRangesAsync",
                "file": "CosmosQueryClientCore.cs",
                "line": 240
            },
            "start time": "10:38:54:061",
            "duration in milliseconds": 0.0212,
            "data": {},
            "children": []
        }]
    }, {
        "name": "MoveNextAsync",
        "id": "540549f1-6217-4d5a-85e9-9f68ced9a622",
        "component": "Pagination",
        "caller info": {
            "member": "MoveNextAsync",
            "file": "CrossPartitionRangePageAsyncEnumerator.cs",
            "line": 114
        },
        "start time": "10:38:54:061",
        "duration in milliseconds": 41.0726,
        "data": {},
        "children": [{
            "name": "Prefetching",
            "id": "8796703e-32e5-4d44-a287-f2c40539c435",
            "component": "Pagination",
            "caller info": {
                "member": "PrefetchInParallelAsync",
                "file": "ParallelPrefetch.cs",
                "line": 31
            },
            "start time": "10:38:54:061",
            "duration in milliseconds": 41.0125,
            "data": {},
            "children": [{
                "name": "Prefetch",
                "id": "b18990d6-8eb7-4736-9057-aa3dee27588a",
                "component": "Pagination",
                "caller info": {
                    "member": "PrefetchAsync",
                    "file": "BufferedPartitionRangePageAsyncEnumerator.cs",
                    "line": 54
                },
                "start time": "10:38:54:061",
                "duration in milliseconds": 40.9627,
                "data": {},
                "children": [{
                    "name": "[05C1E5C373C764,05C1E5FFFFFFFE) move next",
                    "id": "26a19e66-b841-4ca2-a8e4-928ee3df2aeb",
                    "component": "Pagination",
                    "caller info": {
                        "member": "MoveNextAsync",
                        "file": "PartitionRangePageAsyncEnumerator.cs",
                        "line": 49
                    },
                    "start time": "10:38:54:061",
                    "duration in milliseconds": 40.9557,
                    "data": {
                        "Query Metrics": "Retrieved Document Count: 16, Retrieved Document Size: 132, 449 bytes Output Document Count: 16, Output Document Size: 10, 238 bytes, Index Utilization 100.00 % Total Query Execution Time: 21.27 milliseconds, Query Preparation Time: 1.06 milliseconds Index Lookup Time: 1.58 milliseconds  Document Load Time: 17.49 milliseconds  Runtime Execution Times: 0.00 milliseconds   Document Write Time: 0.25 milliseconds Index Utilization Information   Utilized Single Indexes   Potential Single Indexes   Utilized Composite Indexes   Potential Composite Indexes"
                    },
                    "children": [{
                        "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                        "id": "65fab615-615d-443f-9d54-a2e0ad983e92",
                        "component": "RequestHandler",
                        "caller info": {
                            "member": "SendAsync",
                            "file": "RequestInvokerHandler.cs",
                            "line": 148
                        },
                        "start time": "10:38:54:061",
                        "duration in milliseconds": 40.4332,
                        "data": {},
                        "children": [{
                            "name": "Get Collection Cache",
                            "id": "cce8562a-e318-489e-bfb8-ddbbb6b7585c",
                            "component": "Routing",
                            "caller info": {
                                "member": "GetCollectionCacheAsync",
                                "file": "DocumentClient.cs",
                                "line": 542
                            },
                            "start time": "10:38:54:061",
                            "duration in milliseconds": 0.0007,
                            "data": {},
                            "children": []
                        }, {
                            "name": "Try Get Overlapping Ranges",
                            "id": "affdc18e-4717-4564-af95-b56d9ba3b5d6",
                            "component": "Routing",
                            "caller info": {
                                "member": "TryGetOverlappingRangesAsync",
                                "file": "PartitionKeyRangeCache.cs",
                                "line": 49
                            },
                            "start time": "10:38:54:061",
                            "duration in milliseconds": 0.008,
                            "data": {},
                            "children": []
                        }, {
                            "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
                            "id": "1f460122-e5db-4354-9473-f96848cf434f",
                            "component": "RequestHandler",
                            "caller info": {
                                "member": "SendAsync",
                                "file": "RequestInvokerHandler.cs",
                                "line": 51
                            },
                            "start time": "10:38:54:061",
                            "duration in milliseconds": 40.3982,
                            "data": {},
                            "children": [{
                                "name": "Send Async",
                                "id": "d5487e60-b44e-4538-bdd0-38f2d0c75151",
                                "component": "RequestHandler",
                                "caller info": {
                                    "member": "SendAsync",
                                    "file": "RequestHandler.cs",
                                    "line": 50
                                },
                                "start time": "10:38:54:061",
                                "duration in milliseconds": 40.3917,
                                "data": {
                                    "CPU Load History": {
                                        "CPU History": "(2021-03-03T11:12:09.7821310Z 0.000), (2021-03-03T11:12:19.7807023Z 100.000), (2021-03-03T11:12:29.7818508Z 33.333), (2021-03-03T11:12:39.7828943Z 33.333), (2021-03-03T11:12:49.7805599Z 37.220), (2021-03-03T11:12:59.7806116Z 50.319)"
                                    }
                                },
                                "children": [{
                                    "name": "Send Async",
                                    "id": "4bee8637-2b50-4464-b360-aeaf15c49ae7",
                                    "component": "RequestHandler",
                                    "caller info": {
                                        "member": "SendAsync",
                                        "file": "RequestHandler.cs",
                                        "line": 50
                                    },
                                    "start time": "10:38:54:061",
                                    "duration in milliseconds": 40.388,
                                    "data": {},
                                    "children": [{
                                        "name": "Send Async",
                                        "id": "7287901b-9dbc-40bc-8602-fd354caf1bfe",
                                        "component": "RequestHandler",
                                        "caller info": {
                                            "member": "SendAsync",
                                            "file": "AbstractRetryHandler.cs",
                                            "line": 23
                                        },
                                        "start time": "10:38:54:061",
                                        "duration in milliseconds": 40.3856,
                                        "data": {},
                                        "children": [{
                                            "name": "Send Async",
                                            "id": "533445a3-6377-4fc1-b0b2-de92f9e5ac61",
                                            "component": "RequestHandler",
                                            "caller info": {
                                                "member": "SendAsync",
                                                "file": "RequestHandler.cs",
                                                "line": 50
                                            },
                                            "start time": "10:38:54:061",
                                            "duration in milliseconds": 40.3797,
                                            "data": {},
                                            "children": [{
                                                "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                "id": "67a59ec8-2a43-468e-b3e3-3d7a8f338aa7",
                                                "component": "Transport",
                                                "caller info": {
                                                    "member": "ProcessMessageAsync",
                                                    "file": "TransportHandler.cs",
                                                    "line": 87
                                                },
                                                "start time": "10:38:54:061",
                                                "duration in milliseconds": 40.3399,
                                                "data": {
                                                    "User Agent": "cosmos-netstandard-sdk/3.17.0|3.17.1|01|X64|Linux 5.4.0-1031-azure 32 18.|.NET 5.0.3|",
                                                    "Client Side Request Stats": {
                                                        "Id": "AggregatedClientSideRequestStatistics",
                                                        "ContactedReplicas": [],
                                                        "RegionsContacted": ["https://*********.documents.azure.com/"],
                                                        "FailedReplicas": [],
                                                        "AddressResolutionStatistics": [],
                                                        "StoreResponseStatistics": [{
                                                            "ResponseTimeUTC": "2021-03-04T22:38:54.1017619Z",
                                                            "ResourceType": "Document",
                                                            "OperationType": "Query",
                                                            "LocationEndpoint": "https://**********.documents.azure.com/",
                                                            "StoreResult": "StorePhysicalAddress: rntbd://cdb-ms-prod-eastus2-***.documents.azure.com:****/apps/796f6a08-ae9e-4a5f-8014-6b8972ffefdc/services/b0c7544f-49fc-49d3-ae79-ef9df71474e3/partitions/5b8fb755-4144-4fda-8245-3e6f180ed766/replicas/132587240566165890s/, LSN: 3763194, GlobalCommittedLsn: 3763193, PartitionKeyRangeId: 126, IsValid: True, StatusCode: 200, SubStatusCode: 0, RequestCharge: 6.33, ItemLSN: -1, SessionToken: -1 #3763194, UsingLocalLSN: True, TransportException: null"
                                                        }]
                                                    }
                                                },
                                                "children": []
                                            }]
                                        }]
                                    }]
                                }]
                            }]
                        }]
                    }, {
                        "name": "Get Cosmos Element Response",
                        "id": "7fa43540-943a-4c62-9a4b-bdfb1ca5c2ba",
                        "component": "Json",
                        "caller info": {
                            "member": "GetCosmosElementResponse",
                            "file": "CosmosQueryClientCore.cs",
                            "line": 282
                        },
                        "start time": "10:38:54:101",
                        "duration in milliseconds": 0.4712,
                        "data": {},
                        "children": []
                    }]
                }]
            }]
        }, {
            "name": "[05C1E5C373C764,05C1E5FFFFFFFE) move next",
            "id": "f08a1c5a-35a8-4bb5-88e4-ef5cbd6b3046",
            "component": "Pagination",
            "caller info": {
                "member": "MoveNextAsync",
                "file": "PartitionRangePageAsyncEnumerator.cs",
                "line": 49
            },
            "start time": "10:38:54:102",
            "duration in milliseconds": 0.0068,
            "data": {},
            "children": [{
                "name": "Prefetch",
                "id": "49b8698e-dd60-49a7-bcb4-7daf2d284a8a",
                "component": "Pagination",
                "caller info": {
                    "member": "PrefetchAsync",
                    "file": "BufferedPartitionRangePageAsyncEnumerator.cs",
                    "line": 54
                },
                "start time": "10:38:54:102",
                "duration in milliseconds": 0.0006,
                "data": {},
                "children": []
            }]
        }]
    }]
}

It does appear the new version is doing a double read.

@j82w
Copy link
Contributor

j82w commented Mar 8, 2021

@timsander1 here is a formatted version:

root|root
└── FeedIterator Read Next Async|44.3216
    ├── Create Query Pipeline|3.1787
    │   ├── Get Container Properties|0.0092
    │   │   └── Get Collection Cache|0.001
    │   ├── Gateway QueryPlan|0.2977
    │   ├── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler|3.004
    │   │   └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler|2.9855
    │   │       └── Send Async|2.9796
    │   │           └── Send Async|2.9748
    │   │               └── Send Async|2.9717
    │   │                   └── Send Async|2.9651
    │   │                       └── Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request|2.9294
    │   └── Get Partition Key Ranges|0.0212
    └── MoveNextAsync|41.0726
        ├── Prefetching|41.0125
        │   └── Prefetch|40.9627
        │       └── [05C1E5C373C764,05C1E5FFFFFFFE) move next|40.9557 | Retrieved Document Count: 16, Retrieved Document Size: 132, 449 bytes Output Document Count: 16, Output Document Size: 10, 238 bytes, Index Utilization 100.00 % Total Query Execution Time: 21.27 milliseconds, Query Preparation Time: 1.06 milliseconds Index Lookup Time: 1.58 milliseconds  Document Load Time: 17.49 milliseconds  Runtime Execution Times: 0.00 milliseconds   Document Write Time: 0.25 milliseconds Index Utilization Information   Utilized Single Indexes   Potential Single Indexes   Utilized Composite Indexes   Potential Composite Indexes
        │           ├── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler|40.4332
        │           │   ├── Get Collection Cache|0.0007
        │           │   ├── Try Get Overlapping Ranges|0.008
        │           │   └── Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler|40.3982
        │           │       └── Send Async|40.3917
        │           │           └── Send Async|40.388
        │           │               └── Send Async|40.3856
        │           │                   └── Send Async|40.3797
        │           │                       └── Microsoft.Azure.Documents.ServerStoreModel Transport Request|40.3399
        │           └── Get Cosmos Element Response|0.4712
        └── [05C1E5C373C764,05C1E5FFFFFFFE) move next|0.0068
            └── Prefetch|0.0006

@bchong95
Copy link
Contributor

bchong95 commented Mar 8, 2021

Looks like 90 percent of the time is spent over the network:

Microsoft.Azure.Documents.ServerStoreModel Transport Request|40.3399

So moving to the transport team to investigate.

@bchong95 bchong95 assigned j82w and unassigned sboshra Mar 8, 2021
@bchong95 bchong95 added Transport and removed QUERY labels Mar 8, 2021
@j82w j82w added QUERY and removed Transport labels Mar 8, 2021
@j82w
Copy link
Contributor

j82w commented Mar 8, 2021

@bchong95 please verify that the queries are the same, that same amount of data is being sent over the wire, and the same amount of time was spent on the backend.

@dpiessens
Copy link
Contributor Author

For background we are running through AKS with a virtual endpoint on that VNET to cosmos. The environment, files etc. between the calls were as identical as networking can be, The only difference between the two calls is the client library used.

@timsander1
Copy link
Contributor

The queries are different but the output doc size is about the same:

3.9: Query had OutputDocumentCount=18;outputDocumentSize=11511
3.17: Query had Output Document Count =16, Output Document Size: 10, 238 bytes

@j82w
Copy link
Contributor

j82w commented Mar 8, 2021

@dpiessens just to confirm the following questions

  1. Are both using the exact same AKS container? We have seen some latency issues based on the AKS container that was being used.
  2. Are both running in the same region as the cosmos container?
  3. Is there any way to provide multiple diagnostics?

@j82w j82w removed their assignment Mar 8, 2021
@dpiessens
Copy link
Contributor Author

@timsander1

The queries are different but the output doc size is about the same:

3.9: Query had OutputDocumentCount=18;outputDocumentSize=11511
3.17: Query had Output Document Count =16, Output Document Size: 10, 238 bytes

The query is actually the same, the results have to do with the process it's embedded in and the comparison. 3.17 ran first so it was loading the data. When it ran the query the record it was working on wasn't in the DB yet so not in the results. The target being loaded also had another record after this one so that was loaded. 3.9 was then run so it was re-loading the results. This affects the upsert and replace operations but those had no performance changes. Queries for both loads are identical aside from the 2 additional records, but I don't this this can account for the difference.

Our process tracks the load, origin record, method name and the query. I made all of these items the key when I lined up comparisons so I'm completely confident the results are from the same query with the same parameters.

@j82w

  1. Yes we use a CI process to construct each container image, and both tests were deployed to the same AKS cluster. That cluster had no other work on it during the tests and the container has an anti-affinity so they are distributed on the same node set.
  2. Yes both the cluster an DB are in East US 2
  3. Yes I can provide additional traces. I'll send them to @timsander1 as they will have information I'm not comfortable publicly posting.

@j82w
Copy link
Contributor

j82w commented Mar 11, 2021

Just to update the issue based on the email thread. I’ve looked through a several of the diagnostics, and noticed that the call to get the query plans from gateway was taking longer on average on 3.17. Can you try making the following changes and let us know if you see any improvement?

  1. Set the CosmosClientOptions.GatewayModeMaxConnectionLimit to int.MaxValue
    public int GatewayModeMaxConnectionLimit

    a. There was a fix in 3.12 to make it where the connection limit is actually honored on .NET core. It’s possible that the connection limit is causing the latency of the gateway requests to be higher: Connection limits: Fixes .NET core to honor gateway connection limit by j82w · Pull Request Connection limits: Fixes .NET core to honor gateway connection limit #1740 · Azure/azure-cosmos-dotnet-v3 (github.com)
  2. Set the PartitionKey in the QueryRequestOptions when possible.
    a. In 3.13 an optimization was added to skip the get query plan network call if it’s a single partition query with no aggregate operations like group by. https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/changelog.md#-3130---2020-09-21

@dpiessens
Copy link
Contributor Author

So I updated our code with the two performance items you listed, and the lag is reduced. The average delta went from 15ms to 10ms so that is an improvement. I'm sending over the new traces so you can see the difference. This did leave to one question:

Given that many of the queries access a single partition and specify that in the query, is there a way for you to add that into the Cosmos methods? Right now we're creating an IQueryable via GetItemLinqQueryable<> and then calling the extension method ToStreamIterator().

@j82w
Copy link
Contributor

j82w commented Mar 19, 2021

The latest diagnostics provided shows a 2 to 3x increase in 3.17 SDK networking time vs 3.9 SDK time. Testing the SDK versions in an Azure VM showed they were roughly the same networking latency. Can you try running both SDK versions in the same container against the same Cosmos DB endpoint? At this point it seems likely the difference is caused by something in the environment.

@dpiessens
Copy link
Contributor Author

I attempted to put both version in the same container, but while possible it proved to be more of a challenge than the outcome is worth. At this point we will accept the timing for what it is and scale out to return to our previous throughput.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants