Design: Query ServiceInterop dependency fallback #3225
Comments
The problem is that some customers have solutions that run on Windows x64 and cannot add the ServiceInterop DLL (for whatever reason). In those cases, throwing and pointing to documentation does not help them.
I don't know if it makes sense or not, but can we identify during client initialization whether ServiceInterop is available (maybe by running some metadata query) and then set a flag in the SDK that says not to even try to load ServiceInterop? The client configuration in the request diagnostics would then say that this instance is running without the ServiceInterop library.
That is exactly solution 1. In that case, we detect beforehand that ServiceInterop is not available and don't even attempt to use it; we fall back to Gateway (automatic fallback) but leave a "stamp" on the Diagnostics to explain why.
I believe the explicit model is the better customer solution.
There are scenarios where setting this configuration option might not be viable. In integrations like Logic Apps or Functions, the customer simply has no access to these options. Adding an option where the default is the current behavior could potentially break those customers without a workaround.
This is mainly because we did not expose any information about what was going on, and the V2 SDK had multiple bugs regarding Query Plan (like executing the Query on GW if the Query Plan was obtained from GW). I propose adding information to the Diagnostics that indicates why we went to GW for the Query Plan. Diagnostics can be analyzed by automation, and they can also be read and explained by users.
I'm in favor of automatic fallback with the option to change the behavior for additional flexibility. It gives customers the ability to verify the DLL is loading correctly without bloating or parsing the diagnostics.
But even in the automatic fallback scenario (if the customer does not change the option), how do they discover why they went to GW? In V2 we had to ask customers to enable tracing to find out. Why not leverage the Diagnostics?
The option should not really be needed once the query pass-through work is done. Single-pk queries won't need or use ServiceInterop; they will just send the query to the backend. Only cross-partition queries will need the query plan call. My main concern is that we don't want customers parsing and relying on the parsed JSON, as it might have breaking changes.
I agree, I wouldn't expect them to parse the diagnostics either. But the intent is the same as in any other scenario where we use the Diagnostics: if the customer is experiencing higher than expected latency, they share the Diagnostics, and by looking at the extra datum we can tell (as in the cases where the problem is network latency) whether they are failing to load the DLL and whether that is the cause. This is similar to what we would do when analyzing Traces, but Traces are not a reliable source.
Currently the V3 SDK, when running on Windows x64, will attempt to use the ServiceInterop.DLL (reference https://docs.microsoft.com/en-us/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp#use-local-query-plan-generation).
If the DLL or one of its dependencies is not present, the SDK throws:
This would normally be fixed by making sure the deployment process correctly copies all DLLs included in the SDK NuGet package.
The problem is that some customers have solutions that run on Windows x64 and cannot add the ServiceInterop DLL (for whatever reason). In those cases, they currently have no workaround.
There are two alternatives to address this:
Automatic fallback
If the DLL is not available, fall back to Gateway. To help customers understand the problem, leave something in the Diagnostics, but only when the application is running on Windows x64 (not on Linux or x86).
If the app runs on Windows x64 but the DLL cannot be used, we automatically fall back to Gateway, and a new node in the Diagnostics helps customers understand the added latency.
Example of a Diagnostics node stating that the Query Plan was obtained from Gateway because Service Interop was not available:
PROs:
CONs:
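The automatic fallback flow described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not SDK code: Diagnostics, get_query_plan, the "QueryPlanFromGateway" node name, and the use of OSError to model a failed native library load are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Diagnostics:
    """Hypothetical stand-in for the SDK's request diagnostics."""
    nodes: List[Dict[str, str]] = field(default_factory=list)


def get_query_plan(
    query: str,
    is_windows_x64: bool,
    local_plan: Callable[[str], str],
    gateway_plan: Callable[[str], str],
    diagnostics: Diagnostics,
) -> str:
    """Return a query plan, falling back to Gateway if the local path fails."""
    if not is_windows_x64:
        # Gateway is already the expected path here, so nothing is recorded.
        return gateway_plan(query)
    try:
        return local_plan(query)
    except OSError as error:
        # OSError models a missing DLL or dependency (assumption).
        diagnostics.nodes.append({
            "name": "QueryPlanFromGateway",  # hypothetical node name
            "reason": f"ServiceInterop unavailable: {error}",
        })
        return gateway_plan(query)
```

The diagnostics entry is only written on the Windows x64 failure path, so its presence alone tells a reader that the extra Gateway round trip was caused by the missing DLL.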
Continue to throw but provide options
Provide a CosmosClientOptions configuration that users can leverage to select how they want to interact with ServiceInterop, similar to what we have in V2 with ConnectionPolicy.QueryPlanGenerationMode (reference: https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.documents.client.connectionpolicy.queryplangenerationmode?view=azure-dotnet#microsoft-azure-documents-client-connectionpolicy-queryplangenerationmode).
Default behavior would be to throw; users can opt in to automatically allow Gateway fallback.
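A minimal sketch of the explicit opt-in model, again as a language-neutral Python illustration rather than SDK code. QueryPlanMode, its member names, and get_query_plan_with_mode are hypothetical; they do not correspond to any existing CosmosClientOptions property, and OSError stands in for a failed native library load.

```python
from enum import Enum
from typing import Callable


class QueryPlanMode(Enum):
    """Hypothetical option values; not an actual SDK enum."""
    THROW_IF_UNAVAILABLE = "throw"               # proposed default: current behavior
    ALLOW_GATEWAY_FALLBACK = "gateway_fallback"  # explicit opt-in


def get_query_plan_with_mode(
    query: str,
    local_plan: Callable[[str], str],
    gateway_plan: Callable[[str], str],
    mode: QueryPlanMode = QueryPlanMode.THROW_IF_UNAVAILABLE,
) -> str:
    """Use the local plan; fall back to Gateway only if the user opted in."""
    try:
        return local_plan(query)
    except OSError:
        # OSError models a missing DLL or dependency (assumption).
        if mode is QueryPlanMode.ALLOW_GATEWAY_FALLBACK:
            return gateway_plan(query)
        raise  # default: surface the failure to the caller
```

With the default mode the caller still sees the exception, so existing deployments behave as they do today unless the option is changed.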
PROs:
CONs:
Related to #2366