Cancellation with Cosmos DB Exception properties #371
Comments
This is a rather interesting scenario. Here are the possible solutions that I can think of.
Here is how to capture trace logs for the v2 SDK: https://github.com/Azure/azure-cosmos-dotnet-v2/blob/master/docs/documentdb-sdk_capture_etl.md
@j82w We run on Linux containers, so this doesn't look like an option for us.
The best fix for this issue is going to be a new feature that makes this information accessible. In the meantime, you could try using ConnectionPolicy.RequestTimeout to cancel the operation. The one downside is that it applies to the entire SDK. It's also possible that the SDK is retrying on throttling and other common exceptions. Do you see any errors in the portal metrics?
We have a hard 5-second limit on all requests, so configuring that as the maximum at the SDK level may be fine. It isn't a complete solution though, since we need some requests to cancel even sooner. Does it apply to TCP/Direct connections? We turn off all retries in the SDK and do retries ourselves using our own invoker implementation. In most incidents we don't see any errors in the portal metrics.
This was fixed in #1550
Is your feature request related to a problem? Please describe.
We sometimes experience long-running requests to Cosmos DB. When we work with Cosmos DB support to diagnose the issue, the first piece of information they request is a stack trace for the exception, which provides details such as the Activity ID.
Unfortunately, we never have these stack traces because we limit our requests to 5 seconds. This is a requirement of our service: we need to respond quickly, even with an error. Therefore, when Cosmos DB has these long-running requests, all we get are generic TaskCanceledException stack traces caused by the CancellationToken being cancelled.
The request becomes very difficult to track at this point.
Describe the solution you'd like
I would like a way to limit our requests using the cancellation token and, when the token has been cancelled, still get a relevant Cosmos DB exception that contains the activity ID, request ID, correlation ID, and so on.
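To make the requested behavior concrete, here is a minimal, SDK-agnostic sketch in Python rather than .NET. It is not the Cosmos DB SDK's API: `RequestTimedOut`, `call_with_deadline`, and `slow_read` are hypothetical names, and the UUID stands in for an activity ID that the real SDK would supply. The idea is only that a cancelled request surfaces an exception carrying its diagnostics instead of a bare cancellation error.

```python
import asyncio
import uuid


class RequestTimedOut(Exception):
    """Hypothetical error type: a timeout that still carries request diagnostics."""

    def __init__(self, activity_id: str, elapsed: float):
        super().__init__(f"request {activity_id} cancelled after {elapsed:.2f}s")
        self.activity_id = activity_id
        self.elapsed = elapsed


async def call_with_deadline(operation, deadline_seconds: float):
    """Await operation(activity_id); turn a timeout into RequestTimedOut."""
    activity_id = str(uuid.uuid4())  # assumption: stand-in for the SDK's activity ID
    loop = asyncio.get_running_loop()
    started = loop.time()
    try:
        return await asyncio.wait_for(operation(activity_id), deadline_seconds)
    except asyncio.TimeoutError as exc:
        # Keep the original timeout as the cause, but attach the diagnostics.
        raise RequestTimedOut(activity_id, loop.time() - started) from exc


async def slow_read(activity_id: str) -> str:
    """Hypothetical operation that takes longer than the caller's deadline."""
    await asyncio.sleep(0.2)
    return "document"


if __name__ == "__main__":
    try:
        asyncio.run(call_with_deadline(slow_read, 0.05))
    except RequestTimedOut as err:
        print("timed out, activity id:", err.activity_id)
```

The caller keeps its per-request deadline and still has an identifier to hand to support when the deadline fires.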
Describe alternatives you've considered
A less desirable alternative would be to let the client set a 5-second timeout that still preserves the detailed Cosmos DB exceptions. This is less desirable because a timeout on the client itself does not give control as granular as passing a cancellation token per request; cancellation tokens can, for example, be linked together.
One workaround we considered is to set request IDs ourselves before talking to Cosmos DB and then log them, but there is no way to set this in the version of the SDK that we currently use, and it requires extra code.
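The logging workaround described above could look roughly like the following. Again this is an SDK-agnostic Python sketch under stated assumptions: `tracked_call` and the `cosmos.requests` logger name are hypothetical, and the sketch assumes the operation accepts a caller-generated request ID, which the thread notes the SDK version in use does not allow.

```python
import asyncio
import logging
import uuid

logger = logging.getLogger("cosmos.requests")  # hypothetical logger name


async def tracked_call(operation, timeout_seconds: float):
    """Log a caller-generated request ID before the call and again on timeout."""
    request_id = str(uuid.uuid4())  # assumption: the operation accepts this ID
    logger.info("request %s started", request_id)
    try:
        result = await asyncio.wait_for(operation(request_id), timeout_seconds)
    except asyncio.TimeoutError:
        # The ID logged here matches the "started" entry, so the request
        # can still be traced even though no detailed exception exists.
        logger.warning("request %s timed out after %.2fs", request_id, timeout_seconds)
        raise
    logger.info("request %s completed", request_id)
    return result
```

The extra code the author mentions is visible here: every call site has to go through a wrapper like this, and the ID is only useful if the service side records it as well.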