
ChangeFeedProcessor API and Resource Tokens #3561

Closed
oiqwrwer1 opened this issue Nov 10, 2022 · 39 comments · Fixed by #3566
Labels
ChangeFeed improvement Change to existing functional behavior (perf, logging, etc.)

Comments

@oiqwrwer1

oiqwrwer1 commented Nov 10, 2022

When using ChangeFeedProcessor.StartAsync(), we have run into a 403: "Insufficient permissions provided in the authorization header for the corresponding request. Please retry with another authorization header. ActivityId: 250e7411-d830-4fcf-b82a-0dd654a1c62b, "
The issue is that the StartAsync() call makes a GET request scoped to a Cosmos DB database, not a collection.
D:\dbs\el\csdb;\Product\Cosmos\RoutingGateway\Runtime\RequestHandlers\AddressRequestHandler.cs:line 57" Argument5="changefeedtokentest" Argument6="AddressFeedQuery" Argument7="GET" TraceSource="DocDBTrace"

My question: is the change feed processor SDK not capable of working with resource tokens? We have thus far been able to use resource tokens to instantiate 1 CosmosClient per token and do operations using 1 CosmosClient per collection. Given that we pass collections via CosmosClient.GetContainer() calls as part of ChangeFeedProcessor's Build(), I would expect operations on the monitored, lease, and target collections to work.
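For concreteness, a rough sketch of our setup (names here are illustrative, not our actual ones; the resource token is passed where the account key would normally go):

```csharp
// One CosmosClient per resource token, one client per collection (illustrative names).
CosmosClient monitoredClient = new CosmosClient(accountEndpoint, monitoredContainerToken);
CosmosClient leaseClient = new CosmosClient(accountEndpoint, leaseContainerToken);

Container monitoredContainer = monitoredClient.GetContainer("db", "monitored");
Container leaseContainer = leaseClient.GetContainer("db", "leases");
```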

@ealsur
Member

ealsur commented Nov 10, 2022

This has nothing to do with Change Feed Processor.

AddressFeedQuery is a core process of the SDK as a whole: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/sdk-connection-modes#direct-mode

It is the SDK requesting the replica TCP addresses before it can start doing operations, and it happens in the SDK regardless of whether you use the Change Feed Processor or call ReadItem.

@oiqwrwer1
Author

We have been making CRUD operations on collections using resource tokens for months at this point and have never run into a 403. It only occurs when we invoke ChangeFeedProcessor.StartAsync. Does that mean AddressFeedQuery is not invoked on CosmosClient.GetContainer().* where * is the various Container APIs to interact with the collection?

@ealsur
Member

ealsur commented Nov 10, 2022

AddressFeedQuery is always used. Remember, though, that with the Change Feed Processor you are working with 2 Containers: the Monitored container and the Lease container. Both require TCP addresses to interact with.

Do you also do CRUD to the lease Container?

@oiqwrwer1
Author

We don't make explicit calls to the lease container, but my understanding is that as part of the ChangeFeedProcessor.StartAsync(), it maintains (create, patch, delete, etc.) lease documents on the lease container.

Is this a downstream effect of setting up our account as multi-region write? If it's single-region, would the AddressFeedQuery not happen since there are no write replicas?

@ealsur
Member

ealsur commented Nov 10, 2022

Then could your problem be that your Resource Token has no permissions to perform operations on the Lease Container?

Like I mentioned, the SDK needs to obtain TCP addresses to interact with a Container. As you correctly mention, the Change Feed Processor needs to maintain its state and write to the lease container, so it needs the TCP addresses for the lease container. It also reads the Change Feed from the Monitored container, so it fetches the TCP addresses for that container too.

Nothing to do with multi-region. Regional configuration (single or multi) has no effect on this flow.

@oiqwrwer1
Author

We do provision a resource token for the lease container. I can try an explicit CRUD method and see if that triggers a 403.

@ealsur
Member

ealsur commented Nov 10, 2022

The failure you are getting must be coming from an operation against a particular URL; that URL contains the Container name, which might hint at which Resource Token is lacking permissions.

If you believe your Resource Tokens are correctly configured and have enough permissions to interact with both the Monitored and Lease containers, my recommendation again is to file a support request. The SDK does not validate the Resource Token; it is just passed along to the service, and the service decides whether the operation (in this case, obtaining Addresses) is allowed.

The operation the Change Feed Processor performs on the Monitored Container is a Change Feed read; on the Lease Container, it performs CRUD operations. Any SDK (regardless of Change Feed Processor) always fetches the Container, Partition, and Address information.

You can do a test with the Lease Container and try to perform a CreateItem with your Resource Token and see if that works.
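A minimal sketch of that check, assuming a client built from the lease container's resource token (hypothetical names; the lease container's partition key is assumed to be /id):

```csharp
// Point-create against the lease container with the resource-token client.
// A 403 here would implicate the token's permissions; success points elsewhere.
Container leases = leaseClient.GetContainer("db", "leases");
await leases.CreateItemAsync(new { id = Guid.NewGuid().ToString() });
```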

@oiqwrwer1
Author

Thank you for that additional information. I'll try the test on the lease container and go from there.

@oiqwrwer1
Author

Update: Outside of the changefeedprocessor API, I tested upserting, reading, and deleting items in the monitored and lease collections using resource tokens.
All of those operations worked.

Given what you mentioned, "AddressFeedQuery is always used. Remember that for the Change Feed Processor you are working with 2 Containers though, the Monitored and the Lease container and both require TCP addresses to interact with.", I don't understand why this ubiquitous request for TCP addresses for each container works fine when I just use CosmosClient.GetContainer().UpsertItemAsync/ReadItemAsync/DeleteItemAsync, but fails when I call ChangeFeedProcessor.StartAsync().

@ealsur
Member

ealsur commented Nov 10, 2022

Can you share the full callstack or details of the exception?

D:\dbs\el\csdb;\Product\Cosmos\RoutingGateway\Runtime\RequestHandlers\AddressRequestHandler.cs is not a path on the SDK.

Is your client on Gateway or Direct mode?

@ealsur
Member

ealsur commented Nov 10, 2022

If it's on Gateway, can you switch to Direct mode and retry CRUD on the 2 containers with the Resource Tokens?

@oiqwrwer1
Author

I set the CosmosClientOptions ConnectionMode explicitly to Direct and to Gateway, and they both worked.
I attached 3 callstacks.
One is for the multi-region-write setup with the CosmosClient ConnectionMode not explicitly set (i.e., the default).
The others are for single-region write with the ConnectionMode set to Gateway and Direct, respectively.

changefeedprocessortest_multi-region-write-default-cosmosclient-connectionmode.docx
changefeedprocessortest-single-region-write-connectionmode-gateway.docx
changefeedprocessortest-single-region-write-connectionmode-gateway.docx

@ealsur
Member

ealsur commented Nov 11, 2022

@oiqwrwer1

Regarding file 1 (multi region), the failure is coming from Gateway. The diagnostics show:

  • It is a ReadAsync operation attempting to read the information from the Database
  • The failure is on the address https://changefeedtokentest.documents.azure.com//addresses/?$resolveFor=dbs&$filter=protocol eq rntbd. This is attempting to fetch the TCP addresses for the Master Partition, not the Data Replicas.
  • One way to repro this outside of CFP is to call client.GetDatabase("the name of the db").ReadAsync
{
    "Summary": {
        "GatewayCalls": {
            "(403, 0)": 1
        }
    },
    "name": "ReadAsync",
    "id": "3f34b5e8-28d2-4358-9938-40ae3a946a28",
    "start time": "07:34:11:337",
    "duration in milliseconds": 270.8726,
    "data": {
        "Client Configuration": {
            "Client Created Time Utc": "2022-11-01T19:34:10.0921096Z",
            "NumberOfClientsCreated": 7,
            "NumberOfActiveClients": 0,
            "User Agent": "cosmos-netstandard-sdk/3.26.1|3.24.1|1|X64|Linux 5.10.16.3-microsoft-stan|.NET 6.0.8|N|F 00000010|",
            "ConnectionConfig": {
                "gw": "(cps:50, urto:10, p:False, httpf: True)",
                "rntbd": "(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)",
                "other": "(ed:False, be:False)"
            },
            "ConsistencyConfig": "(consistency: NotSet, prgns:[])"
        }
    },
    "children": [
        {
            "name": "Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler",
            "id": "83332228-e8ca-443c-baab-9229ab5b19e3",
            "start time": "07:34:11:348",
            "duration in milliseconds": 253.5496,
            "children": [
                {
                    "name": "Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler",
                    "id": "9999bd5b-f50e-4f38-886d-8a04e6dfd814",
                    "start time": "07:34:11:359",
                    "duration in milliseconds": 239.9409,
                    "data": {
                        "System Info": {
                            "systemHistory": [
                                {
                                    "dateUtc": "2022-11-01T19:34:10.7547597Z",
                                    "cpu": 0.000,
                                    "memory": 1826720.000,
                                    "threadInfo": {
                                        "isThreadStarving": "no info",
                                        "availableThreads": 32759,
                                        "minThreads": 12,
                                        "maxThreads": 32767
                                    }
                                }
                            ]
                        }
                    },
                    "children": [
                        {
                            "name": "Microsoft.Azure.Cosmos.Handlers.RetryHandler",
                            "id": "575b2e6b-09a1-45a8-a614-b1a3cbb6df8d",
                            "start time": "07:34:11:360",
                            "duration in milliseconds": 237.1163,
                            "children": [
                                {
                                    "name": "Microsoft.Azure.Cosmos.Handlers.RouterHandler",
                                    "id": "a3ef070e-dddf-4f9b-a9df-ee980c0e4e3e",
                                    "start time": "07:34:11:367",
                                    "duration in milliseconds": 214.1938,
                                    "children": [
                                        {
                                            "name": "Microsoft.Azure.Cosmos.Handlers.TransportHandler",
                                            "id": "23378a4e-51ae-4a67-8699-a008df16550d",
                                            "start time": "07:34:11:369",
                                            "duration in milliseconds": 211.7079,
                                            "children": [
                                                {
                                                    "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
                                                    "id": "cb55bb85-0cb2-4c0c-ab79-6d7d76f89f9c",
                                                    "start time": "07:34:11:375",
                                                    "duration in milliseconds": 185.5534,
                                                    "data": {
                                                        "Client Side Request Stats": {
                                                            "Id": "AggregatedClientSideRequestStatistics",
                                                            "ContactedReplicas": [],
                                                            "RegionsContacted": [],
                                                            "FailedReplicas": [],
                                                            "AddressResolutionStatistics": [
                                                                {
                                                                    "StartTimeUTC": "2022-11-01T19:34:11.4579796Z",
                                                                    "EndTimeUTC": "9999-12-31T23:59:59.9999999",
                                                                    "TargetEndpoint": "https://changefeedtokentest.documents.azure.com//addresses/?$resolveFor=dbs&$filter=protocol eq rntbd"
                                                                }
                                                            ],
                                                            "StoreResponseStatistics": [],
                                                            "HttpResponseStats": [
                                                                {
                                                                    "StartTimeUTC": "2022-11-01T19:34:11.4603761Z",
                                                                    "DurationInMs": 47.4651,
                                                                    "RequestUri": "https://changefeedtokentest.documents.azure.com//addresses/?$resolveFor=dbs&$filter=protocol eq rntbd",
                                                                    "ResourceType": "Database",
                                                                    "HttpMethod": "GET",
                                                                    "ActivityId": "4ca2d45a-c35b-4ac2-9deb-bce45a01898d",
                                                                    "StatusCode": "Forbidden",
                                                                    "ReasonPhrase": "Forbidden"
                                                                }
                                                            ]
                                                        },
                                                        "Point Operation Statistics": {
                                                            "Id": "PointOperationStatistics",
                                                            "ActivityId": "4ca2d45a-c35b-4ac2-9deb-bce45a01898d",
                                                            "ResponseTimeUtc": "2022-11-01T19:34:11.5778816Z",
                                                            "StatusCode": 403,
                                                            "SubStatusCode": 0,
                                                            "RequestCharge": 0,
                                                            "RequestUri": "dbs/db",
                                                            "ErrorMessage": "Microsoft.Azure.Documents.DocumentClientException: Insufficient permissions provided in the authorization header for the corresponding request. Please retry with another authorization header.\r\nActivityId: 4ca2d45a-c35b-4ac2-9deb-bce45a01898d, Microsoft.Azure.Documents.Common/2.14.0, Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, Linux/11 cosmos-netstandard-sdk/3.24.1\n   at Microsoft.Azure.Cosmos.GatewayStoreClient.ParseResponseAsync(HttpResponseMessage responseMessage, JsonSerializerSettings serializerSettings, DocumentServiceRequest request)\n   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.GetMasterAddressesViaGatewayAsync(DocumentServiceRequest request, ResourceType resourceType, String resourceAddress, String entryUrl, Boolean forceRefresh, Boolean useMasterCollectionResolver)\n   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.ResolveMasterAsync(DocumentServiceRequest request, Boolean forceRefresh)\n   at Microsoft.Azure.Cosmos.Routing.GatewayAddressCache.TryGetAddressesAsync(DocumentServiceRequest request, PartitionKeyRangeIdentity partitionKeyRangeIdentity, ServiceIdentity serviceIdentity, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.AddressResolver.ResolveAddressesAndIdentityAsync(DocumentServiceRequest request, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.AddressResolver.ResolveAsync(DocumentServiceRequest request, Boolean forceRefreshPartitionAddresses, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.Routing.GlobalAddressResolver.ResolveAsync(DocumentServiceRequest request, Boolean forceRefresh, CancellationToken cancellationToken)\n   at Microsoft.Azure.Documents.AddressSelector.ResolveAddressesAsync(DocumentServiceRequest request, Boolean forceAddressRefresh)\n   at 
Microsoft.Azure.Documents.AddressSelector.ResolveAllTransportAddressUriAsync(DocumentServiceRequest request, Boolean includePrimary, Boolean forceRefresh)\n   at Microsoft.Azure.Documents.StoreReader.ReadMultipleReplicasInternalAsync(DocumentServiceRequest entity, Boolean includePrimary, Int32 replicaCountToRead, Boolean requiresValidLsn, Boolean useSessionToken, ReadMode readMode, Boolean checkMinLSN, Boolean forceReadAll)\n   at Microsoft.Azure.Documents.StoreReader.ReadMultipleReplicaAsync(DocumentServiceRequest entity, Boolean includePrimary, Int32 replicaCountToRead, Boolean requiresValidLsn, Boolean useSessionToken, ReadMode readMode, Boolean checkMinLSN, Boolean forceReadAll)\n   at Microsoft.Azure.Documents.QuorumReader.ReadQuorumAsync(DocumentServiceRequest entity, Int32 readQuorum, Boolean includePrimary, ReadMode readMode)\n   at Microsoft.Azure.Documents.QuorumReader.ReadStrongAsync(DocumentServiceRequest entity, Int32 readQuorumValue, ReadMode readMode)\n   at Microsoft.Azure.Documents.ReplicatedResourceClient.<>c__DisplayClass31_0.<<InvokeAsync>b__0>d.MoveNext()\n--- End of stack trace from previous location ---\n   at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func`1 executeAsync, Func`1 prepareRequest, IRequestRetryPolicy`2 policy, CancellationToken cancellationToken, Func`1 inBackoffAlternateCallbackMethod, Nullable`1 minBackoffForInBackoffCallback)\n   at Microsoft.Azure.Documents.ShouldRetryResult.ThrowIfDoneTrying(ExceptionDispatchInfo capturedException)\n   at Microsoft.Azure.Documents.RequestRetryUtility.ProcessRequestAsync[TRequest,IRetriableResponse](Func`1 executeAsync, Func`1 prepareRequest, IRequestRetryPolicy`2 policy, CancellationToken cancellationToken, Func`1 inBackoffAlternateCallbackMethod, Nullable`1 minBackoffForInBackoffCallback)\n   at Microsoft.Azure.Documents.StoreClient.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken, IRetryPolicy 
retryPolicy, Func`2 prepareRequestAsyncDelegate)\n   at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync(RequestMessage request, CancellationToken cancellationToken)\n   at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)",
                                                            "RequestSessionToken": null,
                                                            "ResponseSessionToken": null,
                                                            "BELatencyInMs": null
                                                        }
                                                    }
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

@oiqwrwer1
Author

Gotcha, is the multi-region-write exception a red herring then? We had unintentionally tested with multi-region write when setting up the test account; in practice we'll be using single-region write.
Are the CosmosExceptions encountered when using the change feed processor with single-region write resolvable while using resource tokens?

@ealsur
Member

ealsur commented Nov 11, 2022

The single-region error is similar; the only difference is that the error happens when the Gateway attempts to obtain the addresses (instead of your client).

You can see ResourceType: Database, OperationType: Read (which is the ReadAsync on the Database).

There is certainly some permissions issue between the Resource Token and the Master Partition TCP addresses. I don't know which Permissions are available to Resource Tokens, but it looks like whatever was defined for this token is not enough to access the Database addresses. The CRUD test would not have caught this because CRUD uses the Data Replicas (I suggested it because, without the exception, I assumed the problem was obtaining the Data Replica addresses).

@oiqwrwer1
Author

oiqwrwer1 commented Nov 11, 2022

I still don't understand why we do not hit those CosmosExceptions when doing CRUD outside of the change feed processor (CFP) API but do hit them when using CFP.StartAsync. We are using the same resource tokens and operating on the same collections for both.
Is one way to resolve this then for us to modify the permissions on the resource tokens we are using? Our understanding from the MS documentation on resource tokens is that they cannot be scoped to databases, only collections.

@ealsur
Member

ealsur commented Nov 11, 2022

Because CRUD does not do Database.ReadAsync operations but CFP.StartAsync does.

CRUD:

  • Needs the Partition/Routing information + Data Replica TCP addresses

StartAsync:

  • Needs to read Database + Container information
  • Whatever CRUD needs

StartAsync needs to obtain certain information from the Database and the Container.

@oiqwrwer1
Author

Ah got it, so is there a path forward to use the CFP API in conjunction with resource tokens somehow? What recourse do we have?

@ealsur
Member

ealsur commented Nov 11, 2022

@oiqwrwer1 Resource Tokens are expected to work with CFP. The gap you need to identify is which Permissions are missing from the Resource Token that affect its ability to perform Read Database operations. The Read Database operation is a requirement for CFP, but it is also an operation in itself (you can call database.ReadAsync from your application code). Like any other operation, you can limit access through Permissions. Which permissions affect this operation in particular, though, I cannot say (the SDK does not enforce permissions).

@oiqwrwer1
Author

This link seems to say resource tokens cannot be scoped to the level of database, only to containers: https://learn.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data?tabs=using-primary-key#permissions

Is that incorrect?

@ealsur
Member

ealsur commented Nov 14, 2022

@oiqwrwer1 Can you perform a database.ReadAsync operation using the resource tokens? Do you get the same error?
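As a sketch, assuming a client built from the resource token (hypothetical names):

```csharp
// Database.ReadAsync exercises the same Read Database path that CFP.StartAsync needs,
// so a 403 here isolates the problem to the token's database-level permissions.
CosmosClient client = new CosmosClient(accountEndpoint, resourceToken);
DatabaseResponse response = await client.GetDatabase("db").ReadAsync();
```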

@oiqwrwer1
Author

I get a 403, which makes sense.
readAsync_cosmosexception.txt

@ealsur
Member

ealsur commented Nov 14, 2022

Ok, this is the exact same error you are seeing in CFP, so it means the root cause is found.

Can you share which are the Permissions being defined on the Resource Token?

@oiqwrwer1
Author

All of our tokens have the permission "All".

@ealsur ealsur added improvement Change to existing functional behavior (perf, logging, etc.) ChangeFeed and removed needs-more-information labels Nov 14, 2022
@ealsur
Member

ealsur commented Nov 14, 2022

@oiqwrwer1 Currently it seems like this is not possible: the Read Database operation is not allowed for Resource Tokens. We can use this Issue to track whether we can avoid this call and obtain the required piece of information through some other means to unblock resource tokens.

@ealsur
Member

ealsur commented Nov 14, 2022

We can work around this by changing the code a bit.

Changing:

string databaseRid = await ((DatabaseInternal)((ContainerInternal)monitoredContainer).Database).GetRIDAsync(cancellationToken);

to:

ResourceId resourceId = ResourceId.Parse(containerRid);
string databaseRid = resourceId.DatabaseId.ToString();

would avoid the Read Database call and still provide the required information.

@ealsur ealsur moved this to Triage in Azure Cosmos SDKs Nov 14, 2022
@ealsur ealsur assigned NaluTripician and ealsur and unassigned NaluTripician Nov 14, 2022
Repository owner moved this from Triage to Done in Azure Cosmos SDKs Nov 15, 2022
@oiqwrwer1
Author

Hi Matias, thank you for fixing this issue.
What version of the SDK will include the fix, and when will it be available?

@ealsur
Member

ealsur commented Nov 16, 2022

@oiqwrwer1 The next available version (probably 3.32.0). We don't yet have a fixed date for the release; there are release freezes due to the holidays, and we need to see when the next release window aligns with the current features being completed.

@oiqwrwer1
Author

Ah gotcha. Would it be possible for me to check out the branch and test this locally, to see whether there are other downstream issues that were masked by this one?
Is there anything more I'd need to do to test locally than check out the branch and use my existing configuration (Cosmos DB URI, resource tokens, etc.)?

@ealsur
Member

ealsur commented Nov 17, 2022

You can fork the repo (the fix is already in master) and test it; the linked PR already has an end-to-end test using a Resource Token (not mocks) that mirrors the scenario and is passing.

@oiqwrwer1
Author

A resolved GitHub issue isn't really the right venue for my question, so let me know if there's a better place to move this discussion to (e.g. email), but I forked the repo and am referencing the component projects in my code.
When I try to instantiate CosmosClients, I'm hitting a FileNotFoundException:
Exception has occurred: CLR/System.IO.FileNotFoundException An exception of type 'System.IO.FileNotFoundException' occurred in Microsoft.Azure.Cosmos.Client.dll but was not handled in user code: 'Could not load file or assembly 'Microsoft.Azure.Cosmos.Core, Version=2.11.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. The system cannot find the file specified.'

The Microsoft.Azure.Cosmos.csproj has <None Include="$(NugetPackageRoot)\Microsoft.HybridRow\$(HybridRowVersion)\lib\netstandard2.0\Microsoft.Azure.Cosmos.Core.dll" Pack="true" IsAssembly="true" PackagePath="lib\netstandard2.0" /> which looks like the assembly that's missing.

Microsoft.Azure.Cosmos.Client.dll!Microsoft.Azure.Cosmos.Tracing.TraceData.ClientConfigurationTraceDatum.GetSerializedDatum() Line 98 (\workspaces\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Tracing\TraceData\ClientConfigurationTraceDatum.cs:98)
Microsoft.Azure.Cosmos.Client.dll!Microsoft.Azure.Cosmos.Tracing.TraceData.ClientConfigurationTraceDatum.ClientConfigurationTraceDatum(Microsoft.Azure.Cosmos.CosmosClientContext cosmosClientContext, System.DateTime startTime) Line 38 (\workspaces\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Tracing\TraceData\ClientConfigurationTraceDatum.cs:38)
Microsoft.Azure.Cosmos.Client.dll!Microsoft.Azure.Cosmos.CosmosClient.CosmosClient(string accountEndpoint, Microsoft.Azure.Cosmos.AuthorizationTokenProvider authorizationTokenProvider, Microsoft.Azure.Cosmos.CosmosClientOptions clientOptions) Line 339 (\workspaces\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\CosmosClient.cs:339)
Microsoft.Azure.Cosmos.Client.dll!Microsoft.Azure.Cosmos.CosmosClient.CosmosClient(string accountEndpoint, string authKeyOrResourceToken, Microsoft.Azure.Cosmos.CosmosClientOptions clientOptions) Line 235 (\workspaces\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\CosmosClient.cs:235)
Epic.Cloud.UC.Common.dll!Epic.Cloud.UC.Common.CosmosDB.CosmosBroker.CosmosBroker(Epic.Cloud.UC.Common.Encryption.IDatabaseEncryptionManager encryptionManager, Microsoft.Extensions.Options.IOptionsMonitor<Epic.Cloud.UC.Common.CosmosDB.CosmosDBKeyOptions> cosmosDbKeyOptions, Microsoft.Extensions.Options.IOptionsMonitor<Epic.Cloud.UC.Common.CosmosDB.CosmosDBTokenOptions> cosmosDbTokenOptions, Microsoft.Extensions.Options.IOptionsMonitor<Epic.Cloud.UC.Common.CosmosDB.CosmosDBSettings> cosmosDBSettings, Microsoft.Extensions.Logging.ILogger<Epic.Cloud.UC.Common.CosmosDB.CosmosBroker> logger) Line 162 (\workspaces\azure-cosmos-dotnet-v3\src\Epic.Cloud.UC.Common\CosmosDB\CosmosBroker.cs:162)
[External Code] (Unknown Source:0)
Epic.Cloud.UC.Voice.dll!Program.<Main>$(string[] args) Line 109 (\workspaces\azure-cosmos-dotnet-v3\src\Epic.Cloud.UC.Voice\Program.cs:109)
[External Code] (Unknown Source:0)

@ealsur
Member

ealsur commented Nov 18, 2022

If you are using a project reference, you can follow the other projects that reference Microsoft.Azure.Cosmos.csproj, like:

<ProjectReference Include="..\..\..\Microsoft.Azure.Cosmos\src\Microsoft.Azure.Cosmos.csproj" />
<PackageReference Include="Microsoft.Azure.Cosmos.Direct" Version="[$(DirectVersion)]" />
<PackageReference Include="Microsoft.HybridRow" Version="[$(HybridRowVersion)]" />

The versions of the dependencies are coming from: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Directory.Build.props#L7

@oiqwrwer1
Author

Great, I tested the fix locally and confirmed it works for our workflows.
Thank you!

@oiqwrwer1
Author

@ealsur Checking in on if there's a fixed date for releasing this fix. Thanks!

@ealsur
Member

ealsur commented Jan 10, 2023

Holidays tend to stretch release timelines a bit; we are trying to prepare the release with this (and other fixes and features) within 1-2 weeks.

@oiqwrwer1
Author

oiqwrwer1 commented Apr 18, 2023

Hi @ealsur, I have a follow-up question about an issue we're running into.
question: What is the recommended way to handle resource token rotations so that the ChangeFeedProcessor stays valid?

We have a cosmosBroker class that handles disposing of old clients and creating new ones whenever resource tokens change.
We also have a change feed processor service that executes the below code upon program startup.
problem: When the resource tokens have rotated, this ChangeFeedProcessor instance is no longer authorized to communicate with Cosmos DB as the CosmosClient instantiated using the now-defunct token has been disposed.

ChangeFeedProcessor cfp = _cosmosBroker.GetContainer(_monitoredContainer)
			.GetChangeFeedProcessorBuilder<dynamic>(_processorName, OnChangesAsync)
			.WithLeaseAcquireNotification(ChangeFeedMonitorLeaseAcquiredDelegate)
			.WithLeaseReleaseNotification(ChangeFeedMonitorLeaseReleasedDelegate)
			.WithErrorNotification(ChangeFeedMonitorErrorDelegate)
			.WithInstanceName(_changeFeedInstanceName)
			.WithLeaseContainer(_cosmosBroker.GetContainer(_leaseContainer))
			.Build();

await cfp.StartAsync();

@ealsur
Member

ealsur commented Apr 19, 2023

If you Dispose the client, the ChangeFeedProcessor won't work; the whole stack is disposed.

Because the Change Feed Processor is created from a client instance, just as you disposed the old client, you need to Stop the CFP and create a new one from the new client.
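A rotation handler could look roughly like this (hypothetical names; a sketch, not a definitive implementation):

```csharp
// On token rotation: stop the processor, dispose the old client,
// rebuild the processor from a client created with the new token, then restart.
await changeFeedProcessor.StopAsync();
oldClient.Dispose();

CosmosClient newClient = new CosmosClient(accountEndpoint, rotatedToken);
ChangeFeedProcessor newProcessor = newClient
    .GetContainer("db", "monitored")
    .GetChangeFeedProcessorBuilder<dynamic>("processorName", OnChangesAsync)
    .WithInstanceName(instanceName)
    .WithLeaseContainer(newLeaseClient.GetContainer("db", "leases"))
    .Build();
await newProcessor.StartAsync();
```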

@oiqwrwer1
Author

As an alternative to disposing clients and instantiating new ones on each token rotation, would initializing our CosmosClients with a TokenCredential that includes a RenewTokenFunc, called to fetch a current token every renewFrequency interval, work?
That way, CosmosClients would not need to be disposed and new ones instantiated?

@ealsur
Member

ealsur commented Apr 19, 2023

Yes, it might, although I have personally never tested this; I'm curious whether it would work (it would make a good documentation article).
