ChangeFeedProcessor API and Resource Tokens #3561
This has nothing to do with the Change Feed Processor. AddressFeedQuery is a core process of the SDK as a whole: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/sdk-connection-modes#direct-mode. It is the SDK requesting the replica TCP addresses before it can start doing operations, and it happens on the SDK regardless of whether you use the Change Feed Processor or a plain ReadItem.
We have been making CRUD operations on collections using resource tokens for months at this point and have never run into a 403. It only occurs when we invoke ChangeFeedProcessor.StartAsync().
Do you also do CRUD to the lease Container?
We don't make explicit calls to the lease container, but my understanding is that as part of ChangeFeedProcessor.StartAsync(), it maintains (creates, patches, deletes, etc.) lease documents in the lease container. Is this a downstream effect of setting up our account as multi-region write? If it were single-region, would the AddressFeedQuery not happen, since there are no write replicas?
Then could your problem be that your Resource Token has no permissions to perform operations on the Lease Container? Like I mentioned, the SDK needs to obtain TCP addresses to interact with a Container. As you correctly mention, the Change Feed Processor needs to maintain its state by writing to the lease container, so it needs the TCP addresses for the lease container. It also reads the Change Feed from the Monitored container, so it fetches the TCP addresses for that too. Nothing to do with multi-region: regional configuration (single or multi) has no effect on this flow.
We do provision a resource token for the lease container. I can try an explicit CRUD method and see if that triggers a 403.
The failure you are getting must be coming from an operation against a particular URL; that URL contains the Container name, which might hint at which Resource Token is lacking permissions. If you believe your Resource Tokens are correctly configured and have enough permissions to interact with both the Monitored and Lease containers, my recommendation again is to file a support request. The SDK does not validate the Resource Token; it is just passed along to the service, and the service decides whether the operation (in this case obtaining Addresses) is allowed or not. The operation the Change Feed Processor performs on the Monitored Container is a Change Feed read, and on the Lease Container, CRUD operations. Any SDK client (regardless of the Change Feed Processor) always fetches the Container, Partition, and Address information. You can do a test with the Lease Container: try to perform a CreateItem with your Resource Token and see if that works (see the sketch below).
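A minimal sketch of that CreateItem test, assuming a resource token scoped to the lease container; the endpoint, token, and database/container names are placeholders:

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Client authenticated with the resource token (not the master key).
using CosmosClient tokenClient = new CosmosClient(
    "https://your-account.documents.azure.com:443/", // placeholder endpoint
    "<resource-token-for-lease-container>");         // placeholder token

Container leaseContainer = tokenClient.GetContainer("your-database", "leases");

// Lease containers are partitioned on /id, so an item with only an id suffices.
// If the token lacks permissions, this throws a CosmosException with status 403.
ItemResponse<dynamic> response = await leaseContainer.CreateItemAsync<dynamic>(
    new { id = Guid.NewGuid().ToString() });
Console.WriteLine($"CreateItem returned {response.StatusCode}");
```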
Thank you for that additional information. I'll try the test on the lease container and go from there.
Update: Outside of the ChangeFeedProcessor API, I tested upserting, reading, and deleting items in the monitored and lease collections using resource tokens. Given what you mentioned, "…"
Can you share the full callstack or details of the exception?
Is your client on Gateway or Direct mode?
If it's on Gateway, can you switch to Direct mode and retry CRUD on the 2 containers with the Resource Tokens? A sketch of forcing the mode follows.
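A minimal sketch of pinning the connection mode on the client, with placeholder endpoint and token values:

```csharp
using Microsoft.Azure.Cosmos;

CosmosClientOptions options = new CosmosClientOptions
{
    // Switch between ConnectionMode.Direct and ConnectionMode.Gateway to compare.
    ConnectionMode = ConnectionMode.Direct
};

using CosmosClient client = new CosmosClient(
    "https://your-account.documents.azure.com:443/", // placeholder endpoint
    "<resource-token>",                              // placeholder token
    options);
```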
I set CosmosClientOptions.ConnectionMode explicitly to Direct and to Gateway, and they both worked. (Attachment: changefeedprocessortest_multi-region-write-default-cosmosclient-connectionmode.docx)
Regarding file 1 (multi-region), the failure is coming from Gateway, as the diagnostics show.
Gotcha, is the multi-write exception a red herring then? We had unintentionally tested with multi-region write when setting up the test account. In production, we'll be using single-region write.
The single-region error is similar; the only difference is that the error happens when the Gateway attempts to obtain the addresses (instead of your client). There is certainly some permissions issue between the Resource Token and the Master Partition TCP Addresses. I don't know which Permissions are available to Resource Tokens, but it looks like whatever was defined for this token is not enough to access the Database addresses. The CRUD test would not have caught this, because CRUD uses the Data Replica (I suggested it because, without the exception, I assumed the problem was obtaining the Data Replica Addresses).
I still don't understand why we do not hit those CosmosExceptions when doing CRUD outside of the change feed processor (CFP) API but do hit them when using CFP.StartAsync. We are using the same resource tokens and operating on the same collections for both.
Because CRUD and StartAsync do not follow the same code path. CRUD goes straight to the item operation, while StartAsync first has to obtain certain information from the Database and Container before processing can begin, and that initialization includes a Read Database call.
Ah got it, so is there a path forward to use the CFP API in conjunction with resource tokens somehow? What recourse do we have?
@oiqwrwer1 Resource Tokens are expected to work with CFP. The gap you need to identify is which Permissions are missing from the Resource Token that affect its ability to perform the Read Database operation. The Read Database operation is a requirement for CFP, but it is also an operation in itself (you can call database.ReadAsync from your application code, as sketched below). Like any other operation, you can limit access to it through Permissions. Which permissions affect this one in particular, though, I cannot say (the SDK does not enforce the permissions).
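A minimal sketch of that standalone Read Database call, assuming a client built with the resource token; the names are placeholders:

```csharp
using System;
using Microsoft.Azure.Cosmos;

using CosmosClient tokenClient = new CosmosClient(
    "https://your-account.documents.azure.com:443/", // placeholder endpoint
    "<resource-token>");                             // placeholder token

try
{
    // This is the same Read Database operation the CFP performs during StartAsync.
    DatabaseResponse response = await tokenClient.GetDatabase("your-database").ReadAsync();
    Console.WriteLine($"Read Database succeeded: {response.StatusCode}");
}
catch (CosmosException ex) when (ex.StatusCode == System.Net.HttpStatusCode.Forbidden)
{
    // A 403 here reproduces the CFP failure with this token.
    Console.WriteLine(ex.Message);
}
```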
This link seems to say resource tokens cannot be scoped to the level of a database, only to containers: https://learn.microsoft.com/en-us/azure/cosmos-db/secure-access-to-data?tabs=using-primary-key#permissions. Is that incorrect?
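For context, a sketch of how such a token is provisioned; the endpoint, key, user, and container names are placeholders, and note that PermissionProperties is scoped to a Container, which matches that reading of the docs:

```csharp
using Microsoft.Azure.Cosmos;

string accountEndpoint = "https://your-account.documents.azure.com:443/"; // placeholder
string masterKey = "<master-key>";                                        // placeholder

// Provisioning happens with the master-key client.
using CosmosClient masterKeyClient = new CosmosClient(accountEndpoint, masterKey);

Database database = masterKeyClient.GetDatabase("your-database");
User user = database.GetUser("cfp-user"); // assumes the user was already created

// PermissionProperties takes a Container; there is no database-level overload.
PermissionProperties leasePermission = new PermissionProperties(
    id: "leases-all",
    permissionMode: PermissionMode.All,
    container: masterKeyClient.GetContainer("your-database", "leases"));

PermissionResponse created = await user.CreatePermissionAsync(leasePermission);
string resourceToken = created.Resource.Token; // used as the CosmosClient auth key
```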
@oiqwrwer1 Can you perform a database.ReadAsync() with your Resource Token?
I get a 403, which makes sense.
Ok, this is the exact same error you are seeing in CFP, so the root cause is found. Can you share which Permissions are defined on the Resource Token?
All of our tokens have the permission "All".
@oiqwrwer1 Currently it seems like this is not possible; the Read Database operation is not allowed for Resource Tokens. We can use this Issue to track whether we can avoid this call and find the required piece of information through some other means to unblock resource tokens.
We can work around this by changing the code a bit: swapping the call at Line 124 in 01c1cd4 for an alternative would avoid the Read Database call and still fulfill the required information.
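A hypothetical sketch of the idea only, not the actual SDK diff (the linked snippets did not survive in this thread): derive what is needed from the Container, which a resource token can read, instead of reading the Database:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// 'container' stands in for the monitored container inside the SDK's initialization.
static async Task<string> GetSelfLinkAsync(Container container)
{
    // Instead of a Database read (blocked for resource tokens):
    //   DatabaseProperties databaseProperties = await database.ReadAsync();
    // ...read the Container, which the resource token is scoped to:
    ContainerProperties containerProperties = await container.ReadContainerAsync();

    // The SelfLink ("dbs/<dbRid>/colls/<collRid>/") carries the database resource id
    // segment, so database-level information can be derived without a Database read.
    return containerProperties.SelfLink;
}
```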
Hi Matias, thank you for fixing this issue. Which release will include the fix?
@oiqwrwer1 The next available version (probably 3.32.0). We don't yet have a fixed date for the release; there are release freezes due to the holidays, and we need to see when the next release window aligns with the current features being completed.
Ah, gotcha. Would it be possible for me to check out the branch and test this locally, to see whether there are other downstream issues that were hidden behind this one?
You can fork the repo (the fix is already in master) and test it; the linked PR already has an end-to-end test using a Resource Token (not mocks) that mirrors the scenario and is passing.
A resolved GitHub issue isn't really the right venue for my question, so let me know if there's a better place to move this discussion (e.g. email). I forked the repo and am referencing the component projects in my code. The Microsoft.Azure.Cosmos.csproj has "…"; how should my projects reference it?
If you are doing a project reference, you can follow the other projects referencing Microsoft.Azure.Cosmos.csproj, for example azure-cosmos-dotnet-v3/Microsoft.Azure.Cosmos.Samples/Tools/Benchmark/CosmosBenchmark.csproj, lines 41 to 43 at commit 3e9bab6 (a sketch of that reference follows).
The versions of the dependencies come from: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Directory.Build.props#L7
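A sketch of what such a project reference looks like; the relative path is a placeholder to adjust to wherever the fork is cloned (the actual lines live in CosmosBenchmark.csproj):

```xml
<ItemGroup>
  <!-- Path is relative to your own .csproj; point it at the cloned fork. -->
  <ProjectReference Include="..\azure-cosmos-dotnet-v3\Microsoft.Azure.Cosmos\src\Microsoft.Azure.Cosmos.csproj" />
</ItemGroup>
```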
Great, I tested the fix locally and confirmed it works for our workflows. |
@ealsur Checking in on whether there's a fixed date for releasing this fix. Thanks!
Holidays tend to stretch release timelines a bit; we are trying to prepare the release with this (and other fixes and features) within 1-2 weeks.
Hi @ealsur, I have a follow-up question due to an issue we're running into. We have a cosmosBroker class that handles disposing of old clients and creating new ones whenever resource tokens change. What happens to a ChangeFeedProcessor that was built from a client we dispose?
If you Dispose the client, then the ChangeFeedProcessor won't work; the whole stack is disposed. Because the Change Feed Processor is created out of a client instance, just like you disposed the client, you need to Stop the CFP and create a new one off the new client (see the sketch below).
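A minimal sketch of that rotation; processor, oldClient, accountEndpoint, and newResourceToken are assumed to exist, and BuildProcessor is a hypothetical helper, not from the thread:

```csharp
// Stop the processor before disposing the client it was built from.
await processor.StopAsync();
oldClient.Dispose();

// Build a new client from the rotated token, then rebuild the processor from it.
CosmosClient newClient = new CosmosClient(accountEndpoint, newResourceToken);
processor = BuildProcessor(newClient); // hypothetical helper wrapping GetChangeFeedProcessorBuilder(...).Build()
await processor.StartAsync();
```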
As an alternative to disposing clients and instantiating new ones per token rotation, would initializing our CosmosClients with a TokenCredential that includes a RenewTokenFunc, called to fetch a current token every renewFrequency interval, work?
Yes, it might, although I have personally never tested this; curious whether it would work (it would make a good documentation article).
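A sketch of the shape the question seems to describe, as a custom Azure.Core TokenCredential; renewTokenFunc and renewFrequency are hypothetical names taken from the question, and whether Cosmos DB accepts resource tokens through this path is exactly the untested part:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;

// A credential that re-fetches a token through a caller-supplied delegate.
// This only illustrates the pattern; it is not a verified resource-token flow.
public class RenewingTokenCredential : TokenCredential
{
    private readonly Func<string> renewTokenFunc; // hypothetical, from the question
    private readonly TimeSpan renewFrequency;     // hypothetical, from the question

    public RenewingTokenCredential(Func<string> renewTokenFunc, TimeSpan renewFrequency)
    {
        this.renewTokenFunc = renewTokenFunc;
        this.renewFrequency = renewFrequency;
    }

    public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken)
        => new AccessToken(this.renewTokenFunc(), DateTimeOffset.UtcNow.Add(this.renewFrequency));

    public override ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
        => new ValueTask<AccessToken>(this.GetToken(requestContext, cancellationToken));
}
```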
Original issue description: In using ChangeFeedProcessor.StartAsync(), we have run into a 403 issue:
"Insufficient permissions provided in the authorization header for the corresponding request. Please retry with another authorization header. ActivityId: 250e7411-d830-4fcf-b82a-0dd654a1c62b, "
The issue is due to the StartAsync() call making a GET request scoped to a CosmosDB database, not a collection.
D:\dbs\el\csdb;\Product\Cosmos\RoutingGateway\Runtime\RequestHandlers\AddressRequestHandler.cs:line 57" Argument5="changefeedtokentest" Argument6="AddressFeedQuery" Argument7="GET" TraceSource="DocDBTrace"
My question: is the change feed processor SDK not capable of working with resource tokens? We have thus far been able to use resource tokens to instantiate 1 CosmosClient per token and do operations using 1 CosmosClient per collection. Given that we pass collections via CosmosClient.GetContainer() calls as part of ChangeFeedProcessor's Build(), I would expect operations on the monitored, lease, and target collections to work.
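For reference, a minimal sketch of the wiring this describes, with one client per resource token; the endpoint, tokens, names, and MyDocument type are placeholders:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// One client per resource token (placeholder endpoint/token/database names).
CosmosClient monitoredClient = new CosmosClient(accountEndpoint, monitoredToken);
CosmosClient leaseClient = new CosmosClient(accountEndpoint, leaseToken);

ChangeFeedProcessor processor = monitoredClient
    .GetContainer("your-database", "monitored")
    .GetChangeFeedProcessorBuilder<MyDocument>("processorName", HandleChangesAsync)
    .WithInstanceName("instance-1")
    .WithLeaseContainer(leaseClient.GetContainer("your-database", "leases"))
    .Build();

// Before the fix discussed above, this call triggered the Read Database request
// that returns 403 with resource tokens.
await processor.StartAsync();

static Task HandleChangesAsync(
    IReadOnlyCollection<MyDocument> changes, CancellationToken cancellationToken)
{
    // Process the batch of changes here.
    return Task.CompletedTask;
}
```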