Preview - Change Feed Pull model #1210
Conversation
@ealsur, thanks for all the work on this. I saw you released 3.8.0 yesterday, but this was not included. When can we expect this? This is really blocking us from upgrading from v2...
@WimVergouwe We are working on adding more features related to this and the API changed, see the latest linked PR.
@ealsur, any update on this? We're still stuck on v2...
@WimVergouwe sadly we could not GA just yet. There are new features coming soon on the service side (see the //Build change feed announcements), so we are making sure our current API model is compatible with them.
Also waiting for this. A quick look at 3.12.0 indicated it is still in preview.
In v2, we have our own custom logic to process the change feed across all partitions, and we also handle partition splits ourselves. This is currently blocking us from moving to v3.
@zbynek001 The APIs in V3 are split-proof, so when you get the FeedRanges, you can start the iterators and you won't face splits. Reference https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-pull-model
Yes, but the FeedRange approach does not scale well. If the collection starts with 1 partition, we get 1 FeedRange. If the collection grows over time to thousands of partitions, we still have that one FeedRange, so we cannot spread the processing over several machines.
For scaling as the partitions grow, we are having the discussion in #1680. As for the second point you mention, it's the same behavior as with the V2 PartitionKeyRangeId: each one can be consumed independently and its continuation is independent. If you have 2 FeedRanges, each one is independent of the other. What you are looking for is probably what we are discussing for scale-out support.
I have run into a problem when using the pull model. I initiate the change feed iterator with a specific partition key, but I get changes from other, very similar, partitions. For example, my partition keys begin with a 12 character code like B3FAZLSZTZHYJL and are followed by another identifier such as "current." If I ask for only the partition "B3FAZLSZTZHYJLcurrent," I also get other changes that begin with "B3FAZLSZTZHYJL," for example the changes for partition B3FAZLSZTZHYJL0006490603. My guess is that this is due to how the FeedRange interprets the partition key and derives a range. Is there a workaround you could suggest for this?
@benjifarmer Can you create an Issue for this? This is a PR comment 😄
Sure! But I think I may have been mistaken. When I call GetFeedRangesAsync I see that I only have one physical partition, so this might be the expected behavior? Or should defining a partition key on GetChangeFeedIterator only return changes from the specific partition key regardless of how many physical partitions there are?
It should be returning changes just for that Partition Key value. Similar issue: #1796
I can confirm #1796 is a duplicate of my issue. I have reverted to 3.11.0-preview and everything works fine. I'll be patiently awaiting the next preview release. Thank you.
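The thread distinguishes two scopes: all changes stored in one physical partition, and the changes for a single partition key value. The maintainer's answer is that the iterator should return only the latter. The sketch below models that distinction in memory. It is written in Python rather than against the .NET SDK. `Change`, `physical_partition`, and both filter functions are hypothetical, and the CRC32-based placement is an assumption made only for illustration. The sketch does not claim to reproduce the bug tracked in #1796.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    partition_key: str
    doc_id: str


def physical_partition(partition_key: str, partition_count: int) -> int:
    """Hypothetical hash-based placement of a partition key onto a physical partition."""
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


def changes_for_physical_partition(
    changes: List[Change], partition_key: str, partition_count: int
) -> List[Change]:
    """Everything stored in the same physical partition as partition_key (too broad)."""
    target = physical_partition(partition_key, partition_count)
    return [c for c in changes if physical_partition(c.partition_key, partition_count) == target]


def changes_for_partition_key(changes: List[Change], partition_key: str) -> List[Change]:
    """Only changes whose partition key equals the requested value (the expected behavior)."""
    return [c for c in changes if c.partition_key == partition_key]


changes = [
    Change("B3FAZLSZTZHYJLcurrent", "1"),
    Change("B3FAZLSZTZHYJL0006490603", "2"),
]
```

With a single physical partition (`partition_count=1`), the physical-partition scope returns both changes, while the partition-key scope returns only `"B3FAZLSZTZHYJLcurrent"`.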
Change Feed pull model
Description
This PR adds support to consume the Change Feed through a pull model, under the Preview flag.
FeedToken and parallelization
The PR also introduces the `FeedToken` concept as a unit of consumption and parallelization. A token represents a range inside a Container (either the full Container or a part of it), and it also serves as continuation support.

The `FeedToken` can be used to consume the Change Feed, and a user can obtain a list of these tokens by calling `GetFeedTokensAsync`. This method provides a list of tokens that can be used to parallelize Change Feed consumption through a `FeedIterator`. The `FeedIterator` consumes the Change Feed for a particular `FeedToken`. In practice, this allows tokens to be distributed across multiple threads or even machines.

The `FeedToken` can be captured from an existing iterator (`FeedIterator.FeedToken`) and saved for later use. FeedTokens can also be serialized if the storage mechanism requires it.
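The token lifecycle described above has four steps: list tokens, consume each one independently, capture the current position, then serialize and resume. The in-memory Python sketch below models that lifecycle. It is not the .NET SDK. `FeedToken`, `get_feed_tokens`, `read_next`, `drain`, `CHANGE_LOG`, and the 0 to 100 hash space are all hypothetical. A token is modeled as a hash range plus a continuation that counts how many changes in that range have already been read.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from typing import List

# Hypothetical change log: (hash of the document's partition key, payload), in commit order.
CHANGE_LOG = [(5, "a"), (70, "b"), (12, "c"), (90, "d"), (40, "e")]
HASH_SPACE = 100  # hypothetical size of the partition key hash space


@dataclass
class FeedToken:
    """Hypothetical stand-in for a FeedToken: a hash range plus a continuation."""
    min_inclusive: int
    max_exclusive: int
    continuation: int = 0  # number of changes in this range already consumed

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "FeedToken":
        return FeedToken(**json.loads(text))


def get_feed_tokens(partition_count: int) -> List[FeedToken]:
    """Split the hash space into one token per (simulated) physical partition."""
    step = HASH_SPACE // partition_count
    return [
        FeedToken(i * step, HASH_SPACE if i == partition_count - 1 else (i + 1) * step)
        for i in range(partition_count)
    ]


def read_next(token: FeedToken, page_size: int = 2) -> List[str]:
    """Return the next page of changes in the token's range and advance its continuation."""
    in_range = [p for h, p in CHANGE_LOG if token.min_inclusive <= h < token.max_exclusive]
    page = in_range[token.continuation : token.continuation + page_size]
    token.continuation += len(page)
    return page


def drain(token: FeedToken) -> List[str]:
    """Read every currently available change for one token."""
    results: List[str] = []
    while page := read_next(token):
        results.extend(page)
    return results


# Parallelization: each token is consumed independently, here on separate threads.
tokens = get_feed_tokens(partition_count=2)
with ThreadPoolExecutor() as pool:
    per_token = list(pool.map(drain, tokens))  # [["a", "c", "e"], ["b", "d"]]

# Capture, serialize, and resume a single token.
token = get_feed_tokens(partition_count=1)[0]
first_page = read_next(token)           # ["a", "b"]
saved = token.to_json()                 # store the string anywhere
resumed = FeedToken.from_json(saved)
rest = drain(resumed)                   # ["c", "d", "e"]
```

Each worker mutates only its own token, so the threads share no mutable state.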
Consume the entire Container
While the `GetFeedTokensAsync` approach allows parallelization, in some cases the user might want just one consumer of the Change Feed, with no parallelization. In that case, it is as simple as creating a `FeedIterator` without any `FeedToken` as input.
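The "no token in, token out" pattern for a single consumer can be sketched in Python. This is a hypothetical model, not the SDK's `FeedIterator`. `ChangeFeedIterator`, its dict-based token, and `page_size` are illustrative names only. Omitting the token starts from the beginning of the whole container, and the token exposed after each read is what a caller would store in order to resume.

```python
from typing import Dict, List, Optional


class ChangeFeedIterator:
    """Hypothetical single-consumer iterator over a whole container's change log."""

    def __init__(
        self,
        change_log: List[str],
        feed_token: Optional[Dict[str, int]] = None,
        page_size: int = 2,
    ):
        self._log = change_log
        self._page_size = page_size
        # No token supplied: start at the beginning of the entire container.
        self.feed_token = dict(feed_token) if feed_token is not None else {"continuation": 0}

    def read_next(self) -> List[str]:
        start = self.feed_token["continuation"]
        page = self._log[start : start + self._page_size]
        # Replace (rather than mutate) the token so previously captured copies stay valid.
        self.feed_token = {"continuation": start + len(page)}
        return page


log = ["a", "b", "c"]
iterator = ChangeFeedIterator(log)   # no token as input
first = iterator.read_next()         # ["a", "b"]
saved = iterator.feed_token          # store this to resume later

log.append("d")                      # new changes arrive in the meantime
resumed = ChangeFeedIterator(log, feed_token=saved)
second = resumed.read_next()         # ["c", "d"]
```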
A `FeedIterator` created this way also exposes `FeedIterator.FeedToken`, which can be stored to resume at a later point in time.

Change Feed for a Partition Key
This PR also enables the same construct to read the Change Feed for a particular Partition Key. Again, it is an overload of `GetChangeFeedStreamIterator`/`GetChangeFeedIterator`.

Support for POCOs
The Change Feed iterator comes in two flavors: one with Stream support (returns `ResponseMessage` on `ReadNextAsync`) and one for POCO types (returns `FeedResponse<T>` on `ReadNextAsync`).

Scale support
If a `FeedToken` represents multiple units of Change Feed consumption, `IEnumerable<FeedToken> FeedToken.Scale()` can be used to attempt a scale-out. This is useful if we initially start reading the Change Feed for the entire Container and later want to parallelize the consumption.
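The scale-out idea can be modeled in Python: a token that covers several partition ranges splits into one token per range, and a single-range token stays as it is. This is a hypothetical sketch, not the SDK's `FeedToken.Scale()`. `FeedToken`, `scale`, and the hash ranges are illustrative. Continuation state is deliberately omitted, so the sketch does not show how a real split carries progress over.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A hypothetical [min_inclusive, max_exclusive) slice of the partition key hash space.
Range = Tuple[int, int]


@dataclass(frozen=True)
class FeedToken:
    """Hypothetical token that may span several physical partition ranges."""
    ranges: Tuple[Range, ...]

    def scale(self) -> List["FeedToken"]:
        # One token per underlying range; a single-range token cannot be split further.
        if len(self.ranges) <= 1:
            return [self]
        return [FeedToken((r,)) for r in self.ranges]


# Started as one consumer for the entire container, which now spans two ranges.
whole_container = FeedToken(((0, 50), (50, 100)))
workers = whole_container.scale()  # two tokens, one per range
```

Returning the token unchanged when it cannot be split lets a caller invoke `scale()` unconditionally.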
Partition Key Ranges
For monitoring purposes, it is now also possible to obtain the Partition Key Ranges that a particular `FeedToken` represents, through the `Container.GetPartitionKeyRangesAsync` method, which takes a `FeedToken` as a parameter.
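One way to model the token-to-range mapping is as an overlap check between half-open intervals. The Python sketch below is hypothetical and is not `Container.GetPartitionKeyRangesAsync`. `PARTITION_KEY_RANGES`, its ids and bounds, and `get_partition_key_ranges` are all illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical partition key ranges of a container: id -> [min_inclusive, max_exclusive).
PARTITION_KEY_RANGES: Dict[str, Tuple[int, int]] = {"0": (0, 50), "1": (50, 100)}


def get_partition_key_ranges(token_range: Tuple[int, int]) -> List[str]:
    """Return the ids of the partition key ranges that overlap the token's range."""
    t_min, t_max = token_range
    return [
        range_id
        for range_id, (r_min, r_max) in PARTITION_KEY_RANGES.items()
        if r_min < t_max and t_min < r_max  # half-open intervals overlap
    ]


print(get_partition_key_ranges((0, 100)))  # ['0', '1']
print(get_partition_key_ranges((10, 20)))  # ['0']
```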
Closing issues
Closes #972
Closes #831