[ChangeFeed] CrossPartitionChangeFeedAsyncEnumerator does not reset HasMoreResults on error #2473

Closed
neildsh opened this issue May 17, 2021 · 1 comment · Fixed by #2474
Labels
bug (Something isn't working), ChangeFeed

Comments

@neildsh
Contributor

neildsh commented May 17, 2021

Describe the bug
CrossPartitionChangeFeedAsyncEnumerator can recurse into a state where it has no next page under some error or retry conditions, such as a 429. It does not reset HasMoreResults in that case, which ends up throwing an InvalidOperationException. We should probably raise a more specific error, for example when retries are exhausted.
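
The failure mode described here can be pictured with a toy model. This is only a sketch in Python with hypothetical names (`ToyChangeFeedEnumerator`, `ThrottledError`). It is not the SDK's implementation. It assumes just the behavior reported in this issue: a throttled read leaves the enumerator with no next page while the "has more results" flag stays set.

```python
class ThrottledError(Exception):
    """Stands in for a 429 response; hypothetical, not an SDK type."""


class ToyChangeFeedEnumerator:
    """Hypothetical model, not CrossPartitionChangeFeedAsyncEnumerator itself."""

    def __init__(self, pages):
        self._pages = iter(pages)
        # A change feed is treated as never finished, so this starts as True.
        self.has_more_results = True

    def move_next(self):
        page = next(self._pages, None)
        if page is None:
            # Same message as the InvalidOperationException in the stack trace.
            raise RuntimeError("ChangeFeed should always have a next page.")
        if isinstance(page, Exception):
            # Modeled bug: the inner source is now spent, but
            # has_more_results is left as True, so callers keep reading.
            raise page
        return page


feed = ToyChangeFeedEnumerator(["page-1", ThrottledError("429")])
first = feed.move_next()

throttled = False
try:
    feed.move_next()
except ThrottledError:
    throttled = True

# The flag still invites another read, and that read fails generically.
still_claims_more = feed.has_more_results
generic_error = None
try:
    feed.move_next()
except RuntimeError as exc:
    generic_error = str(exc)
```

In this model the caller sees the throttling error once, then a generic error on every later read, with nothing indicating that the enumerator is no longer usable.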

To Reproduce
The problem is not specific to a particular database. We simply called ChangeFeedIterator.ReadNextAsync and hit this exception.
We cannot reproduce it reliably. It happens occasionally (2 to 3 times a month).

Expected behavior
Perhaps throw a more specific exception that lets the customer know that they need to recreate the iterator.

Actual behavior
We encountered the "ChangeFeed should always have a next page" exception while using ChangeFeedIterator:

Encountered unexpected exception:Microsoft.Azure.AISC.Common.ErrorHandling.InternalServerErrorException: Unknown error occurred while processing this request., Microsoft.Azure.AISC.Common/1.0.01570.3085
---> System.InvalidOperationException: ChangeFeed should always have a next page.
at Microsoft.Azure.Cosmos.ChangeFeed.Pagination.CrossPartitionChangeFeedAsyncEnumerator.MoveNextAsync()
at Microsoft.Azure.Cosmos.ChangeFeed.ChangeFeedIteratorCore.ReadNextAsync(CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.FeedIteratorCore`1.ReadNextAsync(CancellationToken cancellationToken)
at Microsoft.Azure.AISC.StoreProvider.Table.CosmosFeedIterator`1.ReadNextOnceAsync(CancellationToken cancellationToken)
at Microsoft.Azure.AISC.StoreProvider.StoreRequestExecutor.<>c__DisplayClass7_0`1.<<ExecuteAsync>b__0>d.MoveNext() in C:\source\src\subsystems\common\managed_lib\storeprovider\StoreRequestExecutor.cs:line 151
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.Azure.AISC.Common.Utils.Retry.BackoffRetryUtility`1.ExecuteRetryAsync(Func`2 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, ILogger logger, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback)
--- End of inner exception stack trace ---
at Microsoft.Azure.AISC.Common.Utils.Retry.BackoffRetryUtility`1.ExecuteRetryAsync(Func`2 callbackMethod, Func`3 callShouldRetry, Func`1 inBackoffAlternateCallbackMethod, ILogger logger, TimeSpan minBackoffForInBackoffCallback, CancellationToken cancellationToken, Action`1 preRetryCallback) in C:\source\src\subsystems\common\managed_lib\utils\Retry\BackoffRetryUtility.cs:line 173
at Microsoft.Azure.AISC.StoreProvider.StoreRequestExecutor.ExecuteAsync[T](Func`1 callbackMethod, String callerFunctionName, String containerName, String partitionKey, String id, String location, Boolean isWriteEvent, IRetryPolicy retryPolicy, CancellationToken cancellationToken, Boolean isInitializationCall) in C:\source\src\subsystems\common\managed_lib\storeprovider\StoreRequestExecutor.cs:line 181
at Microsoft.Azure.AISC.StoreProvider.Table.CosmosFeedIterator`1.ReadNextAsync(CancellationToken cancellationToken) in C:\source\src\subsystems\common\managed_lib\storeprovider\Table\CosmosFeedIterator.cs:line 76
at Microsoft.Azure.AISC.Scheduler.Regional.Coordinator.RunningResourceChangeFeedProcessor.ProcessAsync(Cursor cursor, CancellationToken cancellationToken) in C:\source\src\subsystems\scheduler\scheduler.regional.coordinator\RunningResourceChangeFeedProcessor.cs:line 137
at Microsoft.Azure.AISC.StoreProvider.Table.VolatilePartitionKeyProcessor.ProcessPartitionKeyAsync(EntityStorePartitionKeyInfo partitionKey, CancellationToken cancellationToken) in C:\source\src\subsystems\common\managed_lib\storeprovider\Table\VolatilePartitionKeyProcessor.cs:line 58
at Microsoft.Azure.AISC.StoreProvider.Table.PartitionKeyProcessor.RunOnceAsync(CancellationToken cancellationToken) in C:\source\src\subsystems\common\managed_lib\storeprovider\Table\PartitionKeyProcessor.cs:line 78
at Microsoft.Azure.AISC.Common.PeriodicTask.RunAsync(CancellationToken cancellationToken) in C:\source\src\subsystems\common\managed_lib\common\PeriodicTask.cs:line 145. Will resume processing...
neildsh added the ChangeFeed and bug (Something isn't working) labels on May 17, 2021
@j82w
Contributor

j82w commented May 17, 2021

@neildsh shouldn't the fix be to allow retries on errors? Why would the entire iterator need to be recreated just to retry a single 429 or other transient failure?
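
As a rough illustration of retrying in place rather than recreating the iterator, here is a minimal sketch in Python. `FlakyIterator`, `ThrottledError`, and `read_next_with_retry` are hypothetical stand-ins, not Cosmos SDK types. The sketch assumes the iterator stays usable after a transient failure, which is the property this issue reports as broken.

```python
import time


class ThrottledError(Exception):
    """Stands in for a 429 response; hypothetical, not an SDK type."""


class FlakyIterator:
    """Hypothetical iterator that remains usable after a transient failure."""

    def __init__(self, outcomes):
        self._outcomes = list(outcomes)

    def read_next(self):
        # Each call consumes one scripted outcome: an error or a page.
        outcome = self._outcomes.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def read_next_with_retry(iterator, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry the same iterator with exponential backoff on throttling."""
    for attempt in range(1, max_attempts + 1):
        try:
            return iterator.read_next()
        except ThrottledError:
            if attempt == max_attempts:
                # Retries exhausted: surface the specific error to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))


iterator = FlakyIterator([ThrottledError("429"), ThrottledError("429"), "page-1"])
delays = []  # record the backoff delays instead of actually sleeping
page = read_next_with_retry(iterator, sleep=delays.append)
```

When the retries run out, the caller gets the throttling error itself rather than a generic one, which also matches the request for a more specific exception.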
