Fix for bulk operations incorrect routing on split #21420
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
…ValidationTests::splitQueryContinuationToken
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
…eedDocumentsAfterSplit()
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
sdk/cosmos/azure-cosmos/src/test/java/com/azure/cosmos/CosmosBulkGatewayTest.java
Thanks - LGTM. I will block Spark Connector GA on this PR being merged
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BulkExecutor.java
...ure-cosmos/src/main/java/com/azure/cosmos/implementation/batch/BulkOperationRetryPolicy.java
...re-cosmos/src/main/java/com/azure/cosmos/implementation/caches/RxPartitionKeyRangeCache.java
How do we differentiate between GW and Direct mode?
Isn't only GW mode affected by the bug?
Implementing PR comments
Increasing timeout on tests
Only GW mode is affected by this bug.
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
/check-enforcer evaluate
When a collection split occurs during bulk operations, the requests were still being retried on the original partition because the partition key range cache used for routing was not being refreshed correctly. As a result, the user received a failure with status code 410 and substatus code 1002.
This PR addresses the issue by refreshing the partition key range cache when a collection split occurs.
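The retry behavior described above can be illustrated with a minimal, self-contained sketch. It is not the SDK's actual code: `SplitRefreshSketch`, `RangeCache`, `Backend`, and `routeWithRetry` are hypothetical names invented for illustration, and partition keys are simplified to integers. The only details taken from this PR are the 410/1002 failure and the idea of refreshing the cache before retrying.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative sketch only; all types here are hypothetical, not azure-cosmos classes.
final class SplitRefreshSketch {
    static final int GONE = 410;
    static final int PARTITION_KEY_RANGE_GONE = 1002;

    /** Hypothetical cache: maps each range's inclusive lower bound to a range id. */
    static final class RangeCache {
        private final Supplier<TreeMap<Integer, String>> source;
        private TreeMap<Integer, String> ranges;

        RangeCache(Supplier<TreeMap<Integer, String>> source) {
            this.source = source;
            this.ranges = source.get();
        }

        /** Re-reads the current range layout from the source. */
        void refresh() {
            ranges = source.get();
        }

        /** Returns the id of the range whose lower bound is the greatest one <= key. */
        String resolve(int key) {
            return ranges.floorEntry(key).getValue();
        }
    }

    /** Hypothetical backend call; returns {statusCode, subStatusCode}. */
    interface Backend {
        int[] send(String rangeId, int key);
    }

    static boolean isSplit(int status, int subStatus) {
        return status == GONE && subStatus == PARTITION_KEY_RANGE_GONE;
    }

    /** Sends once; on 410/1002 refreshes the cache, re-resolves the range, and retries once. */
    static String routeWithRetry(RangeCache cache, Backend backend, int key) {
        String rangeId = cache.resolve(key);
        int[] result = backend.send(rangeId, key);
        if (isSplit(result[0], result[1])) {
            cache.refresh();               // without this step the retry hits the stale range again
            rangeId = cache.resolve(key);
            result = backend.send(rangeId, key);
        }
        if (result[0] >= 400) {
            throw new IllegalStateException("request failed: " + result[0] + "/" + result[1]);
        }
        return rangeId;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> before = new TreeMap<>();
        before.put(0, "0");                // single range before the split
        TreeMap<Integer, String> after = new TreeMap<>();
        after.put(0, "1");                 // child ranges after the split
        after.put(50, "2");

        AtomicReference<TreeMap<Integer, String>> layout = new AtomicReference<>(before);
        RangeCache cache = new RangeCache(layout::get);
        layout.set(after);                 // the split happens after the cache was populated

        // Only the child ranges accept requests; the parent range answers 410/1002.
        Backend backend = (rangeId, key) ->
            after.containsValue(rangeId) ? new int[] {201, 0} : new int[] {410, 1002};

        System.out.println(routeWithRetry(cache, backend, 70)); // prints 2
    }
}
```

If the `cache.refresh()` call is removed from the sketch, the retry resolves to the same stale range and fails with 410/1002 again, which mirrors the failure reported in the linked issue.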
Fixes #21126