Fixed an issue in QuorumReader when quorum could not be selected even though 1 secondary and Primary are reachable and in sync #38832

FabianMeiswinkel
Member

Description

When QuorumReader (used for the Bounded Staleness and Strong consistency levels) cannot select a quorum because it received a response from only a single Secondary, the ReadPrimaryAsync call fails because the replica set size is larger than ReadQuorum. Any retry thereafter should include the Primary, so that the Primary and one Secondary can form a quorum. Otherwise, as long as only one Secondary is reachable, read operations cannot be served even though one Secondary and the Primary should be able to form a quorum.
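
To make the scenario concrete, here is a minimal, self-contained sketch of the fallback idea. It is not the SDK's QuorumReader code: the `Replica` class, `hasQuorum`, and `readWithFallback` are hypothetical names invented for illustration, and "quorum" is simplified to mean that at least `readQuorum` reachable replicas agree on the highest LSN observed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these types exist in the Azure Cosmos DB SDK.
final class QuorumFallbackSketch {

    /** One replica as seen by the reader: its role, whether it responds, and its LSN. */
    static final class Replica {
        final boolean primary;
        final boolean reachable;
        final long lsn;

        Replica(boolean primary, boolean reachable, long lsn) {
            this.primary = primary;
            this.reachable = reachable;
            this.lsn = lsn;
        }
    }

    /** True when at least readQuorum contacted replicas agree on the highest LSN seen. */
    static boolean hasQuorum(List<Replica> replicas, boolean includePrimary, int readQuorum) {
        List<Long> responses = new ArrayList<>();
        long maxLsn = -1L;
        for (Replica r : replicas) {
            // Skip replicas that do not respond, and the Primary unless explicitly included.
            if (!r.reachable || (r.primary && !includePrimary)) {
                continue;
            }
            responses.add(r.lsn);
            maxLsn = Math.max(maxLsn, r.lsn);
        }
        int agreeing = 0;
        for (long lsn : responses) {
            if (lsn == maxLsn) {
                agreeing++;
            }
        }
        return agreeing >= readQuorum;
    }

    /** First try Secondaries only; if no quorum can be selected, retry with the Primary included. */
    static boolean readWithFallback(List<Replica> replicas, int readQuorum) {
        if (hasQuorum(replicas, false, readQuorum)) {
            return true;
        }
        return hasQuorum(replicas, true, readQuorum);
    }

    public static void main(String[] args) {
        // Replica set of 4 with an assumed read quorum of 2; only the Primary and one Secondary respond.
        List<Replica> replicas = Arrays.asList(
            new Replica(true, true, 10L),
            new Replica(false, true, 10L),
            new Replica(false, false, 0L),
            new Replica(false, false, 0L));
        System.out.println("secondaries only: " + hasQuorum(replicas, false, 2));
        System.out.println("with primary fallback: " + readWithFallback(replicas, 2));
    }
}
```

In this model a Secondaries-only attempt can never succeed while a single Secondary is reachable, which is the situation the description calls out; including the Primary on the retry is what allows the read to be served.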

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests


Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - spark


Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@NaluTripician left a comment

LGTM

Member

@xinlian12 left a comment

LGTM, thanks for the quick fix :)

@FabianMeiswinkel
Member Author

/azp run java - cosmos - spark


Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests


Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel
Member Author

/azp run java - cosmos - tests


Azure Pipelines successfully started running 1 pipeline(s).

@FabianMeiswinkel merged commit b37d95a into Azure:main on Feb 20, 2024
65 checks passed
tvaron3 pushed a commit that referenced this pull request Apr 23, 2024
… though 1 secondary and Primary are reachable and in sync (#38832)

* Added overload with CosmosReadManyRequestOptions

* Fixing style errors

* Update CosmosReadManyRequestOptions.java

* Fixing build break

* Avoiding a possibly breaking change by injecting a base class - CosmosQueryRequestOptions was not final.

* Update TransientIOErrorsRetryingIteratorITest.scala

* Addressing code review feedback

* Added fallback to include Primary in QuorumReader when quorum could not be selected.

* Update CHANGELOG.md

* Update CosmosItemTest.java

* Update CosmosItemTest.java

tvaron3 added a commit that referenced this pull request Apr 25, 2024
* Fixed an issue in QuorumReader when quorum could not be selected even though 1 secondary and Primary are reachable and in sync (#38832)

* Added overload with CosmosReadManyRequestOptions

* Fixing style errors

* Update CosmosReadManyRequestOptions.java

* Fixing build break

* Avoiding a possibly breaking change by injecting a base class - CosmosQueryRequestOptions was not final.

* Update TransientIOErrorsRetryingIteratorITest.scala

* Addressing code review feedback

* Added fallback to include Primary in QuorumReader when quorum could not be selected.

* Update CHANGELOG.md

* Update CosmosItemTest.java

* Update CosmosItemTest.java

* increase the version in each module

* changed test to use queryOptions instead of readmany options

* Update sdk/cosmos/azure-cosmos/CHANGELOG.md

* updated date

* Update sdk/cosmos/azure-cosmos/CHANGELOG.md

---------

Co-authored-by: Fabian Meiswinkel <fabianm@microsoft.com>
3 participants