
release-24.3: sql/row: fix multi-range reads from PCR standby #133171

Merged: michae2 merged 2 commits into release-24.3 from blathers/backport-release-24.3-132854 on Oct 30, 2024


@blathers-crl blathers-crl bot commented Oct 22, 2024

Backport 1/1 commits from #132854 and 1/1 commits from #133589 on behalf of @michae2.

/cc @cockroachdb/release


sql/row: fix multi-range reads from PCR standby

Reads from tables with external row data (i.e. reads from a physical cluster replication, or PCR, standby cluster) need to use the fixed timestamp specified by the external row data. This timestamp might be different from the transaction timestamp, so we were explicitly setting BatchRequest.Timestamp in kv_batch_fetcher.
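
A minimal sketch of the timestamp choice described above. `Timestamp`, `externalRowData`, its `AsOf` field, and `readTimestamp` are hypothetical names used for illustration only; they are not CockroachDB's actual types or functions.

```go
package main

import "fmt"

// Timestamp is a hypothetical, simplified stand-in for an HLC timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// externalRowData is a hypothetical stand-in for a table's external row
// data; AsOf is the fixed timestamp that reads of the table must use.
type externalRowData struct {
	AsOf Timestamp
}

// readTimestamp returns the timestamp a scan should read at: the fixed
// AsOf timestamp when the table has external row data (a PCR standby
// read), otherwise the transaction's own read timestamp.
func readTimestamp(txnReadTS Timestamp, ext *externalRowData) Timestamp {
	if ext != nil {
		return ext.AsOf
	}
	return txnReadTS
}

func main() {
	txnTS := Timestamp{WallTime: 200}
	ext := &externalRowData{AsOf: Timestamp{WallTime: 150}}
	fmt.Println(readTimestamp(txnTS, nil).WallTime) // ordinary table: 200
	fmt.Println(readTimestamp(txnTS, ext).WallTime) // standby table: 150
}
```

The two timestamps can differ, which is why the read cannot simply inherit the transaction's timestamp.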

The KV API only allows BatchRequest.Timestamp to be set for non-transactional requests (i.e. requests sent with a NonTransactionalSender, which is a CrossRangeTxnWrapperSender in this case). We were using a NonTransactionalSender, but this had two problems:

  1. CrossRangeTxnWrapperSender in turn sends the BatchRequest with a transactional sender, which again does not allow BatchRequest.Timestamp to be set.
  2. CrossRangeTxnWrapperSender uses `kv.(*Txn).CommitInBatch`, which does not provide the 1-to-1 request-response guarantee required by txnKVFetcher. It is `kv.(*Txn).Send` which provides this guarantee.

Because of these two problems, whenever the txnKVFetcher would send a multi-range-spanning BatchRequest to CrossRangeTxnWrapperSender, it would either fail with a "transactional request must not set batch timestamp" error or would return an unexpected number of responses, violating the txnKVFetcher's assumed mapping from request to response.

To fix both these problems, instead of using a NonTransactionalSender, change the txnKVFetcher to open a new root transaction with the correct fixed timestamp, and then use txn.Send.
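
A toy model of the fix, assuming a transaction whose read timestamp is pinned when it is opened. `rootTxn`, `openRootTxnAt`, and this `Send` signature are hypothetical and only loosely mirror `kv.(*Txn).Send`. The sketch shows two properties: the timestamp travels with the transaction rather than the batch header, and every request gets exactly one response, in order.

```go
package main

import "fmt"

// Timestamp is a hypothetical, simplified stand-in for an HLC timestamp.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Request and Response are hypothetical stand-ins for scan requests/responses.
type Request struct{ Key string }
type Response struct {
	Key    string
	ReadAt Timestamp // timestamp the (simulated) read was served at
}

// rootTxn is a toy root transaction whose read timestamp is fixed when it
// is opened. It is a model for illustration, not the kv.Txn API.
type rootTxn struct{ readTS Timestamp }

// openRootTxnAt (hypothetical) opens a transaction pinned at ts.
func openRootTxnAt(ts Timestamp) *rootTxn { return &rootTxn{readTS: ts} }

// Send serves every request at the transaction's fixed timestamp and
// returns exactly one response per request, in request order. The batch
// carries no timestamp of its own.
func (t *rootTxn) Send(reqs []Request) []Response {
	resps := make([]Response, len(reqs))
	for i, r := range reqs {
		resps[i] = Response{Key: r.Key, ReadAt: t.readTS}
	}
	return resps
}

func main() {
	asOf := Timestamp{WallTime: 150} // fixed timestamp from external row data
	txn := openRootTxnAt(asOf)
	// A scan spanning several (simulated) ranges.
	resps := txn.Send([]Request{{Key: "a"}, {Key: "m"}, {Key: "z"}})
	fmt.Println(len(resps), resps[0].ReadAt.WallTime, resps[2].ReadAt.WallTime) // 3 150 150
}
```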

Fixes: #132608

Release note: None


sqlccl: skip TestStandbyRead under duress

Fixes: #133243

Release note: None


Release justification: fix for a GA-blocker bug.

@blathers-crl blathers-crl bot requested a review from a team as a code owner October 22, 2024 17:56
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-24.3-132854 branch from 6cdb004 to 3fcaeaa October 22, 2024 17:56
@blathers-crl blathers-crl bot requested review from DrewKimball and removed request for a team October 22, 2024 17:56
@blathers-crl blathers-crl bot added the blathers-backport and O-robot labels Oct 22, 2024

blathers-crl bot commented Oct 22, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious issues or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning area's TL. For more information on how that review should be conducted, please consult the backport policy.
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Oct 22, 2024
@cockroach-teamcity

This change is Reviewable


michae2 commented Oct 22, 2024

Failure looks like #132638, which I think is unrelated.


michae2 commented Oct 28, 2024

Note this will also need #133589.


@DrewKimball DrewKimball left a comment


Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @mgartner, @michae2, and @nvanbenschoten)

@michae2 michae2 force-pushed the blathers/backport-release-24.3-132854 branch from 3fcaeaa to cbde46a October 28, 2024 21:57
@michae2 michae2 merged commit 05270cb into release-24.3 Oct 30, 2024
20 of 21 checks passed
@michae2 michae2 deleted the blathers/backport-release-24.3-132854 branch October 30, 2024 16:21
Labels
backport (Label PR's that are backports to older release branches); blathers-backport (This is a backport that Blathers created automatically.); O-robot (Originated from a bot.)