[yugabyte/yugabyte-db#17854] Bug while transition from snapshot to streaming #236

Merged
merged 18 commits into from
Jul 3, 2023

vaibhav-yb
Collaborator

Problem

We found a flow in which tabletSourceInfo is set to 0.0 while snapshot records are being streamed. The 0.0 arises because no OpId is sent with snapshot records: when the connector reads such a record, getTerm and getIndex return their default value of 0, and the resulting 0.0 is then used in the commit callback.

The previous flow also resulted in an error of the form:

org.yb.client.CDCErrorException: Server[4ff8d4f19ae24e7aa78c0a43e13bf5ff] INTERNAL_ERROR[code 21]: CDCSDK Trying to fetch already GCed intents for transaction 7efb3374-c9d1-4454-b30e-ddb74baafffe
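
To illustrate how an absent OpId can surface as 0.0, here is a minimal sketch. OpIdSketch and checkpointCandidate are hypothetical names and not the connector's or the client's classes; the guard shown is only one possible way to avoid committing a default value and is not necessarily what this PR does.

```java
import java.util.Optional;

// Hypothetical stand-in for an OpId; the real connector and client classes differ.
final class OpIdSketch {
    final long term;
    final long index;

    OpIdSketch(long term, long index) {
        this.term = term;
        this.index = index;
    }

    // A record that carries no OpId reads back as term 0 and index 0, i.e. "0.0".
    static OpIdSketch unset() {
        return new OpIdSketch(0L, 0L);
    }

    boolean isUnset() {
        return term == 0L && index == 0L;
    }

    // Assumption for illustration: only a non-default OpId is offered as a checkpoint.
    static Optional<OpIdSketch> checkpointCandidate(OpIdSketch fromRecord) {
        return fromRecord.isUnset() ? Optional.empty() : Optional.of(fromRecord);
    }

    @Override
    public String toString() {
        return term + "." + index;
    }
}
```

With this guard, a snapshot record yields no checkpoint candidate, while a streaming record with a real OpId passes through unchanged.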

Solution

Part of the solution was to decouple the offset storage objects, which was implemented in #233. After that, further fixes were required to resolve the invalid checkpoints; they are listed below and are part of this PR.

  1. Add a GetChanges call with snapshot_done_key as the key to mark the snapshot as completed on the service.
  2. Initialize the fromLsn object when the method YugabyteDBOffsetContext#initSourceInfo is called.
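
The second item can be pictured roughly as follows. This is a hedged sketch only: OffsetContextSketch, its string-typed checkpoints, and getFromLsn are hypothetical, and the real YugabyteDBOffsetContext has a different shape. The point is just that fromLsn gets an entry at the same moment the source info is initialized, so later reads never see a missing value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; only the names initSourceInfo and fromLsn come from the PR text.
final class OffsetContextSketch {
    private final Map<String, String> tabletSourceInfo = new HashMap<>();
    private final Map<String, String> fromLsn = new HashMap<>();

    // Registers the tablet and seeds fromLsn in the same call.
    void initSourceInfo(String tabletId, String startingCheckpoint) {
        tabletSourceInfo.put(tabletId, startingCheckpoint);
        // putIfAbsent keeps an already advanced fromLsn if the tablet is re-initialized.
        fromLsn.putIfAbsent(tabletId, startingCheckpoint);
    }

    // Fails loudly instead of silently handing back a default checkpoint.
    String getFromLsn(String tabletId) {
        String value = fromLsn.get(tabletId);
        if (value == null) {
            throw new IllegalStateException("fromLsn not initialized for tablet " + tabletId);
        }
        return value;
    }
}
```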

Service diff

https://phorge.dev.yugabyte.com/D26336

Test plan

The test YugabyteDBSnapshotTest#snapshotColocatedNonColocatedThenStream began failing once we started running tests with explicit checkpointing in our frequent Jenkins runs. The test passes with the changes included in this PR.

This PR also closes yugabyte/yugabyte-db#17854

- .pollDelay(Duration.ofSeconds(10))
- .atMost(Duration.ofSeconds(20))
+ .pollDelay(Duration.ofSeconds(15))
+ .atMost(Duration.ofSeconds(65))

Because of infra issues in some of the test runs, the connector sometimes fails to come up within 20 seconds. Increasing the timeout to avoid that.
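
For context on the two settings changed above: pollDelay is the wait before the condition is first checked, and atMost is the overall deadline. The sketch below approximates those semantics with the standard library only; it is not Awaitility itself, and AwaitSketch, awaitUntil, and the explicit pollInterval parameter are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Stdlib-only approximation of "wait pollDelay, then poll until atMost elapses".
final class AwaitSketch {
    static boolean awaitUntil(Duration pollDelay, Duration pollInterval, Duration atMost,
                              BooleanSupplier condition) {
        // The deadline is measured from the start, so it includes the initial delay.
        long deadline = System.nanoTime() + atMost.toNanos();
        try {
            Thread.sleep(pollDelay.toMillis());
            while (true) {
                if (condition.getAsBoolean()) {
                    return true;
                }
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out without the condition becoming true
                }
                long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
                Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }
}
```

Under this reading, raising atMost from 20 to 65 seconds gives a slow-starting connector more time before the test is declared failed, while the longer pollDelay only postpones the first check.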

vaibhav-yb requested review from suranjan and vrajat, and removed the request for suranjan (June 28, 2023 05:27)

Successfully merging this pull request may close these issues.

[CDCSDK] DBZ: Bug in snapshot mode (Trying to fetch GCed intents)