Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [tx] [branch-2.9] Transaction buffer recover blocked by readNext #18970

Merged
merged 2 commits into from
Dec 19, 2022

Conversation

poorbarcode
Copy link
Contributor

Motivation

Since PR #18833 can not cherry-pick to branch-2.9, create a separate PR.

Context for Transaction Buffer

  • If turn on transactionCoordinatorEnabled, then TransactionBuffer will be initialized when a user topic create.
  • The TransactionBuffer reads the aborted logs of transactions from topic __transaction_buffer_snapshot -- this process is called recovery.
  • During recovery, the reading from that snapshot ledger is done via a Reader; the reader works like this:
while (reader.hasMessageAvailable()){
    reader.readNext();
}

Context for Compaction

  • After pip-14, the consumer that enabled feature read-compacted will read messages from the compacted topic instead of the original topic if the task-compaction is done, and read messages from the original topic if task-compaction is not done.
  • If the data of the last message with key k sent to a topic is null, the compactor will mark all messages for that key as deleted.

Issue

There is a race condition: after executing reader.hasMessageAvailable, the following messages have been deleted by compaction-task, so read next will be blocked because there have no messages to read.


Modifications

  • If hits this issue, do recover again.

Why not just let the client try to load the topic again to retry the recover?

If the topic load is failed, the client will receive an error response. This is a behavior that we can handle, so should not be perceived by the users.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

+ "transactionBufferSnapshot timeout!", topic.getName());
log.error(errorMessage, t);
callBack.recoverExceptionally(
new BrokerServiceException.ServiceUnitNotReadyException(errorMessage, t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should close the reader right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Already fixed.

@congbobo184 congbobo184 merged commit 46dd1c7 into apache:branch-2.9 Dec 19, 2022
@congbobo184 congbobo184 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Dec 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/transaction cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.9.5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants