Fix txn consume group issues leading to undefined behavior #11110

Merged
merged 5 commits into from
Jun 9, 2023


rystsov
Contributor

@rystsov rystsov commented May 30, 2023

Fixes the issues found during the investigation of #10588. It isn't clear yet whether they caused #10588, but it looks like they might have.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

dotnwat
dotnwat previously approved these changes May 31, 2023
src/v/kafka/server/group.cc (resolved)
rystsov added 4 commits June 7, 2023 16:35
An execution of abort_old_txes could span multiple terms, so the
method could modify new state while assuming it's the old state,
resulting in undefined behavior
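
The hazard described in this commit message can be illustrated with a minimal sketch. It is not the code from this PR: `group_sketch`, `abort_old_txes_sketch` and the `suspend` callback are hypothetical stand-ins, and the real method is asynchronous rather than callback-driven. The idea shown is to remember the term in which the operation started and to re-check it after every point where the operation may have been suspended.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-ins; none of these names come from the Redpanda source.
using term_id = std::int64_t;
using producer_id = std::int64_t;

struct group_sketch {
    term_id term{0};
    std::set<producer_id> ongoing; // producers with an open transaction
};

// Aborts expired transactions and returns how many were aborted.
// `suspend` models an asynchronous step during which the term may change
// and the in-memory state may be rebuilt.
inline int abort_old_txes_sketch(
  group_sketch& g,
  const std::vector<producer_id>& expired,
  const std::function<void()>& suspend) {
    const term_id started_in = g.term; // term observed when we began
    int aborted = 0;
    for (producer_id pid : expired) {
        suspend();
        if (g.term != started_in) {
            // The state now belongs to a newer term; stop instead of
            // applying decisions that were made against the old state.
            break;
        }
        aborted += static_cast<int>(g.ongoing.erase(pid));
    }
    assert(aborted <= static_cast<int>(expired.size()));
    return aborted;
}
```

If the term changes while the loop is suspended, the remaining expired transactions are left untouched for whichever code owns the new term's state.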
Make group accept a term to narrow the scope in which reset_tx_state is
used, making it easier to track where the write lock is necessary
When the consumer group log's term changes we replay the whole log to
reconstruct the state. We used to merge the current and the replayed
state, but that is error prone. Reset the whole txn state instead to
get more deterministic behavior
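
The difference between merging and resetting can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `tx_state_sketch`, `tx_record` and `reset_and_replay` are hypothetical names, and the record format is reduced to begin/commit/abort markers.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins; none of these names come from the Redpanda source.
using term_id = std::int64_t;
using producer_id = std::int64_t;

enum class tx_record_kind { begin, commit, abort };

struct tx_record {
    producer_id pid;
    tx_record_kind kind;
};

struct tx_state_sketch {
    term_id term{-1};
    std::set<producer_id> ongoing; // producers with an open transaction
};

// Rebuilds the transactional state for `new_term` from the log alone.
inline void reset_and_replay(
  tx_state_sketch& st, term_id new_term, const std::vector<tx_record>& log) {
    assert(new_term >= st.term); // terms only move forward
    st.ongoing.clear();          // drop the old state instead of merging
    st.term = new_term;
    for (const tx_record& r : log) {
        if (r.kind == tx_record_kind::begin) {
            st.ongoing.insert(r.pid);
        } else {
            st.ongoing.erase(r.pid); // commit and abort both close the txn
        }
    }
}
```

Because the state is cleared first, anything that existed only in memory and never reached the log cannot leak into the new term; the result depends on the log contents only.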
@rystsov rystsov force-pushed the issue-10588 branch 2 times, most recently from 0eb9b84 to 3d7d933 Compare June 7, 2023 23:41
dotnwat
dotnwat previously approved these changes Jun 8, 2023
Transactions in the Kafka protocol are stateful: the processing of a
request depends on the commands previously executed by the same or
even a different producer. This makes it dangerous when replication
fails with an indecisive error such as a timeout, because the true
state is unknown.

Step down to resolve the uncertainty by replaying the log
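
A minimal sketch of the policy in this commit message, assuming a simplified three-way replication result. The names `replicate_result`, `tx_outcome`, `leader_sketch` and `handle_replicate_result` are hypothetical and do not come from the Redpanda source; the real code deals with asynchronous replication and richer error types.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from the Redpanda source.
enum class replicate_result { ok, rejected, timeout };
enum class tx_outcome { applied, failed, unknown };

struct leader_sketch {
    bool is_leader{true};
    int step_downs{0};
    void step_down() {
        is_leader = false;
        ++step_downs;
    }
};

// Maps a replication result to an outcome. A definite result is reported
// as is; an indecisive one forces a step down so that the next leader
// rebuilds the state by replaying the log.
inline tx_outcome
handle_replicate_result(leader_sketch& l, replicate_result r) {
    assert(l.is_leader); // only a leader replicates
    switch (r) {
    case replicate_result::ok:
        return tx_outcome::applied;
    case replicate_result::rejected:
        return tx_outcome::failed;
    case replicate_result::timeout:
        // The write may or may not have been persisted.
        l.step_down();
        return tx_outcome::unknown;
    }
    return tx_outcome::unknown; // not reached for valid enum values
}
```

The design choice is to avoid guessing: after a timeout the in-memory state may disagree with the log, so leadership is given up rather than continuing to serve requests from state that might be wrong.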
@rystsov rystsov requested a review from dotnwat June 9, 2023 03:28
@piyushredpanda piyushredpanda merged commit 2edd0fe into redpanda-data:dev Jun 9, 2023
@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

/backport v22.3.x

rystsov added a commit to rystsov/redpanda that referenced this pull request Jun 16, 2023
rystsov added a commit to rystsov/redpanda that referenced this pull request Jun 16, 2023
@rystsov rystsov mentioned this pull request Jun 16, 2023
7 tasks
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Jul 10, 2023
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Jul 10, 2023
5 participants