[backport] [v24.1.x] lw heartbeats during inflight appends #20801

Merged 3 commits into redpanda-data:v24.1.x on Jul 1, 2024

Conversation

@bharathv bharathv commented Jul 1, 2024

If in-flight appends get stuck (e.g. disk pressure on the follower), the leader may spuriously step down due to heartbeat loss, because heartbeats are suppressed for the duration of the append. This commit switches to lightweight (lw) heartbeats while appends are in flight to avoid this scenario.
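
To illustrate the failure mode described above, here is a minimal toy simulation. It is a sketch under stated assumptions, not Redpanda's implementation: the names (`Follower`, `choose_heartbeat`, `simulate`), the 150 ms interval, the 450 ms timeout, and the assumption that a follower still answers a lightweight heartbeat promptly while its append is stuck are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Heartbeat(Enum):
    SUPPRESSED = "suppressed"    # old behavior while an append is in flight
    LIGHTWEIGHT = "lightweight"  # new behavior while an append is in flight
    FULL = "full"                # no append in flight


@dataclass
class Follower:
    inflight_appends: int = 0
    last_reply_ms: int = 0


def choose_heartbeat(follower, lw_during_appends):
    """Pick the heartbeat kind for one follower (hypothetical policy)."""
    if follower.inflight_appends > 0:
        if lw_during_appends:
            return Heartbeat.LIGHTWEIGHT
        return Heartbeat.SUPPRESSED
    return Heartbeat.FULL


def simulate(lw_during_appends, ticks=10, interval_ms=150,
             timeout_ms=450, n_followers=2):
    """Return the time (ms) at which the leader steps down, or None.

    Every follower has one append that never completes (e.g. slow disk).
    Assumption: any heartbeat that is actually sent is answered at once.
    """
    followers = [Follower(inflight_appends=1) for _ in range(n_followers)]
    cluster_size = n_followers + 1  # followers plus the leader itself
    for tick in range(1, ticks + 1):
        now = tick * interval_ms
        for f in followers:
            if choose_heartbeat(f, lw_during_appends) is not Heartbeat.SUPPRESSED:
                f.last_reply_ms = now
        stale = sum(now - f.last_reply_ms > timeout_ms for f in followers)
        live = 1 + (n_followers - stale)  # the leader counts itself as live
        if live <= cluster_size // 2:     # lost contact with a majority
            return now
    return None


print(simulate(lw_during_appends=False))  # heartbeats suppressed: steps down
print(simulate(lw_during_appends=True))   # lw heartbeats: leader stays
```

With these made-up numbers, the suppressed case loses its majority once the followers' last replies are older than the timeout, while the lightweight case keeps refreshing them and the leader never steps down.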

Fixes #20591

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Bug Fixes

  • Prevents unnecessary leadership changes. Previously, heartbeats were paused while append entries requests were in flight, so a follower that was slow to respond to appends (e.g., because of a slow disk) could cause the leader to misinterpret the delay as heartbeat loss and step down.

bharathv added 2 commits June 30, 2024 23:00
With the upcoming change, we don't necessarily suppress heartbeats if
appends are in flight, so the current naming scheme may confuse
readers. No logical changes.

(cherry picked from commit 5fde255)

If the inflight appends are stuck, we do not want the leader to
mistakenly step down due to majority heartbeat loss. Instead, this commit
switches to using lw heartbeats in that window.

(cherry picked from commit 6619817)

Simulates stuck append entries which suppress heartbeats, resulting
in a leader step down.

(cherry picked from commit 4e92622)
@bharathv bharathv requested review from ztlpn and mmaslankaprv July 1, 2024 15:17
@bharathv bharathv merged commit 5547d15 into redpanda-data:v24.1.x Jul 1, 2024
18 checks passed
@bharathv bharathv deleted the 241x-append-timeout branch July 1, 2024 21:55
BenPope commented Jul 17, 2024

@bharathv
@bharathv I think the release note could be a bit improved: https://github.com/redpanda-data/redpanda/releases/tag/untagged-e8ce2cd9a5a987a31e3f

Done.

@BenPope BenPope added this to the v24.1.10 milestone Jul 19, 2024