[v24.2.x] Fix stepping down on timeout #24708
Merged
mmaslankaprv
merged 10 commits into
redpanda-data:v24.2.x
from
mmaslankaprv:manual-backport-24590-v24.2.x-643
Jan 21, 2025
The `raft::reply_result::follower_busy` result indicates that the follower was unable to process the heartbeat fast enough to generate a response. Renaming the reply from `timeout` makes it less confusing for the reader and differentiates the error code from an RPC timeout. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 6a1e34b)
6a30603
to
3259d02
Compare
Retry command for Build#60349: please wait until all jobs are finished before running the slash command
60cf6da
to
683d30a
Compare
ztlpn
reviewed
Jan 14, 2025
if (id == leader.get_vnode().id()) {
    node->on_dispatch([](model::node_id, raft::msg_type mt) {
        if (mt == raft::msg_type::append_entries) {
            throw std::runtime_error("dropping_append_entries");
hmm should this be committed to dev as well?
Made leader election timeout and heartbeat interval properties runtime configurable in Raft fixture tests. Signed-off-by: Michał Maślanka <michal@redpanda.com>
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 95a29db)
Wired raft RPC service handler into Raft fixture to make the tests more accurate and cover the service code with tests. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 5f69d9b)
Propagating the timeout to the node sending the RPC request is crucial for accurate testing of the Raft implementation. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 7d33bb5)
Added a wrapper around the `storage::log` allowing us to inject storage layer failures in Raft fixture tests. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit f04995a)
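For readers unfamiliar with the pattern, a storage-failure-injection wrapper can be sketched roughly as below. This is only an illustration: `log_iface`, `in_memory_log`, `failure_injecting_log`, and the hook setters are hypothetical names made up for this sketch, and the real `storage::log` interface is asynchronous and much larger.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a log interface (not the real storage::log).
struct log_iface {
    virtual ~log_iface() = default;
    virtual void append(std::string entry) = 0;
    virtual std::size_t size() const = 0;
};

// Trivial in-memory implementation used as the wrapped log.
struct in_memory_log final : log_iface {
    void append(std::string entry) override {
        entries.push_back(std::move(entry));
    }
    std::size_t size() const override { return entries.size(); }
    std::vector<std::string> entries;
};

// Wrapper that forwards every call to the underlying log. A test can
// install a hook that runs before each append; if the hook throws, the
// append never reaches the underlying log, which simulates a storage failure.
class failure_injecting_log final : public log_iface {
public:
    using append_hook = std::function<void(const std::string&)>;

    explicit failure_injecting_log(log_iface& inner) : _inner(inner) {}

    void set_append_hook(append_hook hook) { _hook = std::move(hook); }
    void clear_append_hook() { _hook = nullptr; }

    void append(std::string entry) override {
        if (_hook) {
            _hook(entry); // may throw to inject a failure
        }
        _inner.append(std::move(entry));
    }

    std::size_t size() const override { return _inner.size(); }

private:
    log_iface& _inner;
    append_hook _hook;
};
```

A test would install a throwing hook, exercise the code path under test, then clear the hook to let the node recover.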
When a follower is busy, it may fail fast when processing full heartbeat requests sent by the leader. In this case the follower RPC handler sets the `follower_busy` result in `heartbeat_reply`. The leader should still treat the follower replica as online in this case, because the node hosting the replica must be online to reply with the `follower_busy` error. This way we prevent overly eager leader step downs when follower replicas are slow. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 8b57b42)
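The idea can be illustrated with a small sketch that keeps liveness separate from replication progress. It is not the actual Redpanda code: the `reply_result` enum below only mirrors the names mentioned in this PR, and `follower_state`, `on_heartbeat_reply`, and `is_online` are hypothetical helpers. The sketch also assumes that any reply, not just `follower_busy`, proves the node is reachable.

```cpp
#include <chrono>

// Hypothetical mirror of the reply codes discussed in this PR.
enum class reply_result { success, failure, follower_busy };

using clock_type = std::chrono::steady_clock;

// Per-follower bookkeeping kept by the leader (illustrative only).
struct follower_state {
    clock_type::time_point last_seen_alive{};
    clock_type::time_point last_successful_append{};
};

// Any reply that arrives proves the follower's node is up, so liveness
// is refreshed even for follower_busy. Only a successful reply counts
// as replication progress.
inline void on_heartbeat_reply(
  follower_state& f, reply_result r, clock_type::time_point now) {
    f.last_seen_alive = now;
    if (r == reply_result::success) {
        f.last_successful_append = now;
    }
}

// The leader considers a follower online if it replied recently enough.
inline bool is_online(
  const follower_state& f,
  clock_type::time_point now,
  clock_type::duration timeout) {
    return now - f.last_seen_alive < timeout;
}
```

With this split, a slow follower that keeps answering `follower_busy` still counts toward the set of reachable replicas, so the leader does not step down just because that follower is lagging.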
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 67e7c6e)
1) added a utility function to raft_fixture to execute a testing coro in retry_with_leader 2) made both monitor_test_fixture tests use it (cherry picked from commit d1ecf0e)
Fixed a previously unstable test. The test now simply blocks append entries requests from the leader instead of relying on uncertain timeouts. Added waiting for replicate requests to be enqueued, to make sure the requests landed in the buffer before the leadership changed. Signed-off-by: Michał Maślanka <michal@redpanda.com>
6588510
to
681c790
Compare
ztlpn
approved these changes
Jan 21, 2025
Backport of PR #24590
Fixes: #24668