raft/rafttest: TestPause failed #132992
Comments
Duplicate of #132205. This test expects a stable leader, and we see from the logs that the leader changed:
It's similar in flavor to #121745 (comment), which is fixed. There might be a slightly different reason for the leadership change here, but it's most likely another variant of the same problem. Assigning P3 and removing the release blocker label, similar to #121745.
+1 to what @miraradeva said. The test proposes many entries to different nodes and assumes a stable leader, and therefore that every proposed entry gets committed. That assumption does not hold if leadership changes. Some suggestions to make the test more stable:
I am able to reproduce the failure by running:
We sporadically see that some raft node_test tests fail due to the leader not being stable. This commit should reduce the chances of that happening by increasing the election timeout to 50ms (instead of 10ms). I couldn't reproduce the bug locally with this change. If the bug still happens, we can try to force leadership to make it more deterministic. Fixes: cockroachdb#132992 Release note: None
I took a step back when I noticed that multiple similar problems started showing up in the last two weeks. I was able to bisect the issue to this commit: d3f4d01#diff-2a745fc78de353dd2e58c332ca186b00e887630d719ba3831fd6282633306acb
Talked offline with @arulajmani: the culprit commit didn't do anything weird. However, it introduced a mutex that may have changed timing enough to make this bug show up more often.
119035: sql, sem, opt, explain, memo, kv: audit bit-flag-checking helpers r=DrewKimball,mgartner a=michae2

`@mgartner's` [comment](#118931 (review)) on #118931 inspired me to audit all the other helper functions that check whether a flag is set in a bitfield. I might have found a couple bugs. See individual commits for details.

Epic: None
Release note: None

133270: raft: deflake non-determinisctic raft node tests r=iskettaneh a=iskettaneh

We sporadically see that some raft node_test tests fail due to the leader not being stable. This commit should reduce the chances of that happening by increasing the election timeout to 250ms (instead of 50ms). I couldn't reproduce the bug locally with this change. If the bug still happens, we can try to force leadership to make it more deterministic.

Fixes: #132992, #131676, #132205, #133048
Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Ibrahim Kettaneh <ibrahim.kettaneh@cockroachlabs.com>
Based on the specified backports for linked PR #133270, I applied the following new label(s) to this issue: branch-release-24.1, branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
We sporadically see that some raft node_test tests fail due to the leader not being stable. This commit should reduce the chances of that happening by increasing the election timeout to 250ms (instead of 50ms). I couldn't reproduce the bug locally with this change. If the bug still happens, we can try to force leadership to make it more deterministic. Fixes: #132992 Release note: None
raft/rafttest.TestPause failed on release-24.3 @ 4cbedefd790c75cb0f21f77ed8d917c8528a7d15:
Parameters:
attempt=1
run=25
shard=1
Help
See also: How To Investigate a Go Test Failure (internal)
Same failure on other branches
This test on roachdash | Improve this report!
Jira issue: CRDB-43390