raft/rafttest: TestBasicProgress failed #131676

Closed
cockroach-teamcity opened this issue Oct 1, 2024 · 3 comments
Labels
A-testing Testing tools and infrastructure branch-master Failures and bugs on the master branch. branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-3 Issues/test failures with no fix SLA T-kv KV Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Oct 1, 2024

raft/rafttest.TestBasicProgress failed with artifacts on master @ 91bc1a739e364db7444dc5ce77df4c37fbc990e8:

raft2024/10/01 16:59:39 INFO: 3 [term: 9] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:39 INFO: 5 [logterm: 3, index: 106, vote: 2] ignored MsgVote from 1 [logterm: 3, index: 6] at term 3: supporting fortified leader 2 at epoch 1
raft2024/10/01 16:59:39 INFO: 1 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:39 INFO: 4 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:39 INFO: 1 received MsgVoteResp from 4 at term 10
raft2024/10/01 16:59:39 INFO: 1 has received 2 MsgVoteResp votes and 0 vote rejections
raft2024/10/01 16:59:39 INFO: 4 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:39 INFO: 3 [term: 9] received a MsgVote message with higher term from 1 [term: 10]
raft2024/10/01 16:59:39 INFO: 3 became follower at term 10
raft2024/10/01 16:59:39 INFO: 3 [logterm: 1, index: 5, vote: 0] cast MsgVote for 1 [logterm: 3, index: 6] at term 10
raft2024/10/01 16:59:40 INFO: 2 [logterm: 3, index: 106, vote: 2] ignored MsgVote from 1 [logterm: 3, index: 6] at term 3: supporting fortified leader 2 at epoch 1
raft2024/10/01 16:59:40 INFO: 3 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 1 received MsgVoteResp from 3 at term 10
raft2024/10/01 16:59:40 INFO: 1 has received 3 MsgVoteResp votes and 0 vote rejections
raft2024/10/01 16:59:40 INFO: 1 became leader at term 10
raft2024/10/01 16:59:40 INFO: raft.node: 1 elected leader 1 at term 10
raft2024/10/01 16:59:40 INFO: raft.node: 3 elected leader 1 at term 10
raft2024/10/01 16:59:40 INFO: 5 [term: 3] received a MsgFortifyLeader message with higher term from 1 [term: 10]
raft2024/10/01 16:59:40 INFO: 5 became follower at term 10
raft2024/10/01 16:59:40 INFO: raft.node: 5 changed leader from 2 to 1 at term 10
raft2024/10/01 16:59:40 INFO: 1 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 1 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 4 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: found conflict at index 7 [existing term: 3, conflicting term: 10]
raft2024/10/01 16:59:40 INFO: replace the unstable entries from index 7
raft2024/10/01 16:59:40 INFO: 3 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 1 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 4 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 1 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: raft.node: 4 elected leader 1 at term 10
raft2024/10/01 16:59:40 INFO: 5 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 2 [term: 3] received a MsgFortifyLeader message with higher term from 1 [term: 10]
raft2024/10/01 16:59:40 INFO: 2 became follower at term 10
raft2024/10/01 16:59:40 INFO: raft.node: 2 changed leader from 2 to 1 at term 10
raft2024/10/01 16:59:40 INFO: 3 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 4 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 5 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 3 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: found conflict at index 7 [existing term: 3, conflicting term: 10]
raft2024/10/01 16:59:40 INFO: replace the unstable entries from index 7
raft2024/10/01 16:59:40 INFO: 3 [term: 10] ignored a MsgFortifyLeader message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 1 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
raft2024/10/01 16:59:40 INFO: 4 [term: 10] ignored a MsgHeartbeat message with lower term from 2 [term: 3]
    node_test.go:47: commits failed to converge!
I241001 16:59:44.592072 52 (gostd) node.go:104  [-] 1  raft.1: stop
I241001 16:59:44.592206 54 (gostd) node.go:104  [-] 2  raft.2: stop
I241001 16:59:44.592222 56 (gostd) node.go:104  [-] 3  raft.3: stop
I241001 16:59:44.592239 58 (gostd) node.go:104  [-] 4  raft.4: stop
I241001 16:59:44.592254 60 (gostd) node.go:104  [-] 5  raft.5: stop
--- FAIL: TestBasicProgress (5.32s)
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-42651
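
For readers unfamiliar with the "commits failed to converge!" message, here is a minimal sketch of what a convergence check of this kind typically does: it polls every node's commit index and succeeds only once all nodes report the same index and that index is past a target. The function names, the polling parameters, and the sample values below are hypothetical and are not taken from `node_test.go`.

```go
package main

import (
	"fmt"
	"time"
)

// converged reports whether every node has the same commit index
// and that index is strictly above target. (Hypothetical helper.)
func converged(commits []uint64, target uint64) bool {
	if len(commits) == 0 {
		return false
	}
	first := commits[0]
	for _, c := range commits {
		if c != first || c <= target {
			return false
		}
	}
	return true
}

// waitConverge polls read up to attempts times, sleeping interval
// between polls, and returns true as soon as the nodes have converged.
func waitConverge(read func() []uint64, target uint64, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if converged(read(), target) {
			return true
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	// Illustrative values only: two nodes lag behind the other three,
	// so the check never succeeds within the polling budget.
	stuck := func() []uint64 { return []uint64{101, 101, 7, 101, 7} }
	fmt.Println(waitConverge(stuck, 100, 3, 10*time.Millisecond))

	// Once every node reports the same index above the target, it passes.
	ok := func() []uint64 { return []uint64{101, 101, 101, 101, 101} }
	fmt.Println(waitConverge(ok, 100, 3, 10*time.Millisecond))
}
```

If leadership changes while the test is proposing entries, followers may truncate and replace uncommitted entries (the "found conflict at index 7" lines above), which can keep a check like this from succeeding within its polling budget.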

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels Oct 1, 2024
@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure P-3 Issues/test failures with no fix SLA and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 2, 2024
@kvoli
Collaborator

kvoli commented Oct 2, 2024

Looks rather similar to #130244: there are leader changes and conflicting indices. I'm going to assign the same labels.

craig bot pushed a commit that referenced this issue Oct 23, 2024
119035: sql, sem, opt, explain, memo, kv: audit bit-flag-checking helpers r=DrewKimball,mgartner a=michae2

@mgartner's comment on #118931 inspired me to audit all the other helper functions that check whether a flag is set in a bitfield. I might have found a couple bugs.

See individual commits for details.

Epic: None
Release note: None

133270: raft: deflake non-deterministic raft node tests r=iskettaneh a=iskettaneh

We sporadically see that some raft node_test tests fail due to the leader not being stable. This commit should reduce the chances of that happening by increasing the election timeout to 250ms (instead of 50ms).

I couldn't reproduce the bug locally with this change.

If the bug still happens, we can try to force leadership to make it more deterministic.

Fixes: #132992, #131676, #132205, #133048

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Ibrahim Kettaneh <ibrahim.kettaneh@cockroachlabs.com>

blathers-crl bot commented Oct 23, 2024

Based on the specified backports for linked PR #133270, I applied the following new label(s) to this issue: branch-release-24.1, branch-release-24.2, branch-release-24.3. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 23, 2024
@iskettaneh
Contributor

Should be less flaky with #133270
