Flake TestMaxLearnerInCluster #16078
Comments
I'm not sure whether this is an idempotency issue ( Almost certainly, it shouldn't get stuck on the select in Also not easy to repro locally, even with limiting to a single core like this:
Similar case in https://github.com/etcd-io/etcd/actions/runs/5140910673/jobs/9252990421#step:5:7330
m1 seems to lock up completely here, similar to the case above where it's stuck on the select.
Hi @tjungblu, I think the test failure symptom is similar to #15528 (comment). Could you please try manually injecting the sleep(1s) before I will push the fix further to be merged ==
Thanks for the pointer @chaochn47, I've added a sleep statement right before this line: etcd/server/etcdserver/raft.go Line 310 in da49157
I hope this is what you mean? If yes, then it's not reproducing either in about 100 runs so far. I can see the sleep taking effect though: there are plenty of heartbeat delays in the logs. But it really sounds like it could be the culprit. I'll try to repro this a bit further; otherwise I'm keeping all eyes on your PR :) EDIT: spoke too early, it just happened on one run. With 10ms it's much more reproducible than with 1s. Great, one thing less to worry about! I reckon we'll close here in favor of #15528?
I think we can leave the issue open until the fix is merged. No strong preference though.
Fixed by #15708
Which github workflows are flaking?
test (linux-amd64-integration-2-cpu)
Which tests are flaking?
TestMaxLearnerInCluster
Github Action link
https://github.com/etcd-io/etcd/actions/runs/5264795327/jobs/9516550972
Reason for failure (if possible)
15m test timeout reached; the client test code seems stuck on MemberAdd.
The server side seems stuck on:
with log message: