integration: make read/write io timeout longer #5350
LGTM. Thanks. |
@AkihiroSuda I am not exactly sure this is the right fix. Can you try to reproduce the issue with this patch? Thanks! |
Hi @xiang90, sorry for the late reply. I tested your PR 77fde6a with a printf-debug patch #5022 (comment), and I am still getting an error (note that the error string seems to have changed!):
When I hit the issue on my 4-core Xeon (E3-1220 v3 @ 3.10GHz) machine, I was not doing anything special or running any heavy applications (although some daemons and X11 were running). I'll try to add more debug prints and investigate the cause. |
@AkihiroSuda interesting. Do you have logs of etcd servers? |
I'm not sure how to get server logs for the integration test suites (`go test -v` doesn't show any logs).
--- FAIL: TestIssue2746 (6.67s)
cluster_test.go:356: create on http://127.0.0.1:21564 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: client: etcd member http://127.0.0.1:21564 has no leader: {"errorCode":300,"message":"Raft Internal Error","cause":"etcdserver: request timed out, possibly due to previous leader failure","index":0}
)
More detailed error message (
Printf-debug patch: AkihiroSuda@546fed2 @mitake |
@AkihiroSuda @mitake I'm getting full logs by changing |
This line seems to show up almost exclusively in failed runs:
https://github.com/coreos/etcd/blob/master/raft/log.go#L120 |
@AkihiroSuda I'm seeing the same thing. I think the conflict comes from an election, and the key create call fails because the leader was lost: failure1:
failure2:
|
Is it invalid to return an error while there is no leader? I don't think the test failure indicates a bug in etcd: Raft is a leader-based consensus algorithm, so being unavailable while there is no leader seems to be valid behavior. How about simply retrying requests from the client to handle this case (maybe a more detailed error code is required)? |
I created PR #5395. According to @AkihiroSuda's test, the PR reduces the test failures. But the fix is only partial, for the reason I described above. To remove the failure completely, I think a client-side retry would be needed. |
…rorCode":300,"message":"Raft Internal Error","cause":"etcdserver: request timed out","index":0}
Fix #5022