integration: call waitLeader() before sending requests in TestIssue2746 #5395

Closed
wants to merge 1 commit into from

mitake
Contributor

@mitake mitake commented May 19, 2016

TestIssue2746 fails occasionally because the cluster has no leader when
the test sends its requests. To address this, this commit makes the test
call waitLeader() before sending requests.

This fixes the test failure only partially, because a leader campaign can
also happen during the test itself, not only in the initialization phase.
Handling that case would require clients to retry the request.

Partially fixes #5022

@mitake
Contributor Author

mitake commented May 19, 2016

Currently the problem can be reproduced only in @AkihiroSuda's environment. @AkihiroSuda, could you report whether this commit is effective?

@AkihiroSuda
Contributor

Still failing (pr5395-err.txt)

--- FAIL: TestIssue2746 (6.78s)
    cluster_test.go:362: create on http://127.0.0.1:21782 error: client: etcd cluster is unavailable or misconfigured(detail: error #0: client: etcd member http://127.0.0.1:21782 has no leader: {"errorCode":300,"message":"Raft Internal Error","cause":"etcdserver: request timed out, possibly due to previous leader failure","index":0}

        )

The tested version is f13eaac + pr5395-debugprint.patch.txt

@AkihiroSuda
Contributor

Note: I feel the failure reproduces less often with this PR applied

@mitake
Contributor Author

mitake commented May 26, 2016

@AkihiroSuda could you share the detailed numbers of your experiments?

@AkihiroSuda
Contributor

@mitake Tested about 2500 times.
Observed "cause":"etcdserver: request timed out" 7 times, and "cause":"etcdserver: request timed out, possibly due to previous leader failure" 2 times.
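
For scale, the two counts above add up to 9 failures in roughly 2500 runs. A small sketch of that arithmetic follows; `failureRate` is a hypothetical helper, and since the run count is approximate the result is approximate too.

```go
package main

import "fmt"

// failureRate returns the fraction of runs that failed.
func failureRate(failures, runs int) float64 {
	return float64(failures) / float64(runs)
}

func main() {
	// 7 plain timeouts plus 2 "previous leader failure" timeouts,
	// out of about 2500 runs.
	fmt.Printf("%.2f%%\n", 100*failureRate(7+2, 2500))
}
```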

@heyitsanthony
Contributor

Is this more effective than just adding a delay with time.Sleep(time.Second)? I don't feel this actually fixes the problem...

@mitake
Contributor Author

mitake commented May 31, 2016

@heyitsanthony adding a delay with sleep and this PR probably wouldn't differ much. We couldn't observe a meaningful difference in the failure rate with this PR. I'm closing it.

@mitake mitake closed this May 31, 2016
Development

Successfully merging this pull request may close these issues.

test: TestIssue2746
3 participants