-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft node notifies configure when confChanged #15708
raft node notifies configure when confChanged #15708
Conversation
20a9dba
to
eba61de
Compare
Can #15528 be easily reproduced? If not, then we need to add an integration test; we can intentionally inject a sleep failpoint before r.Advanced() |
Not easily. Inject a sleep should do the trick. |
It's hard to reproduce in few requests. It needs to send request one by one without pause. package verify
import (
"context"
"strings"
"testing"
"time"
integration2 "go.etcd.io/etcd/tests/v3/framework/integration"
"go.etcd.io/etcd/tests/v3/integration"
)
var lazyCluster = integration.NewLazyClusterWithConfig(
integration2.ClusterConfig{
Size: 3,
WatchProgressNotifyInterval: 200 * time.Millisecond,
DisableStrictReconfigCheck: true})
func TestVerify(t *testing.T) {
defer lazyCluster.Terminate()
clus := lazyCluster.Cluster()
leaderIdx := clus.WaitLeader(t)
members := clus.Members
followerIdx := (leaderIdx + 1) % 3
cli := clus.Client(leaderIdx)
// promoting any of the voting members in cluster should fail
expectedErrKeywords := "can only promote a learner member"
var err error
for i := 0; i < 10000; i++ {
ctx, cancel := context.WithTimeout(context.TODO(), 1*time.Minute)
_, err = cli.MemberPromote(ctx, uint64(members[followerIdx].ID()))
cancel()
if err == nil {
t.Fatalf("expect promoting voting member to fail, got no error")
}
if strings.Contains(err.Error(), expectedErrKeywords) {
continue
}
t.Fatalf("unexpected error: %v", err)
}
} |
This PR not only changes MemberPromote, but introduces a big change for all membership changes. We should double check that all configuration changes are properly tested. If not we should add a large set of e2e tests that:
I would not want to rush this change, so I would recommend #15528 independently. |
Hi @ahrtr could you please help clarify how would integration test inject a sleep failpoint? I know with etcd binary built with gofail enabled it is achievable. etcd/tests/framework/e2e/etcd_process.go Lines 298 to 318 in b754994
|
Please refer to the followings, |
Hi @serathius I audited all the membership reconfiguration tests.
Number 1 & 2 tests does not expose issue mentioned in #15528 because it applies configuration change only once. common tests cover all the e2e test cases so there are some duplicates. Test case Number 4 example test cases share a lazy cluster that could have back to back membership reconfigurations. So that's why #12983 and #14040 is flaky. So My proposal is to enhance tests/integration/clientv3/cluster_test.go with failpoint injected with sleep 100ms on etcd/server/etcdserver/raft.go Line 310 in d81d3c3
MemberAdd , MemberUpdate , MemberRemove , MemberPromote repeatedly, what do you think? @ahrtr @serathius
|
eba61de
to
cab837c
Compare
… to client Signed-off-by: Benjamin Wang <wachao@vmware.com>
1f4c4b8
to
2d4c8ec
Compare
1. rename confChangeCh to raftAdvancedC 2. rename waitApply to confChanged 3. add comments and test assertion Signed-off-by: Chao Chen <chaochn@amazon.com>
2d4c8ec
to
6cdc9ae
Compare
Ping @serathius @ahrtr @fuweid @tjungblu @jmhbnz for review ~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment to respond to review ping. I've read through this pr a few times and at a code level it looks good, however I'm abstaining from hitting approve as I don't understand overall system implications well enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm (non-binding)
Thanks for finally fixing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks!
Fix #15528
The PR is built on top of #15703
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
PTAL @ahrtr @serathius, thanks!