Adds support for WAN soft fail and join flooding. #2801
This has the next wave of RTT integration with the router and also factors some common RTT-related helpers out to lib. While we were in here we also got rid of the coordinate disable config so we don't need to deal with the complexity in the router (there was never a user-visible way to disable coordinates).
Should be in good shape for a review.
Looks really good, just a couple comments
api/operator_area.go
Outdated
// when creating a new area.
ID string

// PeeerDatacenter is the peer Consul datacenter that will make up the
typo, should be PeerDatacenter
api/operator_area.go
Outdated
Joined bool

// If we couldn't join, this is the message with information.
What string
Could this just be called Error?
Yeah that's probably better - that was a C++ carryover.
api/operator_area.go
Outdated
// PeeerDatacenter is the peer Consul datacenter that will make up the
// other side of this network area. Network areas always involve a pair
// of datacenters: the datacenter where the area was created, and the
// peer datacenter. This is required
Missing a period here
return false, nil
raft_vsn := 0
raft_vsn_str, ok := m.Tags["raft_vsn"]
if ok {
Good catch with this, I wouldn't have realized to do this until testing alongside an older consul binary
consul/server_test.go
Outdated
	return len(s3.WANMembers()) == 3, nil
}, func(err error) {
	t.Fatalf("bad len")
})
Could this section be shortened to:
for i, s := range []*Server{s1, s2, s3} {
	testutil.WaitForResult(func() (bool, error) {
		return len(s.WANMembers()) == 3, nil
	}, func(err error) {
		t.Fatalf("bad len for server %d", i)
	})
}
WAN Soft Fail
This makes request routing between servers in the WAN more robust by treating Serf failures as advisory rather than final. If Serf reports problems between some subset of the servers in the WAN, we can still route RPC requests as long as the RPCs themselves still work. Prior to WAN soft fail, connectivity problems on the WAN affecting some DCs could cause all DCs to stop sending RPCs to those DCs.
WAN Join Flooding
We added a routine that looks for Consul servers in the LAN and makes sure that they are joined into the WAN as well. This catches up newly-added servers onto the WAN as soon as they join the LAN.