Adds support for WAN soft fail and join flooding. #2801

Merged
merged 24 commits into from
Mar 20, 2017

@slackpad slackpad commented Mar 15, 2017

WAN Soft Fail

This makes request routing between servers in the WAN more robust by treating Serf failures as advisory rather than final. If some subset of the servers in the WAN are having connectivity issues with each other, we can still route RPC requests as long as the RPCs themselves are still working. Prior to WAN soft fail, connectivity problems on the WAN affecting some DCs could cause all DCs to stop sending RPCs to those DCs.
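
To illustrate the "advisory, not final" idea, here is a minimal sketch in Go. It is not the code from this pull request: the `Status`, `Server`, `orderForRPC`, and `forward` names are hypothetical, and the sketch assumes that soft fail can be modelled as trying servers reported alive first while still attempting the ones reported failed.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Status is a hypothetical stand-in for a Serf member status.
type Status int

const (
	StatusAlive Status = iota
	StatusFailed
)

// Server is a hypothetical record for a remote server seen over the WAN.
type Server struct {
	Name   string
	Status Status
}

// orderForRPC returns a copy of servers with those reported alive first.
// Servers reported failed are kept at the end rather than dropped, so the
// failure report acts as a hint and not a verdict.
func orderForRPC(servers []Server) []Server {
	out := append([]Server(nil), servers...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Status == StatusAlive && out[j].Status != StatusAlive
	})
	return out
}

// forward tries each candidate in order until one RPC succeeds and
// returns the name of the server that answered.
func forward(servers []Server, call func(Server) error) (string, error) {
	for _, s := range orderForRPC(servers) {
		if err := call(s); err == nil {
			return s.Name, nil
		}
	}
	return "", errors.New("no reachable server")
}

func main() {
	// Both servers are reported failed, but RPCs to dc2-b still work.
	servers := []Server{
		{Name: "dc2-a", Status: StatusFailed},
		{Name: "dc2-b", Status: StatusFailed},
	}
	call := func(s Server) error {
		if s.Name == "dc2-b" {
			return nil
		}
		return errors.New("connection refused")
	}
	name, err := forward(servers, call)
	fmt.Println(name, err)
}
```

With a hard-fail policy, both servers above would be filtered out before any RPC was attempted. In the sketch, the request still reaches `dc2-b` because the RPC itself is the final arbiter.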

WAN Join Flooding

We added a routine that looks for Consul servers in the LAN and makes sure that they are joined into the WAN as well. This catches newly added servers up onto the WAN as soon as they join the LAN.
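
The flooding routine described above can be sketched roughly as follows. This is an illustration and not the implementation in this pull request: `Member`, `missingFromWAN`, and `flood` are hypothetical names, and the sketch assumes for simplicity that a server is known by the same name in both pools.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a hypothetical stand-in for a gossip pool member.
type Member struct {
	Name     string
	IsServer bool
}

// missingFromWAN returns the names of LAN servers that are not yet
// present in the WAN member list. Non-server LAN members are ignored.
func missingFromWAN(lan, wan []Member) []string {
	inWAN := make(map[string]bool, len(wan))
	for _, m := range wan {
		inWAN[m.Name] = true
	}
	var missing []string
	for _, m := range lan {
		if m.IsServer && !inWAN[m.Name] {
			missing = append(missing, m.Name)
		}
	}
	return missing
}

// flood attempts a WAN join for every missing server and returns how
// many joins succeeded. Failures are skipped so that a later pass can
// retry them.
func flood(lan, wan []Member, join func(name string) error) int {
	joined := 0
	for _, name := range missingFromWAN(lan, wan) {
		if err := join(name); err != nil {
			continue
		}
		joined++
	}
	return joined
}

func main() {
	lan := []Member{
		{Name: "server-1", IsServer: true},
		{Name: "server-2", IsServer: true},
		{Name: "client-1", IsServer: false},
	}
	wan := []Member{{Name: "server-1", IsServer: true}}

	join := func(name string) error {
		if name == "" {
			return errors.New("empty name")
		}
		fmt.Println("joining", name, "into the WAN")
		return nil
	}
	fmt.Println("joined:", flood(lan, wan, join))
}
```

Running a pass like this periodically, or whenever LAN membership changes, is one way to get the catch-up behavior described here.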

@slackpad slackpad requested a review from kyhavlov March 20, 2017 15:21
@slackpad
Contributor Author

Should be in good shape for a review.

@kyhavlov kyhavlov left a comment

Looks really good, just a couple comments

// when creating a new area.
ID string

// PeeerDatacenter is the peer Consul datacenter that will make up the

typo, should be PeerDatacenter

Joined bool

// If we couldn't join, this is the message with information.
What string

Could this just be called Error?

@slackpad
Contributor Author

Yeah that's probably better - that was a C++ carryover.

// PeeerDatacenter is the peer Consul datacenter that will make up the
// other side of this network area. Network areas always involve a pair
// of datacenters: the datacenter where the area was created, and the
// peer datacenter. This is required

Missing a period here

return false, nil
raft_vsn := 0
raft_vsn_str, ok := m.Tags["raft_vsn"]
if ok {

Good catch with this. I wouldn't have thought to do this until testing alongside an older Consul binary.
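
For context, the pattern being praised is reading an optional member tag and falling back to a default when an older binary does not advertise it. A minimal sketch, assuming the tag holds a decimal integer; the `raftVersion` helper is hypothetical, and how the pull request handles a malformed value is not visible in the excerpt above:

```go
package main

import (
	"fmt"
	"strconv"
)

// raftVersion reads the optional "raft_vsn" tag from a member's tags.
// Members running an older binary will not set the tag, so a missing
// tag is treated as version 0 and not as an error.
func raftVersion(tags map[string]string) (int, error) {
	str, ok := tags["raft_vsn"]
	if !ok {
		return 0, nil
	}
	v, err := strconv.Atoi(str)
	if err != nil {
		return 0, fmt.Errorf("invalid raft_vsn %q: %v", str, err)
	}
	return v, nil
}

func main() {
	newer := map[string]string{"raft_vsn": "2"}
	older := map[string]string{} // older binary: tag absent

	fmt.Println(raftVersion(newer))
	fmt.Println(raftVersion(older))
}
```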

return len(s3.WANMembers()) == 3, nil
}, func(err error) {
t.Fatalf("bad len")
})

Could this section be shortened to:

for i, s := range []*Server{s1, s2, s3} {
	testutil.WaitForResult(func() (bool, error) {
		return len(s.WANMembers()) == 3, nil
	}, func(err error) {
		t.Fatalf("bad len for server %d", i)
	})
}

@slackpad slackpad merged commit 3b3cb0d into master Mar 20, 2017