Cluster config fixes and removal of meta.peers config field #3638
Conversation
The env var overrides panic if peers was set in the config. The type of that value is a []string, and NumField is not valid for that type. We don't currently support slice fields via the env variable settings. This particular one can already be set with the -join flag, so we just skip these fields for now.
Adding a new peer must happen via the -join flag.
```diff
@@ -31,7 +31,7 @@ type Config struct {
 	Dir         string `toml:"dir"`
 	Hostname    string `toml:"hostname"`
 	BindAddress string `toml:"bind-address"`
-	Peers       []string `toml:"peers"`
+	Peers       []string `toml:"-"`
```
What does this mean? Ignore in older config files?
Yes. It's ignored in old config files and not settable via TOML configs.
+1
Honestly, not sure this makes much sense. How are we supposed to control having a leader + follower set when clustering is open to 100s or 1000s of nodes? Does this mean we'll always need to start one machine as the effective 'leader', then just use -join for all other machines in the cluster? What if the leader fails and we need to restart that node: should it start back up with -join or without? If we look to something like HDFS, this is handled via ZooKeeper, and leader election is seamless to the process (i.e. all nodes start up the same way, with leader election handled behind the scenes). This change puts a lot of that work onto the designer of the cluster, and effectively creates a leader + follower structure that doesn't really work when you get out to scale.
When you start a cluster you can have them all start with the same -join flags. Adding additional nodes just requires joining another node in an existing cluster. It does not need to join a member of the raft cluster; any node is fine. Once a node has joined a cluster, the peer information is persisted, so -join is not needed on subsequent restarts.
You can also create a cluster by incrementally joining new nodes to existing ones, but you don't have to do it this way. For example, start one node, then start new nodes passing -join with the address of any existing member.
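A sketch of the bootstrap flow described above (hostnames and ports are illustrative and the exact flag syntax may differ by version; only the -join flag itself is confirmed by this PR):

```shell
# Start the first node with no join target:
influxd

# Join each additional node to any existing member of the cluster:
influxd -join node1:8088
influxd -join node1:8088
```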
Deleted prior comment, my understanding is actually incorrect.
@beckettsean, @jwilder: thanks for the feedback. Obviously the documentation is still under review and there are a few kinks to work out... completely understood. It makes a little more sense now that we can effectively daisy-chain members into the cluster as necessary (according to @jwilder). The concern we had internally was that the -join logic was unclear: do we just specify the three nodes for raft consensus across all machines that would join the cluster? How would node 1000 work when specifying the same raft consensus set? What happens to the 1000 nodes in the cluster if the three in the raft consensus died? We'll keep testing these features out and document any issues that come up.
@sebito91 You can specify any nodes to join. Using the same 3 nodes every time is fine even with larger clusters. If a raft node dies and will never come back, it needs to be replaced. If it just goes offline, it'll rejoin the raft group when it restarts. With 3 raft nodes in the cluster, the cluster can tolerate one raft node being offline before availability is affected; with 5, it can tolerate 2 failed. Replacing a failed raft node is a manual process right now, which still needs to be documented.
This PR fixes a panic and a regression with the `-hostname` flag, as well as removes the `[meta].peers` config option to avoid confusion. The supported method for joining a node to a cluster is the `-join` flag.