Added user doc for GCE HA master #1810
Part of kubernetes/enhancements#48
CC @kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-ops
The sample command to set up the HA-compatible cluster:
If you mark this as ```shell does it format nicer?
done
```shell
$ MULTIZONE=true KUBE_GCE_ZONE=europe-west1-b ENABLE_ETCD_QUORUM_READS=true ./cluster/kube-up.sh
```
Please note that execution of the comments above will create a cluster with one master,
s/comments/commands
done
master.
* `KUBE_GCE_ZONE=zone` - zone where the master replica will run.
Should be in the same region as other replicas' zones.
Does the script enforce this? Or is it just recommended?
We don't support different regions. I'll add a check to the kube-up script.
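The reply above mentions adding a region check to the kube-up script. A minimal sketch of such a check follows, assuming GCE zone names have the form `<region>-<suffix>` (for example, `europe-west1-b` is in region `europe-west1`). The functions `zone_to_region` and `check_same_region` are hypothetical illustrations, not code from `cluster/kube-up.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not taken from cluster/kube-up.sh.
# Assumption: GCE zone names have the form <region>-<suffix>, e.g. europe-west1-b.

# Print the region of a zone by dropping the final "-<suffix>" component.
zone_to_region() {
  local zone="$1"
  echo "${zone%-*}"
}

# Succeed only if every zone passed as an argument is in the same region
# as the first one; otherwise print an error to stderr and return 1.
check_same_region() {
  local first_region zone
  first_region="$(zone_to_region "$1")"
  for zone in "$@"; do
    if [[ "$(zone_to_region "$zone")" != "$first_region" ]]; then
      echo "zone ${zone} is not in region ${first_region}" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_same_region europe-west1-b europe-west1-c` succeeds, while `check_same_region europe-west1-b us-central1-a` prints an error and returns a non-zero status.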
### Deployment best practices
* Try to place master replicas in different zones. During a zone failure, all master placed inside the zone will fail.
s/master/masters
done
o prevent zone failure, also place nodes in multiple zones
s/o/To
done
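The advice to spread master replicas across zones follows from majority quorum: a zone outage takes down every replica in that zone, and the cluster can still change persistent state only if the remaining replicas form a majority. The sketch below checks whether a given placement survives the loss of its most populated zone. It assumes etcd-style majority quorum (floor(n/2) + 1 of n replicas); `survives_zone_failure` is a hypothetical helper, not part of kube-up or kube-down.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the kube-up/kube-down scripts.
# Arguments: the zone of each master replica, one argument per replica.
# Assumption: etcd-style majority quorum, floor(n/2) + 1 replicas must be up.
survives_zone_failure() {
  local n=$# quorum zone count max_in_zone=0
  quorum=$(( n / 2 + 1 ))
  # Count replicas per distinct zone and keep the largest count.
  for zone in $(printf '%s\n' "$@" | sort -u); do
    count=$(printf '%s\n' "$@" | grep -cxF -- "$zone")
    if (( count > max_in_zone )); then
      max_in_zone=$count
    fi
  done
  # Worst case: the zone holding the most replicas goes down.
  # Quorum survives only if the remaining replicas still form a majority.
  (( n - max_in_zone >= quorum ))
}
```

With three replicas in `europe-west1-b`, `europe-west1-c` and `europe-west1-d` the check succeeds; placing two of the three in the same zone makes it fail, because losing that zone leaves one replica out of three.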
So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
Such two replica setup is worse in terms of HA than a single replica setup.
* During addition of a master replica, cluster state (etcd) is copied to a new instance.
Is there a way to track the current status of the data migration to see when it completes?
I don't know of any easy way to track it. Maybe it is somehow reflected in etcd logs.
When starting the second master replica, a load balancer containing the two replicas will be created
and the IP address of the first replica will be promoted to IP address of load balancer.
Similarly, when after removal of a master replica, only one replica will remain,
"... after removal of the penultimate master replica, the load balancer..."
done
* Master certificates
Master TLS certificates will be generated for the external public IP (of the load balancer) and local IP of each replica.
remove "of the load balancer". I think the external ip is described sufficiently well above.
done
Similarly, the external IP will be used by kubelets to communicate with master.
* Master certificates
I think this should be a sub-heading instead of a bullet point.
done
There will be no certs for ephemeral public IP of replicas.
So, accessing them using ephemeral public IP will be possible only when skipping TLS verification.
* Clustering etcd
This should be a heading too.
done
Should you add a pointer from
Yes, but I plan to update docs/admin/high-availability in another PR
Comments applied, PTAL
A couple of minor points, after which you can self-apply the lgtm label.
To allow etcd clustering, ports needed to communicate between etcd instances will be opened (for inside cluster communication).
To make such deployment secure, communication between etcd instances is authorized using SSL.
## Future reading
s/Future/Additional
done
* `KUBE_GCE_ZONE=zone` - zone where the master replica will run.
Must be in the same region as other replicas' zones.
* you don't need to set `MULTIZONE` or `ENABLE_ETCD_QUORUM_READS` flags as their values will be inherited from already running clusters
This can be a paragraph instead of a bullet point, since it isn't a flag to set but rather guidance about flags not to set.
done
Applying LGTM
I took off the lgtm because I'm not sure if we need a docs lgtm in addition to a technical lgtm (which is what the spreadsheet implies). We should clarify that and then get this merged.
We need both Tech LGTM and Docs LGTM. I usually interpret the regular "lgtm" label as Tech LGTM. Doing docs review now.
thanks @devin-donnelly. i've added back the lgtm label.
## Introduction
In kubernetes version 1.5, we added alpha support for replication of kubernetes masters in kube-up/down scripts for GCE.
Avoid "we" constructs.
Suggested rephrase: "Kubernetes version 1.5 adds alpha support for replicating Kubernetes masters in `kube-up` or `kube-down` scripts for Google Container Engine."
done
This document describes how to use kube-up/down scripts to manage highly available (HA) masters and how HA masters are implemented for GCE case.
"implemented for GCE case" -> "implemented for use with GCE."
done
## Running HA cluster on GCE
### Starting HA-compatible cluster
Avoid nesting headings directly beneath one another; that's usually indicative of structural problems.
In this case, "Running HA Cluster on GCE" is unnecessary and doesn't add anything. Move the subsequent headers one level up.
done
When creating a new HA cluster, two flags need to be set for kube-up script:
"To create a new HA cluster, you must set the following flags in your `kube-up` script:"
done
If true, reads will be directed to leader etcd replica.
Setting this value to true is optional: reads will be more reliable but will also be slower.
In addition, we may specify in which GCE zone the first master replica will be created by setting:
"Optionally, you can specify a GCE zone where the first master replica is to be created. Set the following flag:"
done
(see [multiple-zones](http://kubernetes.io/docs/admin/multiple-zones/) for details).
* Do not use cluster with two master replicas. Consensus on a two replica cluster requires both replicas running when changing persistent state.
So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
"So", "As a result,"
done
Such two replica setup is worse in terms of HA than a single replica setup.
"A two-replica cluster is thus inferior, in terms of HA, to a single replica cluster."
done
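The two-replica argument is majority-quorum arithmetic: with n replicas, floor(n/2) + 1 must be running to change persistent state, so the cluster tolerates n - (floor(n/2) + 1) replica failures. The sketch below prints this for small n. `quorum_size` and `tolerated_failures` are illustrative names, and the majority rule is the stated assumption (it is the rule used by etcd's Raft consensus).

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; assumes etcd-style majority quorum.

# Number of replicas that must be up for the cluster to change persistent state.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of replicas that can fail while a quorum remains.
tolerated_failures() {
  echo $(( $1 - $(quorum_size "$1") ))
}

for n in 1 2 3 4 5; do
  echo "replicas=${n} quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two replicas tolerate zero failures, the same as one, while doubling the number of machines that can fail; three replicas are the smallest setup that tolerates one failure.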
* During addition of a master replica, cluster state (etcd) is copied to a new instance.
"During addition of a master replica," -> "When you add a master replica,"
done
### Master service & kubelets
Instead of trying to keep up-to-date list of kubernetes apiserver in kubernetes service, we will direct all traffic to the external IP:
"We" construct. If "We" is in this case the Kubernetes system, say so.
"Instead of trying to keep an up-to-date list of Kubernetes apiserver in the Kubernetes service, the system directs all traffic to the external IP:"
done
Master TLS certificates will be generated for the external public IP and local IP of each replica.
There will be no certs for ephemeral public IP of replicas.
So, accessing them using ephemeral public IP will be possible only when skipping TLS verification.
Avoid "so."
Avoid the "them" pronoun; as written, "them" refers to "ephemeral public IPs" when it looks like you mean "master replicas." Be explicit. Also try to avoid future tense and passive voice.
Example rephrasing:
"Kubernetes generates Master TLS certificates for the external public IP and local IP for each replica. There are no certificates for the ephemeral public IP for replicas; to access a replica via its ephemeral public IP, you must skip TLS verification."
done
@devin-donnelly
Kubernetes version 1.5 adds alpha support for replicating Kubernetes masters in kube-up or kube-down scripts for Google Compute Engine.
This document describes how to use kube-up/down scripts to manage highly available (HA) masters and how HA masters are implemented for use with GCE.
## Starting HA-compatible cluster
"Starting an HA-compatible cluster"
done
To create a new HA-compatible cluster, you must set the following flags in your kube-up script:
code format kube-up
done
## Adding a new master replica
After you have created an HA-compatible cluster, you can add master replicas to it.
You add master replicas by using a kube-up script with the following flags:
code format kube-up
done
```shell
$ KUBE_GCE_ZONE=europe-west1-c KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh
```
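Following the zone-placement advice, each additional replica goes into its own zone, with one invocation of the command above per zone. The sketch below only prints those invocations (it never runs `kube-up.sh`) and refuses a zone list that repeats a zone. `print_add_replica_commands` is a hypothetical helper; it reuses only the two flags shown in the command above and does not check the zone of the already running master.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints, but does not execute, one kube-up.sh
# invocation per zone, using only the flags from the command above.
print_add_replica_commands() {
  local zone
  # Refuse repeated zones: replicas placed in one zone fail together.
  if [[ -n "$(printf '%s\n' "$@" | sort | uniq -d)" ]]; then
    echo "each zone should appear only once" >&2
    return 1
  fi
  for zone in "$@"; do
    printf 'KUBE_GCE_ZONE=%s KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh\n' "$zone"
  done
}

# Example: print the commands for two additional replicas.
print_add_replica_commands europe-west1-c europe-west1-d
```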
## Removing master replica
"Removing a master replica"
done
You don't need to set the `MULTIZONE` or `ENABLE_ETCD_QUORUM_READS` flags, as those values are inherited from already running cluster,
Three clauses is probably one too many for this sentence. :)
"You don't need to set the `MULTIZONE` or `ENABLE_ETCD_QUORUM_READS` flags, as those are inherited from when you started your HA-compatible cluster."
done
Awesome, thanks. Just a few more changes.
@devin-donnelly - has the content changed enough that I should take another pass once you are finished reviewing?
Added user doc for GCE HA master.
@devin-donnelly
@roberthbailey, all my comments are on doc organization and wording; the same things get said, but they may use fewer pronouns, active voice, or be said in a slightly different order. I think your LGTM should still stand.
@devin-donnelly - thanks.
Thanks, @jszczepkowski! This is good to go.