Added user doc for GCE HA master #1810

Merged 1 commit on Dec 2, 2016

@jszczepkowski commented Nov 29, 2016

Added user doc for GCE HA master.



@k8s-ci-robot added the `cncf-cla: yes` label (indicates the PR's author has signed the CNCF CLA) Nov 29, 2016
@jszczepkowski added this to the 1.5 milestone Nov 29, 2016
@jszczepkowski
Contributor Author

Part of kubernetes/enhancements#48

@jszczepkowski
Contributor Author

CC @kubernetes/sig-cluster-lifecycle @kubernetes/sig-cluster-ops


The sample command to set up the HA-compatible cluster:

```
$ MULTIZONE=true KUBE_GCE_ZONE=europe-west1-b ENABLE_ETCD_QUORUM_READS=true ./cluster/kube-up.sh
```

Contributor

If you mark this as ```shell does it format nicer?

Contributor Author

done

Please note that execution of the comments above will create a cluster with one master,
Contributor

s/comments/commands

Contributor Author

done

master.

* `KUBE_GCE_ZONE=zone` - zone where the master replica will run.
Should be in the same region as other replicas' zones.
Contributor

Does the script enforce this? Or is it just recommended?

Contributor Author

We don't support different regions. I'll add check to kube-up script.
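
The check mentioned in this reply is not shown in the thread. A minimal sketch of what a same-region check could look like, assuming GCE zone names have the form `<region>-<suffix>` (for example `europe-west1-b`). The function names `zone_to_region` and `check_same_region` are hypothetical and are not taken from `kube-up.sh`:

```shell
# Hypothetical helper: derive the region from a zone name by stripping
# the trailing "-<suffix>" (europe-west1-b -> europe-west1).
zone_to_region() {
  echo "${1%-*}"
}

# Hypothetical check: return 0 only if both zones share a region.
check_same_region() {
  if [ "$(zone_to_region "$1")" != "$(zone_to_region "$2")" ]; then
    echo "Zone $2 is not in the same region as $1" >&2
    return 1
  fi
}

# Example with the two zones used in this document's sample commands.
check_same_region europe-west1-b europe-west1-c && echo "same region"
```

A script could call such a check before creating the replica and abort on a non-zero return code.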


### Deployment best practices

* Try to place master replicas in different zones. During a zone failure, all master placed inside the zone will fail.
Contributor

s/master/masters

Contributor Author

done

### Deployment best practices

* Try to place master replicas in different zones. During a zone failure, all master placed inside the zone will fail.
o prevent zone failure, also place nodes in multiple zones
Contributor

s/o/To

Contributor Author

done

So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
Such two replica setup is worse in terms of HA than a single replica setup.

* During addition of a master replica, cluster state (etcd) is copied to a new instance.
Contributor

Is there a way to track the current status of the data migration to see when it completes?

Contributor Author

I don't know any easy way to track it. Maybe it is somehow reflected in etcd logs.


When starting the second master replica, a load balancer containing the two replicas will be created
and the IP address of the first replica will be promoted to IP address of load balancer.
Similarly, when after removal of a master replica, only one replica will remain,
Contributor

"... after removal of the penultimate master replica, the load balancer..."

Contributor Author

done


* Master certificates

Master TLS certificates will be generated for the external public IP (of the load balancer) and local IP of each replica.
Contributor

remove "of the load balancer". I think the external ip is described sufficiently well above.

Contributor Author

done


Similarly, the external IP will be used by kubelets to communicate with master.

* Master certificates
Contributor

I think this should be a sub-heading instead of a bullet point.

Contributor Author

done

There will be no certs for ephemeral public IP of replicas.
So, accessing them using ephemeral public IP will be possible only when skipping TLS verification.

* Clustering etcd
Contributor

This should be a heading too.

Contributor Author

done

@davidopp
Member

Should you add a pointer from
http://kubernetes.io/docs/admin/high-availability/
to this doc?

@jszczepkowski
Contributor Author

Should you add a pointer from
http://kubernetes.io/docs/admin/high-availability/
to this doc?

Yes, but I plan to update docs/admin/high-availability in another PR

@jszczepkowski
Contributor Author

Comments applied, PTAL

@roberthbailey left a comment
Contributor

A couple of minor points, after which you can self-apply the lgtm label.

To allow etcd clustering, ports needed to communicate between etcd instances will be opened (for inside cluster communication).
To make such deployment secure, communication between etcd instances is authorized using SSL.

## Future reading
Contributor

s/Future/Additional

Contributor Author

done

* `KUBE_GCE_ZONE=zone` - zone where the master replica will run.
Must be in the same region as other replicas' zones.

* you don't need to set `MULTIZONE` or `ENABLE_ETCD_QUORUM_READS` flags as they values will be inherited from already running clusters
Contributor

This can be a paragraph instead of a bullet point, since it isn't a flag to set but rather guidance about flags not to set.

Contributor Author

done

@jszczepkowski
Contributor Author

A couple of minor points, after which you can self-apply the lgtm label.

Applying LGTM

@roberthbailey
Contributor

I took off the lgtm because I'm not sure if we need a docs lgtm in addition to a technical lgtm (which is what the spreadsheet implies). We should clarify that and then get this merged.

@devin-donnelly
Contributor

We need both Tech LGTM and Docs LGTM. I usually interpret the regular "lgtm" label as Tech LGTM. Doing docs review now.

@roberthbailey
Contributor

thanks @devin-donnelly. i've added back the lgtm label.


## Introduction

In kubernetes version 1.5, we added alpha support for replication of kubernetes masters in kube-up/down scripts for GCE.
Contributor

Avoid "we" constructs.

Suggested rephrase: "Kubernetes version 1.5 adds alpha support for replicating Kubernetes masters in kube-up or kube-down scripts for Google Container Engine."

Contributor Author

done

## Introduction

In kubernetes version 1.5, we added alpha support for replication of kubernetes masters in kube-up/down scripts for GCE.
This document describes how to use kube-up/down scripts to manage highly available (HA) masters and how HA masters are implemented for GCE case.
Contributor

"implemented for GCE case" -> "implemented for use with GCE."

Contributor Author

done


## Running HA cluster on GCE

### Starting HA-compatible cluster
Contributor

Avoid nesting headings directly beneath one another; that's usually indicative of structural problems.

In this case, "Running HA Cluster on GCE" is unnecessary and doesn't add anything. Move the subsequent headers one level up.

Contributor Author

done


### Starting HA-compatible cluster

When creating a new HA cluster, two flags need to be set for kube-up script:
Contributor

"To create a new HA cluster, you must set the following flags in your kube-up script:"

Contributor Author

done

If true, reads will be directed to leader etcd replica.
Setting this value to true is optional: reads will be more reliable but will also be slower.

In addition, we may specify in which GCE zone the first master replica will be created by setting:
Contributor

"Optionally, you can specify a GCE zone where the first master replica is to be created. Set the following flag:"

Contributor Author

done

(see [multiple-zones](http://kubernetes.io/docs/admin/multiple-zones/) for details).

* Do not use cluster with two master replicas. Consensus on a two replica cluster requires both replicas running when changing persistent state.
So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
Contributor

"So" -> "As a result,"

Contributor Author

done


* Do not use cluster with two master replicas. Consensus on a two replica cluster requires both replicas running when changing persistent state.
So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
Such two replica setup is worse in terms of HA than a single replica setup.
Contributor

"A two-replica cluster is thus inferior, in terms of HA, to a single replica cluster."

Contributor Author

done
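
The reasoning behind the two-replica warning can be made concrete with the usual majority rule for consensus: a cluster of n replicas needs floor(n/2) + 1 of them running to change persistent state. The sketch below only illustrates that arithmetic. The function names are hypothetical and nothing here is part of the kube-up scripts:

```shell
# Replicas that must be running to form a majority of n.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Replica failures a cluster of n can survive while keeping a majority.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3; do
  echo "replicas=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two replicas the quorum is two and no failure is tolerated, yet there are two machines that can fail instead of one. Three replicas tolerate one failure.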

So, both replicas are needed and a failure of any replica turns cluster into majority failure state.
Such two replica setup is worse in terms of HA than a single replica setup.

* During addition of a master replica, cluster state (etcd) is copied to a new instance.
Contributor

"During addition of a master replica," -> "When you add a master replica,"

Contributor Author

done


### Master service & kubelets

Instead of trying to keep up-to-date list of kubernetes apiserver in kubernetes service, we will direct all traffic to the external IP:
Contributor

"We" construct. If "We" is in this case the Kubernetes system, say so.

"Instead of trying to keep an up-to-date list of Kubernetes apiserver in the Kubernetes service, the system directs all traffic to the external IP:"

Contributor Author

done


Master TLS certificates will be generated for the external public IP and local IP of each replica.
There will be no certs for ephemeral public IP of replicas.
So, accessing them using ephemeral public IP will be possible only when skipping TLS verification.
Contributor

Avoid "so."
Avoid the "them" pronoun; as written, "them" refers to "ephemeral public IPs" when it looks like you mean "master replicas". Be explicit. Also try to avoid future tense and passive voice.

Example rephrasing:
"Kubernetes generates Master TLS certificates for the external public IP and local IP for each replica. There are no certificates for the ephemeral public IP for replicas; to access a replica via its ephemeral public IP, you must skip TLS verification."

Contributor Author

done
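
As an illustration of the point about ephemeral public IPs, the sketch below builds (but does not run) a `curl` command against a replica's ephemeral public IP. `--insecure` is curl's standard option for skipping TLS certificate verification. The address `203.0.113.10` is a documentation placeholder, and the function name and the `/healthz` path are assumptions made for the example:

```shell
# Hypothetical helper: print the curl command for a given replica IP.
# --insecure skips certificate verification, which is needed here
# because no certificate is issued for the ephemeral public IP.
replica_health_cmd() {
  echo "curl --insecure https://$1/healthz"
}

# 203.0.113.10 is a placeholder address, not a real replica.
replica_health_cmd 203.0.113.10
```

Skipping verification removes protection against impersonation of the server, so the external IP covered by the generated certificates remains the preferable endpoint.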

@jszczepkowski
Contributor Author

@devin-donnelly
Comments applied, PTAL

Kubernetes version 1.5 adds alpha support for replicating Kubernetes masters in kube-up or kube-down scripts for Google Compute Engine.
This document describes how to use kube-up/down scripts to manage highly available (HA) masters and how HA masters are implemented for use with GCE.

## Starting HA-compatible cluster
Contributor

"Starting an HA-compatible cluster"

Contributor Author

done


## Starting HA-compatible cluster

To create a new HA-compatible cluster, you must set the following flags in your kube-up script:
Contributor

code format kube-up

Contributor Author

done

## Adding a new master replica

After you have created an HA-compatible cluster, you can add master replicas to it.
You add master replicas by using a kube-up script with the following flags:
Contributor

code format kube-up

Contributor Author

done

```
$ KUBE_GCE_ZONE=europe-west1-c KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh
```

## Removing master replica
Contributor

"Removing a master replica"

Contributor Author

done

* `KUBE_GCE_ZONE=zone` - zone where the master replica will run.
Must be in the same region as other replicas' zones.

You don't need to set the `MULTIZONE` or `ENABLE_ETCD_QUORUM_READS` flags, as those values are inherited from already running cluster,
Contributor

Three clauses is probably one too many for this sentence. :)

"You don't need to set the MULTIZONE or ENABLE_ETCD_QUORUM_READS flags, as those are inherited from when you started your HA-compatible cluster."

Contributor Author

done
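
Combining the add-replica flags with the advice to spread replicas across zones and to avoid stopping at two replicas, growing a one-master cluster to three masters could be sketched as follows. The commands are only printed, not executed. `europe-west1-c` comes from the sample command in this document, while `europe-west1-d` is an assumed third zone in the same region and the function name is hypothetical:

```shell
# Print one kube-up invocation per zone. MULTIZONE and
# ENABLE_ETCD_QUORUM_READS are omitted because they are inherited
# from the already running HA-compatible cluster.
print_add_replica_cmds() {
  for zone in "$@"; do
    echo "KUBE_GCE_ZONE=${zone} KUBE_REPLICATE_EXISTING_MASTER=true ./cluster/kube-up.sh"
  done
}

# Two additional replicas, each in its own zone of the same region.
print_add_replica_cmds europe-west1-c europe-west1-d
```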

@devin-donnelly
Contributor

Awesome, thanks. Just a few more changes.

@roberthbailey
Contributor

@devin-donnelly - has the content changed enough that i should take another pass once you are finished reviewing?

Added user doc for GCE HA master.
@jszczepkowski
Contributor Author

@devin-donnelly
comments applied, PTAL

@devin-donnelly
Contributor

@roberthbailey , all my comments are on doc organization and wording; the same things get said, but they may use fewer pronouns, active voice, or be said in a slightly different order. I think your LGTM should still stand.

@roberthbailey
Contributor

@devin-donnelly - thanks.

@devin-donnelly
Contributor

Thanks, @jszczepkowski ! This is good to go.

@devin-donnelly devin-donnelly merged commit f9b4854 into kubernetes:release-1.5 Dec 2, 2016