diff --git a/etcd/etcd-live-cluster-reconfiguration.md b/etcd/etcd-live-cluster-reconfiguration.md index 9c0b225f7..92722f6e1 100644 --- a/etcd/etcd-live-cluster-reconfiguration.md +++ b/etcd/etcd-live-cluster-reconfiguration.md @@ -1,236 +1,350 @@ # etcd cluster runtime reconfiguration on CoreOS Container Linux -This document describes the reconfiguration or recovery of an etcd cluster running on Container Linux, using a combination of `systemd` features and `etcdctl` commands. +This document describes the reconfiguration and recovery of an etcd cluster running on Container Linux, using a combination of `systemd` features and `etcdctl` commands. The examples given in this document show the configuration for a three-node Container Linux cluster. -## Change etcd cluster size +## Configuring etcd using Container Linux Config -When [a Container Linux Config][cl-configs] is used to configure an etcd member on a Container Linux node, it compiles a special `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` [drop-in unit file][drop-in]. That is, the Container Linux Config below: +When a [Container Linux Config][cl-configs] is used for configuring an etcd member on a Container Linux node, it compiles a special `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` [drop-in unit file][drop-in]. For example: ```yaml container-linux-config etcd: + name: demo-etcd-1 + listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001 advertise_client_urls: http://:2379 + listen_peer_urls: http://0.0.0.0:2380 initial_advertise_peer_urls: http://:2380 - listen_client_urls: http://0.0.0.0:2379,http://0.0.0.0:4001 - listen_peer_urls: http://0.0.0.0:2380 - discovery: https://discovery.etcd.io/ + initial_cluster: demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380 + initial_cluster_token: demo-etcd-token + initial_cluster_state: new ``` -will generate the following [drop-in][drop-in]: +The config file is first validated and transformed into a machine-readable form, which is then sent directly to a Container Linux provisioning target. The [drop-in][drop-in] generated from the example config file is given below: ```ini [Service] -Environment="ETCD_IMAGE_TAG=v3.1.4" ExecStart= ExecStart=/usr/lib/coreos/etcd-wrapper $ETCD_OPTS \ - --advertise-client-urls: http://:2379 \ - --initial-advertise-peer-urls: http://:2380 \ - --listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001 \ - --listen-peer-urls: http://0.0.0.0:2380 \ - --discovery: https://discovery.etcd.io/ + --name="demo-etcd-1" \ + --listen-peer-urls="http://0.0.0.0:2380" \ + --listen-client-urls="https://10.240.0.1:2379,http://0.0.0.0:4001" \ + --initial-advertise-peer-urls="http://:2380" \ + --initial-cluster="demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380" \ + --initial-cluster-state="new" \ + --initial-cluster-token="demo-etcd-token" \ + --advertise-client-urls="http://:2379" ``` -If the etcd cluster is secured with TLS, use `https://` instead of `http://` in the command examples below. +If the etcd cluster is secured with TLS, use `https://` instead of `http://` in the config files. If the peer addresses for the initial cluster are known at time of spinning up the cluster, `--discovery="https://discovery.etcd.io/` is not required. -Assume that you have created a five-node Container Linux cluster, but did not specify cluster size in the [discovery][etcd-discovery] URL. 
Since the default discovery cluster size is 3, the remaining two nodes were configured as proxies. You would like to promote these proxies to full etcd cluster members, without bootstrapping a new etcd cluster. +### Change etcd cluster size -The existing cluster can be reconfigured. Run `etcdctl member add node4 http://10.0.1.4:2380`. Later steps will use information from the output of this command, so it's a good idea to copy and paste it somewhere convenient. The output of a successful member addition will look like this: +Changing the size of an etcd cluster is as simple as adding a new member, and using the output of the member addition, such as name of the new etcd member, member IDs, state and URLs of the cluster, to the config file for provisioning on the Container Linux node. -``` -added member 9bf1b35fc7761a23 to cluster +1. Run the `etcdctl member add` command. -ETCD_NAME="node4" -ETCD_INITIAL_CLUSTER="1dc800dbf6a732d8839bc71d0538bb99=http://10.0.1.1:2380,f961e5cb1b0cb8810ea6a6b7a7c8b5cf=http://10.0.1.2:2380,8982fae69ad09c623601b68c83818921=http://10.0.1.3:2380,node4=http://10.0.1.4:2380" -ETCD_INITIAL_CLUSTER_STATE=existing -``` + For example: -The `ETCD_DISCOVERY` environment variable defined in `20-cloudinit.conf` conflicts with the `ETCD_INITIAL_CLUSTER` setting needed for these steps, so the first step is clearing it by overriding `20-cloudinit.conf` with a new drop-in, `99-restore.conf`. `99-restore.conf` contains an empty `Environment="ETCD_DISCOVERY="` string. + ```sh + $ etcdctl member add node4 http://0.0.0.4:2380 + ``` -The complete example looks like this. On the `node4` Container Linux host, create a temporary systemd drop-in, `/run/systemd/system/etcd2.service.d/99-restore.conf` with the contents below, filling in the information from the output of the `etcd member add` command we ran previously: + The output of a successful member addition is given below: -```ini -[Service] -# remove previously created proxy directory -ExecStartPre=/usr/bin/rm -rf /var/lib/etcd2/proxy -# NOTE: use this option if you would like to re-add broken etcd member into cluster -# Don't forget to make a backup before -#ExecStartPre=/usr/bin/rm -rf /var/lib/etcd2/member /var/lib/etcd2/proxy -# here we clean previously defined ETCD_DISCOVERY environment variable, we don't need it as we've already bootstrapped etcd cluster and ETCD_DISCOVERY conflicts with ETCD_INITIAL_CLUSTER environment variable -Environment="ETCD_DISCOVERY=" -Environment="ETCD_NAME=node4" -# We use ETCD_INITIAL_CLUSTER variable value from previous step ("etcdctl member add" output) -Environment="ETCD_INITIAL_CLUSTER=node1=http://10.0.1.1:2380,node2=http://10.0.1.2:2380,node3=http://10.0.1.3:2380,node4=http://10.0.1.4:2380" -Environment="ETCD_INITIAL_CLUSTER_STATE=existing" -``` + ```sh + added member 9bf1b35fc7761a23 to cluster -Run `sudo systemctl daemon-reload` to parse the new and edited units. Check whether the new [drop-in][drop-in] is valid by checking the service's journal: `sudo journalctl _PID=1 -e -u etcd2`. If everything is ok, run `sudo systemctl restart etcd2` to activate your changes. You will see that the former proxy node has become a cluster member: + ETCD_NAME="node4" + ETCD_INITIAL_CLUSTER="1dc800dbf6a732d8839bc71d0538bb99=http://10.0.1.1:2380,f961e5cb1b0cb8810ea6a6b7a7c8b5cf=http://10.0.1.2:2380,8982fae69ad09c623601b68c83818921=http://10.0.1.3:2380,node4=http://10.0.1.4:2380" + ETCD_INITIAL_CLUSTER_STATE=existing + ``` +2. Store the output of this command for later use. 
-```
-etcdserver: start member 9bf1b35fc7761a23 in cluster 36cce781cb4f1292
-```
+3. On the `node4` Container Linux host, create a temporary systemd drop-in, `/run/systemd/system/etcd-member.service.d/99-restore.conf`.
+
+4. Add the contents below to the `etcd-member.service.d/99-restore.conf` drop-in, filling in the cluster information from the output of the `etcdctl member add` command run previously:
+
+   ```ini
+   [Service]
+   ExecStart=
+   ExecStart=/usr/lib/coreos/etcd-wrapper $ETCD_OPTS \
+     --name="node4" \
+     --listen-peer-urls="http://0.0.0.0:2380" \
+     --listen-client-urls="https://0.0.0.4:2379,http://0.0.0.4:4001" \
+     --initial-advertise-peer-urls="http://:2380" \
+     --initial-cluster="1dc800dbf6a732d8839bc71d0538bb99=http://10.0.1.1:2380,f961e5cb1b0cb8810ea6a6b7a7c8b5cf=http://10.0.1.2:2380,8982fae69ad09c623601b68c83818921=http://10.0.1.3:2380,node4=http://10.0.1.4:2380" \
+     --initial-cluster-state="existing" \
+     --advertise-client-urls="http://:2379"
+   ```
+
+5. Reload systemd so that it picks up the new drop-in:
+
+   `$ sudo systemctl daemon-reload`

-Once your new member node is up and running, and `etcdctl cluster-health` shows a healthy cluster, remove the temporary drop-in file and reparse the services: `sudo rm /run/systemd/system/etcd2.service.d/99-restore.conf && sudo systemctl daemon-reload`.
+6. Start the etcd member service:

-## Replace a failed etcd member on CoreOS Container Linux
+   `$ sudo systemctl start etcd-member.service`

-This section provides instructions on how to recover a failed etcd member. It is important to know that an etcd cluster cannot be restored using only a discovery URL; the discovery URL is used only once during cluster bootstrap.
+7. Check whether the new member node is up and running:

-In this example, we use a 3-member etcd cluster with one failed node, that is still running and has maintained [quorum][majority]. An etcd member node might fail for several reasons: out of disk space, an incorrect reboot, or issues on the underlying system. Note that this example assumes you used [a Container Linux Config][cl-configs] with an etcd [discovery URL][etcd-discovery] to bootstrap your cluster, with the following default options:
+   ```sh
+   $ etcdctl cluster-health
+
+   member 9bf1b35fc7761a23 is healthy: got healthy result from http://10.0.1.4:2379
+   cluster is healthy
+   ```
+
+If the cluster reports a healthy state, etcd has successfully written the cluster configuration into the `/var/lib/etcd` directory.
+
+### Replace a failed etcd member on CoreOS Container Linux
+
+An etcd member node might fail for several reasons: out of disk space, an incorrect reboot, or issues on the underlying system. This section provides instructions on how to recover a failed etcd member.
+
+Consider a scenario where a member has failed in a three-member cluster. The cluster is still running and has maintained [quorum][majority]. The example assumes [a Container Linux Config][cl-configs] is used with the following default options:

```yaml container-linux-config
etcd:
+  name: demo-etcd-1
+  listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001
  advertise_client_urls: http://:2379
+  listen_peer_urls: http://0.0.0.0:2380
  initial_advertise_peer_urls: http://:2380
-  listen_client_urls: http://0.0.0.0:2379,http://0.0.0.0:4001
-  listen_peer_urls: http://0.0.0.0:2380
-  discovery: https://discovery.etcd.io/
+  initial_cluster: demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380
+  initial_cluster_token: demo-etcd-token
+  initial_cluster_state: new
```

If the etcd cluster is protected with TLS, use `https://` instead of `http://` in the examples below.
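+
+As a minimal sketch of such a TLS-enabled check (the endpoint and the certificate paths under `/etc/ssl/etcd/` are illustrative assumptions, not values generated by this guide), a health check against a secured member might look like:
+
+```sh
+$ etcdctl --endpoints https://10.240.0.1:2379 \
+    --ca-file /etc/ssl/etcd/ca.pem \
+    --cert-file /etc/ssl/etcd/client.pem \
+    --key-file /etc/ssl/etcd/client-key.pem \
+    cluster-health
+```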
-Let's assume that your etcd cluster has a faulty member `10.0.1.2`:
+Assume that the given etcd cluster has a faulty member `0.0.0.2`:

```sh
$ etcdctl cluster-health
-member fe2f75dd51fa5ff is healthy: got healthy result from http://10.0.1.1:2379
-failed to check the health of member 1609b5a3a078c227 on http://10.0.1.2:2379: Get http://10.0.1.2:2379/health: dial tcp 10.0.1.2:2379: connection refused
-member 1609b5a3a078c227 is unreachable: [http://10.0.1.2:2379] are all unreachable
-member 60e8a32b09dc91f1 is healthy: got healthy result from http://10.0.1.3:2379
+member fe2f75dd51fa5ff is healthy: got healthy result from http://0.0.0.1:2379
+failed to check the health of member 1609b5a3a078c227 on http://0.0.0.2:2379: Get http://0.0.0.2:2379/health: dial tcp 0.0.0.2:2379: connection refused
+member 1609b5a3a078c227 is unreachable: [http://0.0.0.2:2379] are all unreachable
+member 60e8a32b09dc91f1 is healthy: got healthy result from http://0.0.0.3:2379
cluster is healthy
```

-Run `etcdctl` from a working node, or use the [`ETCDCTL_ENDPOINT`][etcdctl-endpoint] environment variable or command line option to point `etcdctl` at any healthy member node.
+1. Perform one of the following:

-[Remove the failed member][etcdctl-member-remove] `10.0.1.2` from the etcd cluster. The remove subcommand informs all other cluster nodes that a human has determined this node is dead and not available for connections:
+   * Run `etcdctl` from a working node.
+   * Use the [`ETCDCTL_ENDPOINT`][etcdctl-endpoint] environment variable or the corresponding command-line option to point `etcdctl` at any healthy member node.

-```sh
-$ etcdctl member remove 1609b5a3a078c227
-Removed member 1609b5a3a078c227 from cluster
-```
+2. [Remove the failed member][etcdctl-member-remove] `0.0.0.2` from the etcd cluster.

-Then, on the failed node (`10.0.1.2`), stop the etcd2 service:
+   ```sh
+   $ etcdctl member remove 1609b5a3a078c227
+   Removed member 1609b5a3a078c227 from cluster
+   ```
+   The remove subcommand informs all other cluster nodes that a human has determined this node is dead and not available for connections.

-```sh
-$ sudo systemctl stop etcd2
-```
+3. Stop the etcd-member service on the failed node (`0.0.0.2`):

-Clean up the `/var/lib/etcd2` directory:
+   ```sh
+   $ sudo systemctl stop etcd-member.service
+   ```

-```sh
-$ sudo rm -rf /var/lib/etcd2/*
-```
+4. Clean up the `/var/lib/etcd` directory:

-Check that the `/var/lib/etcd2/` directory exists and is empty. If you removed this directory accidentally, you can recreate it with the proper modes by using:
+   ```sh
+   $ sudo rm -rf /var/lib/etcd/*
+   ```

-```sh
-$ sudo systemd-tmpfiles --create /usr/lib64/tmpfiles.d/etcd2.conf
-```
+5. Check that the `/var/lib/etcd/` directory exists and is empty.

-Next, reinitialize the failed member. Note that `10.0.1.2` is an example IP address. Use the IP address corresponding to your failed node:
+   If you removed this directory accidentally, recreate it with the proper modes by using:

-```sh
-$ etcdctl member add node2 http://10.0.1.2:2380
-Added member named node2 with ID 4fb77509779cac99 to cluster
+   ```sh
+   $ sudo systemd-tmpfiles --create /usr/lib64/tmpfiles.d/etcd-wrapper.conf
+   ```

-ETCD_NAME="node2"
-ETCD_INITIAL_CLUSTER="52d2c433e31d54526cf3aa660304e8f1=http://10.0.1.1:2380,node2=http://10.0.1.2:2380,2cb7bb694606e5face87ee7a97041758=http://10.0.1.3:2380"
-ETCD_INITIAL_CLUSTER_STATE="existing"
-```
+6. Reinitialize the failed member.
-With the new node added, create a systemd [drop-in][drop-in] `/run/systemd/system/etcd2.service.d/99-restore.conf`, replacing the node data with the appropriate information from the output of the `etcdctl member add` command executed in the last step.
+   Note that `0.0.0.2` is an example IP address. Use the IP address corresponding to your failed node:

-```ini
-[Service]
-# here we clean previously defined ETCD_DISCOVERY environment variable, we don't need it as we've already bootstrapped etcd cluster and ETCD_DISCOVERY conflicts with ETCD_INITIAL_CLUSTER environment variable
-Environment="ETCD_DISCOVERY="
-Environment="ETCD_NAME=node2"
-# We use ETCD_INITIAL_CLUSTER variable value from previous step ("etcdctl member add" output)
-Environment="ETCD_INITIAL_CLUSTER=52d2c433e31d54526cf3aa660304e8f1=http://10.0.1.1:2380,node2=http://10.0.1.2:2380,2cb7bb694606e5face87ee7a97041758=http://10.0.1.3:2380"
-Environment="ETCD_INITIAL_CLUSTER_STATE=existing"
-```
+   ```sh
+   $ etcdctl member add node2 http://0.0.0.2:2380
+   Added member named node2 with ID 4fb77509779cac99 to cluster

-**Note:** Make sure to remove the excess double quotes just after `ETCD_INITIAL_CLUSTER=` entry.
+   ETCD_NAME="node2"
+   ETCD_INITIAL_CLUSTER="52d2c433e31d54526cf3aa660304e8f1=http://0.0.0.1:2380,node2=http://0.0.0.2:2380,2cb7bb694606e5face87ee7a97041758=http://0.0.0.3:2380"
+   ETCD_INITIAL_CLUSTER_STATE="existing"
+   ```

-Parse the new drop-in:
+7. Modify the existing systemd drop-in, `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf`, replacing the node data with the appropriate information from the output of the `etcdctl member add` command executed in the last step.

-```sh
-$ sudo systemctl daemon-reload
-```
+   ```ini
+   [Service]
+   ExecStart=
+   ExecStart=/usr/lib/coreos/etcd-wrapper $ETCD_OPTS \
+     --name="node2" \
+     --listen-peer-urls="http://0.0.0.0:2380" \
+     --listen-client-urls="https://0.0.0.2:2379,http://0.0.0.0:4001" \
+     --initial-advertise-peer-urls="http://:2380" \
+     --initial-cluster="52d2c433e31d54526cf3aa660304e8f1=http://0.0.0.1:2380,node2=http://0.0.0.2:2380,2cb7bb694606e5face87ee7a97041758=http://0.0.0.3:2380" \
+     --initial-cluster-state="existing" \
+     --initial-cluster-token="demo-etcd-token" \
+     --advertise-client-urls="http://:2379"
+   ```

-Check whether the new [drop-in][drop-in] is valid:
+8. Reload systemd so that it picks up the modified drop-in:

-```sh
-sudo journalctl _PID=1 -e -u etcd2
-```
+   `$ sudo systemctl daemon-reload`

-And finally, if everything is ok start the `etcd2` service:
+9. Start the etcd member service:

-```sh
-$ sudo systemctl start etcd2
-```
+   `$ sudo systemctl start etcd-member.service`

-Check cluster health:
+10. Check the cluster health:

-```sh
-$ etcdctl cluster-health
-```
+    ```sh
+    $ etcdctl cluster-health
+
+    member 4fb77509779cac99 is healthy: got healthy result from http://0.0.0.2:2379
+    cluster is healthy
+    ```
+
+If the cluster reports a healthy state, etcd has successfully written the cluster configuration into the `/var/lib/etcd` directory.
+
+### Recovering etcd on CoreOS Container Linux
+
+#### etcd v3
+
+1. Download `etcdctl` from the [etcd Release page][etcd-release] and install it, for example, into `/opt/bin`.
+
+2. Create a backup directory:

-If your cluster has healthy state, etcd successfully wrote cluster configuration into the `/var/lib/etcd2` directory. Now it is safe to remove the temporary `/run/systemd/system/etcd2.service.d/99-restore.conf` drop-in file.
+   `$ sudo mkdir /var/lib/etcd_backup`

-## etcd disaster recovery on CoreOS Container Linux
+3. 
Save a snapshot of the etcd keyspace into the backup directory:
+
+   `$ sudo ETCDCTL_API=3 /opt/bin/etcdctl snapshot save /var/lib/etcd_backup/backup.db`
+
+4. Remove the obsolete data directory:
+
+   `$ sudo rm -rf /var/lib/etcd`
+
+5. Restore the snapshot into a new `/var/lib/etcd` data directory. The `m1` and `host1` values are placeholders for this member's name and peer URL:
+
+   `$ sudo ETCDCTL_API=3 /opt/bin/etcdctl snapshot restore /var/lib/etcd_backup/backup.db \
+      --name m1 \
+      --initial-cluster m1=http://host1:2380 \
+      --initial-cluster-token etcd-cluster-1 \
+      --initial-advertise-peer-urls http://host1:2380 \
+      --data-dir /var/lib/etcd`
+
+6. Set the etcd user permissions:
+
+   `$ sudo chown etcd -R /var/lib/etcd`
+
+7. Start the etcd member service:
+
+   `$ sudo systemctl start etcd-member.service`
+
+8. Check the node health:
+
+   `$ etcdctl cluster-health`
+
+9. Add the remaining members, one at a time, following the steps in the [change the etcd cluster size][change-cluster-size] section:
+
+   `$ etcdctl member add node3 http://0.0.1.4:2380`
+
+#### etcd v2

If a cluster is totally broken and [quorum][majority] cannot be restored, all etcd members must be reconfigured from scratch. This procedure consists of two steps:

* Initialize a one-member etcd cluster using the initial [data directory][data-dir]
-* Resize this etcd cluster by adding new etcd members by following the steps in the [change the etcd cluster size][change-cluster-size] section, above.
+* Resize this etcd cluster by adding new etcd members by following the steps in the [change the etcd cluster size][change-cluster-size] section.

This document is an adaptation for Container Linux of the official [etcd disaster recovery guide][disaster-recovery], and uses systemd [drop-ins][drop-in] for convenience.

-Let's assume a 3-node cluster with no living members. First, stop the `etcd2` service on all the members:
+Consider a three-node cluster with two permanently lost members.

-```sh
-$ sudo systemctl stop etcd2
-```
+1. Stop the `etcd-member` service on all the members:

-If you have etcd proxy nodes, they should update members list automatically according to the [`--proxy-refresh-interval`][proxy-refresh] configuration option.
+   ```sh
+   $ sudo systemctl stop etcd-member.service
+   ```

-Then, on one of the *member* nodes, run the following command to backup the current [data directory][data-dir]:
+   If you have etcd proxy nodes, they should update the members list automatically according to the [`--proxy-refresh-interval`][proxy-refresh] configuration option.

-```sh
-$ sudo etcdctl backup --data-dir /var/lib/etcd2 --backup-dir /var/lib/etcd2_backup
-```
+2. On one of the *member* nodes, run the following command to back up the current [data directory][data-dir]:

-Now that we've made a backup, we tell etcd to start a one-member cluster. Create the `/run/systemd/system/etcd2.service.d/98-force-new-cluster.conf` [drop-in][drop-in] file with the following contents:
+   ```sh
+   $ sudo etcdctl backup --data-dir /var/lib/etcd --backup-dir /var/lib/etcd_backup
+   ```

-```ini
-[Service]
-Environment="ETCD_FORCE_NEW_CLUSTER=true"
-```
+   Now that a backup has been created, start a single-member cluster.

-Then run `sudo systemctl daemon-reload`. Check whether the new [drop-in][drop-in] is valid by looking in its journal for errors: `sudo journalctl _PID=1 -e -u etcd2`. If everything is ok, start the `etcd2` daemon: `sudo systemctl start etcd2`.
+3. 
Create the `/run/systemd/system/etcd-member.service.d/98-force-new-cluster.conf` [drop-in][drop-in] file with the following contents:
-Check the cluster state:
+
+   ```ini
+   [Service]
+   Environment="ETCD_FORCE_NEW_CLUSTER=true"
+   ```

-```sh
-$ etcdctl member list
-e6c2bda2aa1f2dcf: name=1be6686cc2c842db035fdc21f56d1ad0 peerURLs=http://10.0.1.2:2380 clientURLs=http://10.0.1.2:2379
-$ etcdctl cluster-health
-member e6c2bda2aa1f2dcf is healthy: got healthy result from http://10.0.1.2:2379
-cluster is healthy
+4. Run `sudo systemctl daemon-reload`.
+
+5. Check whether the new [drop-in][drop-in] is valid by looking in its journal for errors:
+
+   `sudo journalctl _PID=1 -e -u etcd-member.service`
+
+6. If everything is ok, start the `etcd-member` daemon:
+
+   `sudo systemctl start etcd-member.service`
+
+7. Check the cluster state:
+
+   ```sh
+   $ etcdctl member list
+   e6c2bda2aa1f2dcf: name=1be6686cc2c842db035fdc21f56d1ad0 peerURLs=http://10.0.1.2:2380 clientURLs=http://10.0.1.2:2379
+   $ etcdctl cluster-health
+   member e6c2bda2aa1f2dcf is healthy: got healthy result from http://10.0.1.2:2379
+   cluster is healthy
+   ```
+
+8. If the output contains no errors, remove the `98-force-new-cluster.conf` drop-in file:
+
+   `sudo rm /run/systemd/system/etcd-member.service.d/98-force-new-cluster.conf`
+
+9. Reload systemd services:
+
+   `sudo systemctl daemon-reload`
+
+   It is not necessary to restart the `etcd-member` service after reloading the systemd services.
+
+10. Spin up the new nodes by following the instructions in the [change the etcd cluster size][change-cluster-size] section. Ensure that the etcd `version` is set in the config file, for example:
+
+```yaml container-linux-config
+etcd:
+  version: 2.3.7
+  name: demo-etcd-1
+  listen_client_urls: https://10.240.0.1:2379,http://0.0.0.0:4001
+  advertise_client_urls: http://:2379
+  listen_peer_urls: http://0.0.0.0:2380
+  initial_advertise_peer_urls: http://:2380
+  initial_cluster: demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380
+  initial_cluster_token: demo-etcd-token
+  initial_cluster_state: existing
```

-If the output contains no errors, remove the `/run/systemd/system/etcd2.service.d/98-force-new-cluster.conf` drop-in file, and reload systemd services: `sudo systemctl daemon-reload`. It is not necessary to restart the `etcd2` service after this step.
-The next steps are those described in the [Change etcd cluster size][change-cluster-size] section, with one difference: Remove the `/var/lib/etcd2/member` directory as well as `/var/lib/etcd2/proxy`.
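+
+For reference, a Container Linux Config like the one above compiles to a drop-in roughly along these lines (a sketch only; the exact output depends on the Config Transpiler version used for provisioning):
+
+```ini
+[Service]
+Environment="ETCD_IMAGE_TAG=v2.3.7"
+ExecStart=
+ExecStart=/usr/lib/coreos/etcd-wrapper $ETCD_OPTS \
+  --name="demo-etcd-1" \
+  --listen-peer-urls="http://0.0.0.0:2380" \
+  --listen-client-urls="https://10.240.0.1:2379,http://0.0.0.0:4001" \
+  --initial-advertise-peer-urls="http://:2380" \
+  --initial-cluster="demo-etcd-1=https://0.0.0.1:2380,demo-etcd-2=https://0.0.0.2:2380,demo-etcd-3=https://0.0.0.3:2380" \
+  --initial-cluster-state="existing" \
+  --initial-cluster-token="demo-etcd-token" \
+  --advertise-client-urls="http://:2379"
+```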
[change-cluster-size]: #change-etcd-cluster-size [cl-configs]: ../os/provisioning.md [data-dir]: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md#-data-dir [disaster-recovery]: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/recovery.md#disaster-recovery +[disaster-recovery-doc]: https://coreos.com/etcd/docs/latest/op-guide/recovery.html [drop-in]: ../os/using-systemd-drop-in-units.md [etcd-discovery]: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/clustering.md#lifetime-of-a-discovery-url [etcdctl-endpoint]: https://github.com/coreos/etcd/tree/master/etcdctl#--endpoint [etcdctl-member-remove]: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#remove-a-member +[etcd-release]: https://github.com/coreos/etcd/releases/ [machine-id]: http://www.freedesktop.org/software/systemd/man/machine-id.html [majority]: https://github.com/coreos/etcd/blob/master/Documentation/v2/admin_guide.md#fault-tolerance-table [proxy-refresh]: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/configuration.md#--proxy-refresh-interval diff --git a/etcd/etcd-live-http-to-https-migration.md b/etcd/etcd-live-http-to-https-migration.md index f61928a7c..8aa6f5bb4 100644 --- a/etcd/etcd-live-http-to-https-migration.md +++ b/etcd/etcd-live-http-to-https-migration.md @@ -10,21 +10,20 @@ By default, etcd communicates with clients over two ports: 2379, the current and If you've configured flannel, fleet, or other components to use custom ports, or 2379 only, they will be reconfigured to use port 4001. -If etcd isn't listening on port 4001, it must also be reconfigured. If you used a Container Linux Config to spin up your machines, you can retrieve the `ETCD_LISTEN_CLIENT_URLS` value from `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` to verify the etcd ports: +If etcd isn't listening on port 4001, it must also be reconfigured. If you used a Container Linux Config to spin up your machines, you can retrieve the `--listen-client-urls` value from `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` to verify the etcd ports: ```sh -$ grep ETCD_LISTEN_CLIENT_URLS /run/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf -Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379" +$ grep listen-client-urls /run/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf + --listen-client-urls="http://0.0.0.0:2379" \ ``` -In this case etcd is listening only on port 2379. We'll add port 4001 with a systemd [drop-in][drop-ins] unit file. Create the file `/etc/systemd/system/etcd2.service.d/25-insecure_localhost.conf`. In this file, write an excerpt that appends the new URL on port 4001 to the existing value we retrieved in the step above: +In this case etcd is listening only on port 2379. Add port 4001 with a systemd [drop-in][drop-ins] unit file. Edit the line that starts with `--listen-client-urls` in the `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` file and append the new URL on port 4001 to the existing value retrieved in the previous step: ``` -[Service] -Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://127.0.0.1:4001" +--listen-client-urls="http://0.0.0.0:2379,http://127.0.0.1:4001" ``` -Run `systemctl daemon-reload` followed by `systemctl restart etcd2` to restart etcd. 
Check cluster status using the [`etcdctl`][etcdctl] commands:

```sh
$ etcdctl member list
@@ -67,7 +66,7 @@ It is also necessary to modify your systemd [unit files][systemd-unit-file] or [

## Configure etcd key pair

-Now we will configure etcd to use the new certificates. Create a `/etc/systemd/system/etcd2.service.d/30-certs.conf` [drop-in][drop-ins] file with the following contents:
+Now we will configure etcd to use the new certificates. Create a `/etc/systemd/system/etcd-member.service.d/30-certs.conf` [drop-in][drop-ins] file with the following contents:

```
[Service]
@@ -81,7 +80,7 @@ Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_PEER_CLIENT_CERT_AUTH=true"
```

-Reload systemd configs with `systemctl daemon-reload` then restart etcd by invoking `systemctl restart etcd2`. Check cluster health:
+Reload systemd configs with `systemctl daemon-reload` then restart etcd by invoking `systemctl restart etcd-member.service`. Check cluster health:

```sh
$ etcdctl member list
@@ -92,7 +91,7 @@ Repeat this step on the rest of the cluster members.

### Configure etcd proxy key pair

-If proxying etcd connections as discussed above, create a systemd [drop-in][drop-ins] unit file named `/etc/systemd/system/etcd2.service.d/30-certs.conf` with the following contents:
+If proxying etcd connections as discussed above, create a systemd [drop-in][drop-ins] unit file named `/etc/systemd/system/etcd-member.service.d/30-certs.conf` with the following contents:

```
[Service]
@@ -106,7 +105,7 @@ Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_LISTEN_CLIENT_URLS=http://127.0.0.1:2379,http://127.0.0.1:4001"
```

-Reload systemd configs with `systemctl daemon-reload`, then restart etcd with `systemctl restart etcd2`. Check proxy status with, e.g.:
+Reload systemd configs with `systemctl daemon-reload`, then restart etcd with `systemctl restart etcd-member.service`. Check proxy status with, e.g.:

```sh
$ curl http://127.0.0.1:4001/v2/stats/self
@@ -167,16 +166,15 @@ Apply the changes in the same manner described above, by running each of the pri

## Change etcd client URLs

-Create a [drop-in][drop-ins] file named `/etc/systemd/system/etcd2.service.d/40-tls.conf` and write the following there:
+Edit the lines that start with `--advertise-client-urls`, `--listen-client-urls`, and `--listen-peer-urls` in the `/etc/systemd/system/etcd-member.service.d/20-clct-etcd-member.conf` file so that clients and peers connect over HTTPS, while keeping the insecure client port 4001 available on the loopback interface:

```
-[Service]
-Environment="ETCD_ADVERTISE_CLIENT_URLS=https://172.16.0.101:2379"
-Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379,http://127.0.0.1:4001"
-Environment="ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380"
+--advertise-client-urls="https://172.16.0.101:2379" \
+--listen-client-urls="https://0.0.0.0:2379,http://127.0.0.1:4001" \
+--listen-peer-urls="https://0.0.0.0:2380" \
```

-Reload systemd configs with `systemctl daemon-reload` and restart etcd by issuing `systemctl restart etcd2`. Check that HTTPS connections are working properly with, e.g.:
+Reload systemd configs with `systemctl daemon-reload` and restart etcd by issuing `systemctl restart etcd-member.service`.
Check that HTTPS connections are working properly with, e.g.: ```sh $ curl --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/server1.pem --key /etc/ssl/etcd/server1-key.pem https://172.16.0.101:2379/v2/stats/self @@ -202,8 +200,9 @@ $ etcdctl cluster-health Check etcd status and availability of the insecure port on the loopback interface: ```sh -$ systemctl status etcd2 -$ curl http://127.0.0.1:4001/v2/stats/self +$ systemctl status etcd-member.service +$ curl http://127.0.0.1:4001/metrics +$ curl http://127.0.0.1:4001/health ``` Check fleet and flannel: