Update etcdMembersDown Runbook #171

Open
wants to merge 2 commits into
base: master
18 changes: 12 additions & 6 deletions alerts/cluster-etcd-operator/etcdMembersDown.md
@@ -22,13 +22,19 @@ Login to the cluster. Check health of master nodes if any of them is in
`NotReady` state or not.

```console
-$ oc get nodes -l node-role.kubernetes.io/master=
+oc get nodes -l node-role.kubernetes.io/master=
```
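
A minimal sketch for picking the unhealthy masters out of that output, assuming the default table layout of `oc get nodes` (NAME in the first column, STATUS in the second). The helper name `not_ready_nodes` is hypothetical and not part of the runbook:

```shell
# Hypothetical helper: reads `oc get nodes` table output on stdin and prints
# the name of every node whose STATUS does not start with "Ready"
# (so "NotReady" and "NotReady,SchedulingDisabled" are both reported).
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME, $2 is STATUS (assumed layout).
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}
```

It could be used as `oc get nodes -l node-role.kubernetes.io/master= | not_ready_nodes`; empty output would mean no master is reported as `NotReady`.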

Check if an upgrade is in progress.

```console
-$ oc adm upgrade
+oc adm upgrade
```

+You can also check if an upgrade is in progress by viewing resources in the openshift-managed-upgrade-operator namespace.
+
+```console
+oc get upgrade -n openshift-managed-upgrade-operator
+```
+
In case there is no upgrade going on, but there is a change in the
@@ -39,7 +45,7 @@ the master nodes. This is the case when the [machine-config-operator
(MCO)](https://github.com/openshift/machine-config-operator) is working on it.

```console
-$ oc get nodes -l node-role.kubernetes.io/master= -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}'
+oc get nodes -l node-role.kubernetes.io/master= -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}'
```
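
The annotation dump above can be long. As a rough sketch, the output could be reduced to the nodes the MCO is still working on by comparing the `machineconfiguration.openshift.io/currentConfig` and `machineconfiguration.openshift.io/desiredConfig` annotations. This assumes the `===> node:> NAME` and `key : value` line shapes produced by the template above; the helper name `mco_pending_nodes` is hypothetical:

```shell
# Hypothetical helper: reads the template output on stdin and prints every
# node whose current MCO config differs from its desired config.
mco_pending_nodes() {
  awk '
    # Node marker line emitted by the template: "===> node:> <name>"
    $1 == "===>" && $2 == "node:>" { node = $3; nodes[++n] = node }
    # Annotation lines have the shape "<key> : <value>" (assumed)
    $1 == "machineconfiguration.openshift.io/currentConfig" { cur[node] = $3 }
    $1 == "machineconfiguration.openshift.io/desiredConfig" { des[node] = $3 }
    END {
      for (i = 1; i <= n; i++)
        if (cur[nodes[i]] != des[nodes[i]]) print nodes[i]
    }
  '
}
```

Piping the `oc get nodes ... --template=...` command into `mco_pending_nodes` would then list only the masters with a pending config change.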

### General etcd health
@@ -48,19 +54,19 @@ To run `etcdctl` commands, we need to `rsh` into the `etcdctl` container of any
etcd pod.

```console
-$ oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
+oc rsh -c etcdctl -n openshift-etcd $(oc get pod -l app=etcd -oname -n openshift-etcd | awk -F"/" 'NR==1{ print $2 }')
```

Validate that the `etcdctl` command is available:

```console
-$ etcdctl version
+etcdctl version
```

Run the following command to get the health of etcd:

```console
-$ etcdctl endpoint health -w table
+etcdctl endpoint health -w table
```
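
A small sketch for extracting the failing members from that table, assuming the usual `etcdctl` table layout with `|`-separated ENDPOINT, HEALTH, TOOK and ERROR columns. The helper name `unhealthy_endpoints` is hypothetical:

```shell
# Hypothetical helper: reads `etcdctl endpoint health -w table` output on
# stdin and prints the endpoints whose HEALTH column is "false".
unhealthy_endpoints() {
  awk -F'|' '
    # Border lines ("+----+") contain no "|" and are skipped by NF > 3.
    NF > 3 {
      gsub(/ /, "", $2)   # ENDPOINT column, padding removed
      gsub(/ /, "", $3)   # HEALTH column, padding removed
      if ($3 == "false") print $2
    }
  '
}
```

For example, `etcdctl endpoint health -w table | unhealthy_endpoints` would print nothing when every listed endpoint reports as healthy.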

## Mitigation