
Should kubeadm reset remove the etcd member on the master? #1211

Closed
pytimer opened this issue Nov 3, 2018 · 7 comments · Fixed by kubernetes/kubernetes#74112
Labels: area/HA, help wanted, lifecycle/active, priority/important-soon, triage/needs-information



pytimer commented Nov 3, 2018

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): kubeadm master branch

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
    Linux master1 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

Hi, when I run the kubeadm reset command on one of the masters, it does not remove the etcd member from the etcd cluster. I used a local etcd in the init.

What you expected to happen?

I looked at the master branch code, but I could not find anything about this. I hope that when reset runs on a master, kubeadm can remove the etcd member.
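For example, here is a manual sketch of what I mean, run from a healthy control-plane node while the cluster still has quorum (assuming etcd v3 and the certificate paths kubeadm writes by default under /etc/kubernetes/pki/etcd/):

export ETCDCTL_API=3
# list the members and note the hex ID of the node that is about to be reset
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  member list
# remove that member first, then run `kubeadm reset -f` on the node itself
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  member remove <member-id>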

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

@fabriziopandini (Member)

@pytimer could you kindly provide more info about your cluster (how it was created, the kubeadm-gce-master.yaml)?

@fabriziopandini added priority/awaiting-more-evidence and triage/needs-information labels Nov 5, 2018

pytimer commented Nov 27, 2018

@fabriziopandini sorry, I was busy with other things during this time.

I used 1.13.0-beta.2 on virtual machines to test this issue. Init and joining the control plane succeeded, but after I ran kubeadm reset -f on one control-plane node, the cluster no longer worked.

The etcd container on the first init node kept restarting, and the logs showed that etcd was still trying to connect to the etcd member on the reset node.

etcd logs:

2018-11-27 09:22:12.828159 W | rafthttp: health check for peer fa6cc2324326d403 could not connect: dial tcp 10.33.46.213:2380: getsockopt: connection refused
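To confirm the stale registration, listing the members from the surviving node should still show the reset node (same assumptions about certificate paths as in the sketch above; member list is served from local membership data, so it works even without quorum):

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  member list
# member fa6cc2324326d403 (the reset node) is still listed; at this point
# `member remove` would fail as well, because the cluster has lost quorum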

Reproduce:

  1. kubeadm init --config kubeadm.yaml
  2. kubectl apply -f flannel.yaml
  3. run kubeadm join --experimental-control-plane --config kubeadm.yaml on the other node.
  4. kubectl get nodes
[root@master213 ~]# kubectl get nodes
NAME        STATUS   ROLES    AGE   VERSION
master212   Ready    master   14h   v1.13.0-beta.2
master213   Ready    master   13h   v1.13.0-beta.2
  5. run kubeadm reset -f on master213
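To watch the failure on the remaining node, check the etcd static-pod logs (the pod name follows kubeadm's etcd-<node-name> convention; the docker fallback assumes the dockershim CRI socket used in the configs below):

kubectl -n kube-system logs etcd-master212 --tail=20
# once the API server itself loses etcd, fall back to the container runtime:
docker logs --tail 20 $(docker ps -aq --filter name=k8s_etcd | head -1)
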
kubeadm init yaml:
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
localAPIEndpoint:
  advertiseAddress: 0.0.0.0
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master212
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: "10.33.46.215"
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    serverCertSANs:
    - "10.33.46.215"
    extraArgs:
      cipher-suites: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kubernetesVersion: v1.13.0-beta.2
networking:
  dnsDomain: cluster.local
  podSubnet: "10.244.0.0/16"
  serviceSubnet: 10.96.0.0/12

kubeadm join yaml:
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: 10.33.46.215:6443
    token: 1jvhzl.37osma939vn5q1uh
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: 1jvhzl.37osma939vn5q1uh
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 0.0.0.0
    bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master213

@fabriziopandini (Member)

@pytimer
If I got it right, you create two control plane instances with a local etcd, so you get an etcd cluster with two etcd members. Then you reset one of the instances and the remaining etcd gets stuck.

The reason is that the etcd cluster loses quorum.
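
Concretely, with the standard Raft arithmetic quorum = floor(n/2) + 1:

members (n)   quorum   failures tolerated
1             1        0
2             2        0
3             2        1

A two-member cluster tolerates zero failures, so once the reset node's member disappears the survivor can no longer commit any write, including the MemberRemove that would shrink the cluster back to a healthy size. That is why the member has to be removed from the cluster before the node is reset.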

@timothysc opinions about if/how to handle this use case?

@pytimer
Copy link
Author

pytimer commented Nov 27, 2018

Yes, that's right.

So I think when kubeadm reset runs on a node hosting the control plane, kubeadm should first delete the member from the etcd cluster and then do the other cleanup steps.
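Something like this ordering, as a hypothetical sketch of the proposed flow (not the actual implementation):

# hypothetical `kubeadm reset` ordering on a control-plane node with local etcd:
# 1. detect that this node hosts a local etcd member
# 2. while the cluster still has quorum, remove that member via the etcd MemberRemove API
# 3. stop the control-plane static pods
# 4. wipe /var/lib/etcd and the manifests under /etc/kubernetes
# step 2 must come first: in a two-member cluster, once this member is gone
# the survivor has no quorum and MemberRemove can no longer succeed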


neolit123 commented Nov 29, 2018

we should try to make the remaining etcd nodes not get stuck; otherwise this breaks our HA guarantees.

edit: also having related tests in the future would be great.


pytimer commented Dec 23, 2018

I added a remove-etcd-member step when resetting the control plane node, and it works for me.
It is in my fork repository commit Remove etcd member when reset the control plane node.

I am not sure whether this workflow should become part of kubeadm reset?

@yagonobre (Member)

/remove-priority awaiting-more-evidence
/priority important-soon
/lifecycle active

@k8s-ci-robot added lifecycle/active and priority/important-soon labels and removed priority/awaiting-more-evidence Feb 19, 2019