Upgrade to etcd 3.4 #888

Closed
lazzarello opened this issue Jan 2, 2020 · 12 comments

@lazzarello
Contributor

I'm attempting to upgrade my fork to etcd v3.4.3 and I've hit a strange error coming from the Go code in flanneld. Checking here before I move this upstream.

Jan 02 15:23:43 henry microk8s.daemon-flanneld[2277]: timed out
Jan 02 15:23:43 henry microk8s.daemon-flanneld[2277]: E0102 15:23:43.181121    2277 main.go:382] Couldn't fetch network config: client: response is invalid json. The endpoint is probably not valid etcd cluster endpoint.

I'm not sure what query the code makes, but the value at that key is definitely valid JSON, as shown by a query with etcdctl:

$ /snap/microk8s/current/etcdctl --endpoints="${etcd_endpoints}" --cert="${cert_file}" --key="${key_file}" --cacert="${ca_file}" get "/coreos.com/network/config"
/coreos.com/network/config
{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}

Help debugging would be awesome.

@balchua
Collaborator

balchua commented Jan 3, 2020

@lazzarello I don't think flannel supports the etcd v3 API.

In addition to what you have already done, you need to set the environment variable with export ETCDCTL_API=2 in run-flanneld-with-args.

Since you will then be using API v2, you need to revert the commands you changed in this file.

if ! "${SNAP}/etcdctl" --endpoint "${etcd_endpoints}" --cert-file "${cert_file}" --key-file "${key_file}" --ca-file "${ca_file}" rm "/coreos.com/network/config"; then
  echo "/coreos.com/network/config is not in etcd. Probably a first time run."
fi

"${SNAP}/etcdctl" --endpoint "${etcd_endpoints}"  --cert-file "${cert_file}" --key-file "${key_file}" --ca-file "${ca_file}" set "/coreos.com/network/config" "$data"

Example:

$ sudo ETCDCTL_API=2 /snap/microk8s/current/etcdctl --endpoints "https://127.0.0.1:12379" --cert-file "/var/snap/microk8s/x1/certs/server.crt" --key-file "/var/snap/microk8s/x1/certs/server.key" --ca-file "/var/snap/microk8s/x1/certs/ca.crt" get "/coreos.com/network/config"
{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}
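
A minimal sketch of the steps above, assuming flannel 0.11 only reads its config from the v2 keyspace. Rather than exporting ETCDCTL_API for the whole script, it sets the variable in a subshell around each call. The function name seed_flannel_config and its argument order are hypothetical and not taken from the MicroK8s scripts; the etcdctl flags are the v2 ones used in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write flannel's network config through the etcd v2 API.
# Usage: seed_flannel_config ETCDCTL ENDPOINTS CERT KEY CA DATA
seed_flannel_config() {
  local etcdctl="$1" endpoints="$2" cert="$3" key="$4" ca="$5" data="$6"
  local cfg_key="/coreos.com/network/config"

  # Remove any stale value first; a failure here usually means a first run.
  if ! (export ETCDCTL_API=2
        "$etcdctl" --endpoints "$endpoints" --cert-file "$cert" \
          --key-file "$key" --ca-file "$ca" rm "$cfg_key") >/dev/null 2>&1; then
    echo "$cfg_key is not in etcd. Probably a first time run."
  fi

  # Store the config where a v2 client such as flannel 0.11 will look for it.
  (export ETCDCTL_API=2
   "$etcdctl" --endpoints "$endpoints" --cert-file "$cert" \
     --key-file "$key" --ca-file "$ca" set "$cfg_key" "$data")
}
```

It could be called as seed_flannel_config "${SNAP}/etcdctl" "${etcd_endpoints}" "${cert_file}" "${key_file}" "${ca_file}" "$data". Because the variable is exported only inside the subshells, the rest of the script keeps whatever ETCDCTL_API it already had.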

@balchua
Collaborator

balchua commented Jan 3, 2020

Another option could be to use flannel's --kube-subnet-mgr and --net-config-path=$SNAP_DATA/args/flannel-network-mgr-config args. Perhaps this way we don't have to set the etcd API version to 2.

Check out flannel's configuration docs: https://github.com/coreos/flannel/blob/master/Documentation/configuration.md

@balchua
Collaborator

balchua commented Jan 3, 2020

I spoke too soon. The arg --net-config-path is not yet available in flannel 0.11, so the only option for now is to set ETCDCTL_API to version 2.

@lazzarello
Contributor Author

I discovered from the release notes that etcd 3.4 disables API v2 by default.

I have a new build with that arg set to true. Testing today.
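
For reference, the flag in question is etcd's --enable-v2, which defaults to false in 3.4. Below is a sketch of turning it on idempotently, assuming the snap keeps etcd's command-line arguments one per line in an args file. The path ${SNAP_DATA}/args/etcd and the helper name ensure_enable_v2 are assumptions and are not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure an etcd args file enables the v2 API.
# Usage: ensure_enable_v2 ARGS_FILE
ensure_enable_v2() {
  local args_file="$1"
  if grep -q -e '^--enable-v2' "$args_file" 2>/dev/null; then
    # Flag already present (possibly set to false): rewrite that line.
    sed 's/^--enable-v2.*/--enable-v2=true/' "$args_file" > "$args_file.tmp" &&
      mv "$args_file.tmp" "$args_file"
  else
    # Flag missing: append it, leaving the other arguments untouched.
    echo "--enable-v2=true" >> "$args_file"
  fi
}
```

Something like ensure_enable_v2 "${SNAP_DATA}/args/etcd" would then be followed by a restart of etcd, since the flag is only read at startup.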

@balchua
Collaborator

balchua commented Jan 3, 2020

@lazzarello it's not only enable-v2 that you need to set; the data that you push to etcd using etcdctl must also be written through the v2 API.
Therefore you need to set the env variable ETCDCTL_API to 2.
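
Part of what makes this easy to miss is that etcdctl changes its flag names along with the API version, as the two invocations earlier in this thread show (--cert, --key, --cacert for v3 and --cert-file, --key-file, --ca-file for v2). Here is a small sketch that picks the TLS flags for a given ETCDCTL_API value. The helper name etcdctl_tls_flags is hypothetical, and it prints one token per line, so paths containing newlines are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the TLS flags etcdctl expects for an API version.
# Usage: etcdctl_tls_flags API_VERSION CERT KEY CA
etcdctl_tls_flags() {
  local api="$1" cert="$2" key="$3" ca="$4"
  case "$api" in
    2) printf '%s\n' --cert-file "$cert" --key-file "$key" --ca-file "$ca" ;;
    3) printf '%s\n' --cert "$cert" --key "$key" --cacert "$ca" ;;
    *) echo "unsupported ETCDCTL_API: $api" >&2; return 1 ;;
  esac
}

# Example use (bash 4+), reading the same key name from the v2 keyspace:
#   mapfile -t tls < <(etcdctl_tls_flags 2 "$cert_file" "$key_file" "$ca_file")
#   ETCDCTL_API=2 etcdctl --endpoints "$etcd_endpoints" "${tls[@]}" \
#     get "/coreos.com/network/config"
```

Running the same get with API 3 and the v3 flags queries a separate keyspace, which is why a key written with one API version is not visible through the other.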

@lazzarello
Contributor Author

Just discovered this. Crazy that there are duplicate keys with the same name for different API versions.

@balchua
Collaborator

balchua commented Jan 3, 2020

I wanted to relieve flannel from using etcd directly by means of --net-config-path. The code is merged but not released. I have asked the flannel team for the next release: flannel-io/flannel#1231

@lazzarello
Contributor Author

lazzarello commented Jan 4, 2020

Looks like this could be helpful, but in the meantime flannel is throwing another error after I did the bootstrapping with ETCDCTL_API=2 in my fork. Where else do I have to force this version?

Update: these two errors in flanneld don't seem to affect cluster operations. I can create an nginx pod and expose it. Everything works.

Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.039527    4590 main.go:321] Running backend.
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.039731    4590 vxlan_network.go:60] watching for new subnet leases
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.040240    4590 main.go:429] Waiting for 22h59m59.997083855s to renew lease
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.043820    4590 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.043890    4590 iptables.go:167] Deleting iptables rule: -s 10.1.0.0/16 -j ACCEPT
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.044799    4590 iptables.go:167] Deleting iptables rule: -d 10.1.0.0/16 -j ACCEPT
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.045598    4590 iptables.go:155] Adding iptables rule: -s 10.1.0.0/16 -j ACCEPT
Jan 03 16:13:58 henry microk8s.daemon-flanneld[4590]: I0103 16:13:58.047352    4590 iptables.go:155] Adding iptables rule: -d 10.1.0.0/16 -j ACCEPT
Jan 03 16:14:06 henry microk8s.daemon-flanneld[4590]: E0103 16:14:06.440880    4590 watch.go:43] Watch subnets: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Jan 03 16:14:06 henry microk8s.daemon-flanneld[4590]: E0103 16:14:06.440896    4590 watch.go:171] Subnet watch failed: client: etcd cluster is unavailable or misconfigured; error #0: unexpected EOF
Jan 03 16:14:07 henry etcd[5606]: published {Name:default ClientURLs:[https://192.168.0.111:12379]} to cluster cdf818194e3a8c32
Jan 03 16:14:07 henry etcd[5606]: ready to serve client requests
Jan 03 16:14:07 henry etcd[5606]: serving client requests on [::]:12379
Jan 03 16:14:08 henry etcd[5606]: rejected connection from "127.0.0.1:38344" (error "EOF", ServerName "")

@balchua
Collaborator

balchua commented Jan 4, 2020

I see these "errors" even before doing the upgrade.

@lazzarello
Contributor Author

Good news! I think I have a working build with etcd 3.4. I'll get a PR together from my fork.

@lazzarello
Contributor Author

It's up at #894. I added a shell script to do the build via lxc.

@lazzarello
Contributor Author

Closed, with the above PR merged into master.
