Using PXE boot, etcd always logs "heartbeat near election timeout" #963
Comments
Any suggestions?
Could you check the machine list?
Checked from both machines:
core@localhost ~ $ curl http://127.0.0.1:7001/v2/admin/machines
I can set and get on the leader, but can only get on the follower.
Yesterday I used -name with an IP address, like -name 192.168.1.4. I changed it to -name 4, but when I set a message on the follower, I still get the error.
Log on the follower:
core@localhost ~ $ journalctl -fu config-etcd.service
Log on the leader:
core@localhost ~ $ journalctl -fu config-etcd.service
etcdctl uses the clientURL to connect to each machine.
It works correctly, thanks!
I set up a PXE server in a subnet and use it to boot CoreOS. Since "$public_ipv4" isn't supported with PXE,
I use cloud-config to start etcd. My cloud-config file is below:
#cloud-config
coreos:
  units:
    - name: etcd.service
      command: stop
    - name: config-etcd.service
      command: start
      content: |
        [Unit]
        Description=Config Etcd
        After=etcd.service
ssh_authorized_keys:
In cloud-config I download a script and use it to get the IP address and start etcd. The script is below.
#!/bin/bash
# Take the address from the second line of the ifconfig output for eno1.
export publicIP=$(ifconfig eno1 | sed -n 2p | awk '{ print $2 }')
systemctl stop etcd
etcd -name $publicIP -peer-addr $publicIP:7001 -addr 127.0.0.1:4001 -discovery http://192.168.1.2:4001/v2/keys/0000040 -peer-election-timeout=5000 -peer-heartbeat-interval=5000 -snapshot=true -v
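The address lookup in the script above depends on the IPv4 address being the second field of the second line of the ifconfig output. As a hedged alternative, the sketch below matches the "inet" line by name instead of by position. The function name extract_ipv4 is hypothetical, and the two output layouts it handles ("inet 192.168.1.5" and "inet addr:192.168.1.5") are assumptions about common ifconfig formats, not something shown in this report.

```bash
#!/bin/bash
# Hypothetical helper: read ifconfig-style text on stdin and print the first
# IPv4 address found on a line whose first field is exactly "inet".
extract_ipv4() {
  awk '$1 == "inet" {
         sub(/^addr:/, "", $2)   # older layout prints "inet addr:1.2.3.4"
         print $2
         exit                    # stop after the first IPv4 address
       }'
}

# Usage on a real host (interface name taken from the script above):
#   publicIP=$(ifconfig eno1 | extract_ipv4)
```

Because the function reads stdin, it can be tried against saved ifconfig output before it is used in the boot script.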
After CoreOS boots, the script is downloaded and works as expected, but etcd always
reports the timeout warning. I have two nodes, and the follower always gets the warning, no matter
which node becomes the follower; the leader works fine.
core@localhost ~ $ journalctl -fu config-etcd.service
-- Logs begin at Wed 2014-08-27 08:19:35 UTC. --
Aug 27 08:20:02 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:02.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999946418s
Aug 27 08:20:07 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:07.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999720814s
Aug 27 08:20:12 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:12.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999857421s
Aug 27 08:20:17 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:17.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999935035s
Aug 27 08:20:22 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:22.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 5.000014421s
Aug 27 08:20:27 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:27.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999861909s
Aug 27 08:20:32 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:32.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999855009s
Aug 27 08:20:37 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:37.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999945439s
Aug 27 08:20:42 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:42.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999893319s
Aug 27 08:20:47 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:47.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.99987582s
Aug 27 08:20:52 localhost startEtcd.sh[597]: [etcd] Aug 27 08:20:52.753 INFO | 192.168.1.5: warning: heartbeat near election timeout: 4.999934057s
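The warnings above arrive once every 5 seconds, which matches -peer-heartbeat-interval=5000, and each reported gap is within a millisecond of -peer-election-timeout=5000. That suggests, though the thread does not confirm it, that a heartbeat interval equal to the election timeout is enough on its own to trigger the warning on every heartbeat. A minimal sketch for checking a captured log against a given election timeout follows. The function name near_timeout and the default fraction of 0.8 are assumptions made for illustration, not values taken from this report.

```bash
#!/bin/bash
# near_timeout ELECTION_MS [FRACTION]
# Hypothetical helper: read etcd log lines on stdin and print how many
# "heartbeat near election timeout" warnings report a gap of at least
# FRACTION (assumed default 0.8) of the election timeout.
near_timeout() {
  local election_ms=$1 fraction=${2:-0.8}
  awk -v et="$election_ms" -v frac="$fraction" '
    /heartbeat near election timeout/ {
      d = $NF               # last field, e.g. "4.999946418s"
      sub(/s$/, "", d)      # strip the unit; assumes the value is in seconds
      if (d * 1000 >= et * frac) n++
    }
    END { print n + 0 }     # print 0 when nothing matched
  '
}

# Usage on a real host (unit name taken from the log command above):
#   journalctl -u config-etcd.service | near_timeout 5000
```

If every warning in the log is counted for the configured timeout, the heartbeat interval is probably too close to the election timeout rather than the network being slow.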
If I set something on the follower node, an error occurs, see below:
core@localhost ~ $ etcdctl set /message test
Error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
If I set something on the leader, everything works fine, and the follower can also get the message
using etcdctl.
I tried all the versions listed on the page https://coreos.com/docs/running-coreos/bare-metal/booting-with-pxe/
(stable, beta, alpha); all have the same issue. But no matter which version I boot, the login
prompt always shows CoreOS (beta).
I also tuned the etcd parameters "election-timeout=5000 -peer-heartbeat-interval=5000", but
the issue is still there.
I googled and found some similar issues, like #868, #594, #915, but I am not sure whether these bugs
are the same as mine.
I also ran a 5-node CoreOS cluster with Vagrant and VirtualBox, and everything works fine. But at login
it shows CoreOS (alpha).
Did I do anything wrong? And how do I make it work?
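The reply that etcdctl uses the clientURL points to one possible explanation for the 501 error: with -addr 127.0.0.1:4001, each node advertises a loopback client address, so a write sent to the follower may be pointed at the leader's client URL and end up back on the local machine. The thread does not show the exact change that fixed it, so the following is only a sketch under that assumption. The helper name build_etcd_cmd, the node name node5, the 100 ms and 500 ms timing values, and the use of -bind-addr 0.0.0.0:4001 are all illustrative choices, not settings taken from this report.

```bash
#!/bin/bash
# Hypothetical helper: print an etcd command line that advertises a routable
# client address. Arguments: IP, node name, discovery URL, and optional
# heartbeat interval and election timeout in milliseconds.
build_etcd_cmd() {
  local ip=$1 name=$2 discovery=$3
  local heartbeat_ms=${4:-100} election_ms=${5:-500}   # illustrative values

  # Refuse settings like 5000/5000, where every heartbeat lands at the timeout.
  if [ "$heartbeat_ms" -ge "$election_ms" ]; then
    echo "heartbeat interval must be shorter than the election timeout" >&2
    return 1
  fi

  # -addr is the advertised client URL; -bind-addr (an assumption here) keeps
  # the listener reachable on every interface, including loopback.
  printf 'etcd -name %s -peer-addr %s:7001 -addr %s:4001 -bind-addr 0.0.0.0:4001 ' \
    "$name" "$ip" "$ip"
  printf -- '-discovery %s -peer-heartbeat-interval=%s -peer-election-timeout=%s -snapshot=true -v\n' \
    "$discovery" "$heartbeat_ms" "$election_ms"
}

# Example: print the command for the node at 192.168.1.5.
build_etcd_cmd 192.168.1.5 node5 http://192.168.1.2:4001/v2/keys/0000040
```

The function only prints the command, so the result can be inspected before it replaces the etcd line in the boot script.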