
Split nomad cluster into two clusters 0.9.3 #5917

Open
jozef-slezak opened this issue Jul 3, 2019 · 11 comments

@jozef-slezak

jozef-slezak commented Jul 3, 2019

Rebooting the machines in a 3-node cluster caused a cluster split (it happened once; many other reboots went through without any problems).
It would be great if Nomad had automated CI coverage for restart scenarios.

Nomad version

0.9.3

Operating system and Environment details

Linux

Issue

Reproduction steps

Run 3 Nomad servers and 47 Nomad clients.
Quickly sudo reboot all 3 Nomad servers.

nomad server members shows one leader and no followers on one server:

nomad server members
Name   Address  Port  Status  Leader  Protocol  Build  Datacenter  Region
name1  ip1      4648  alive   true    2         0.9.3  dc2         global

My concern is that one node can report itself as leader even without a quorum. I am also not sure whether the server discovery continues after that (https://www.nomadproject.io/docs/configuration/consul.html#server_auto_join).

nomad server members shows two followers and a "no leader" error on the other two servers:

nomad server members
Name   Address  Port  Status  Leader  Protocol  Build  Datacenter  Region
name2  ip2      4648  alive   false   2         0.9.3  dc2         global
name3  ip3      4648  alive   false   2         0.9.3  dc2         global

After restarting one follower again, all three servers joined the cluster.

Could Nomad do some retries on its own, or should we configure something, maybe autopilot? How would non_voting_server help (would it also help minimize Nomad client job restarts)?
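
In case it helps, this is roughly how I can check what each server sees and inspect autopilot; only a sketch, assuming the default HTTP port 4646 and that the nomad operator subcommands are available in 0.9.3 (the addresses are our three server IPs):

for host in 172.16.23.51 172.16.23.67 172.16.23.83; do
  echo "== $host =="
  # serf membership as seen by this server
  NOMAD_ADDR="http://$host:4646" nomad server members
  # the raft peer set can differ from serf membership; -stale lets a non-leader answer
  NOMAD_ADDR="http://$host:4646" nomad operator raft list-peers -stale
done

# inspect or tune autopilot; it cleans up dead servers, but it cannot
# create a quorum out of a single node
nomad operator autopilot get-config
nomad operator autopilot set-config -cleanup-dead-servers=true -server-stabilization-time=10s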

Nomad Server logs (if appropriate)

@cgbaker
Contributor

cgbaker commented Jul 3, 2019

If possible, can you post the server config?

@jozef-slezak
Author

jozef-slezak commented Jul 4, 2019

Nomad config (nomad.hcl)

addresses = {
  http = "0.0.0.0"
}
advertise = {
  http = "172.16.23.67"
  rpc = "172.16.23.67"
  serf = "172.16.23.67"
}
bind_addr = "172.16.23.67"
client = {
  enabled = true
  network_interface = "bond0"
  options = {
    driver.raw_exec.enable = 1
    driver.raw_exec.no_cgroups = 1
  }
  meta = {
    "abis-manager" = true
  }
}
enable_syslog = true
data_dir = "/var/lib/nomad"
datacenter = "dc2"
disable_update_check = true
log_level = "INFO"
server = {
  bootstrap_expect = 3
  enabled = true
  encrypt = "JrVVHZY9wTMQvp107LpLAA=="
  }

Nomad systemd service file (nomad.service)

[Unit]
Description=HashiCorp Nomad
After=network-online.target
Requires=network-online.target

Wants=consul.service
After=consul.service

[Service]
Type=simple
ExecStart=/usr/sbin/nomad agent -config=/etc/nomad.d
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
LimitNOFILE=65536
LimitNPROC=65536

[Install]
WantedBy=multi-user.target

Consul config (consul.hcl)

bind_addr = "172.16.23.67"
client_addr = "0.0.0.0"
data_dir = "/var/lib/consul"
datacenter = "dc2"
disable_update_check = true
encrypt = "XeK8LHcwhHGf54lk8M4dpw=="
encrypt_verify_incoming = false
encrypt_verify_outgoing = false
log_level = "INFO"
retry_join = [  "172.16.23.51","172.16.23.67","172.16.23.83" ]
rejoin_after_leave = true
ui = true
server = true
bootstrap_expect = 3

Consul systemd file (consul.service)

[Unit]
Description=HashiCorp Consul
After=network-online.target
Requires=network-online.target

[Service]
Type=simple
ExecStart=/usr/sbin/consul agent -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
User=consul

[Install]
WantedBy=multi-user.target

@jozef-slezak
Author

We were able to simulate the same behaviour by running systemctl restart nomad on all three servers.

@jozef-slezak
Author

jozef-slezak commented Jul 4, 2019

Does it make sense to work around this issue by disabling server discovery via Consul and enumerating the server IP addresses instead (we have fixed IP addresses on physical infrastructure)? I am thinking about:

consul {
  server_auto_join    = false
}
server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
    retry_max = 0 # infinite
    retry_interval = "3s"
  }
}

@lfarnell
Contributor

lfarnell commented Jul 5, 2019

@jozef-slezak Are you running systemctl restart nomad on all 3 servers at the same time?

@jozef-slezak
Author

Yes, I am running systemctl restart nomad on all three servers at nearly the same time.

@lfarnell
Contributor

lfarnell commented Jul 5, 2019

I believe the problem you are facing is that you are effectively breaking consensus between the server nodes by restarting all the processes at the same time. If you need to restart server nodes, you should generally restart them one at a time, so that another server has the chance to become the leader, the remaining nodes can continue as followers, and state can be replicated safely for durability. This is also why it is advised to run an odd number of servers, to avoid the scenario you have described in this issue. Hope this helps.
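
As a rough sketch of what a one-at-a-time restart could look like (the hostnames and the use of ssh are just assumptions, adjust for your environment):

for host in server1 server2 server3; do
  ssh "$host" sudo systemctl restart nomad
  # wait until the cluster reports a leader again before restarting the next server
  until nomad server members | awk '$5 == "true"' | grep -q .; do
    sleep 5
  done
done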

@jozef-slezak
Author

I understand; restarting all servers at the same time simulates a power outage. I believe the implementation is meant to work properly even in this scenario (bootstrap_expect = 3 servers).

From my point of view, we reproduced a bug: one node reports itself as leader without a quorum (see the nomad server members output at the beginning of the issue). With three servers, at least two nodes must be up for there to be a leader.

@cgbaker
Contributor

cgbaker commented Jul 5, 2019

@jozef-slezak, thanks for the report. I've tried reproducing this without any success; I will continue to look into it. Furthermore, I'll bring this up with the team.

@jozef-slezak
Author

The best way to reproduce: automate cluster restarts and repeat until it breaks.
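
Something along these lines, for example; only a sketch, the hostnames and ssh access are assumptions:

while true; do
  # restart all three servers at roughly the same time, like a power outage
  for host in server1 server2 server3; do
    ssh "$host" sudo systemctl restart nomad &
  done
  wait
  sleep 30   # give the cluster time to elect (or fail to elect) a leader
  # then compare what each server reports
  for host in server1 server2 server3; do
    echo "== $host =="
    NOMAD_ADDR="http://$host:4646" nomad server members
  done
  # stop and inspect manually once the servers disagree about the leader
done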

@cgbaker
Contributor

cgbaker commented Jul 5, 2019

Okay, just saw it with the latest build of Nomad (11afd99). We will take a deeper look at this. Thanks for the report!

@cgbaker cgbaker removed their assignment Jul 5, 2019