
systemd unit does not use systemd notify, documentation doesn't provide example of using systemd notify #16844

Closed
drawks opened this issue Mar 31, 2023 · 12 comments

Comments

@drawks
Contributor

drawks commented Mar 31, 2023

Overview of the Issue

While briefly noted in the online documentation, it seems that at some point the deployment guide's example systemd service unit dropped Type=notify, despite it having been present at one point. Likewise, the default systemd service unit shipped in the Linux packages does not specify Type=notify. The end result is that what is arguably the most appropriate model for running Consul under the most common init system on the most common operating system is neither the default nor advertised as the standard/best practice.

The documentation examples and packaged systemd unit should be updated to utilize Type=notify.
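
For illustration, the change amounts to one line in the unit's [Service] section. A minimal sketch based on the packaged unit (other directives omitted for brevity):

[Service]
Type=notify
EnvironmentFile=-/etc/consul.d/consul.env
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/

With Type=notify, systemd waits for the agent's readiness notification (sd_notify READY=1) before marking the unit as started, rather than treating a successful exec as "started".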


Reproduction Steps

  1. Install the rpm or deb packaged Consul from the HashiCorp package repository on any supported Linux operating system.
  2. Observe that systemctl start consul returns before the Consul agent has joined the cluster and synced state.
  3. Further observe that other systemd units which depend on consul sometimes fail to connect to the Consul agent, as it is not yet listening on its API port or hasn't yet synced enough state for ACLs/tokens to work (see the example dependent unit below).
  4. Modify the systemd service unit consul.service to include the following line in its [Service] section:
Type=notify
  5. Reload the modified service unit via sudo systemctl daemon-reload.
  6. Restart Consul via sudo systemctl restart consul.
  7. Observe that starting consul now blocks until the agent is actually ready, and once it is ready, other units which depend on it are reliably able to start and use Consul. Conversely, observe that if Consul fails to join and sync with the cluster, dependent services are not started.
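
To make the ordering benefit concrete, a dependent unit would typically order itself after consul along these lines (myapp.service and its ExecStart path are hypothetical placeholders); with Type=simple this only waits for the consul process to be spawned, whereas with Type=notify it waits until the agent has reported readiness:

[Unit]
Requires=consul.service
After=consul.service

[Service]
ExecStart=/usr/local/bin/myapp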

Operating system and Environment details

Any major Linux distro using rpm or deb packages, and any version of Consul released since June 2017.

drawks added a commit to drawks/consul that referenced this issue Mar 31, 2023
* updates `consul.service` systemd service unit to use `Type=notify` to
  resolve issue hashicorp#16844
@drawks drawks changed the title In spite of supporting systemd notify for 6~ years the packaged systemd unit does not use the feature systemd unit does not use systemd notify, documentation doesn't provide example of using systemd notify Mar 31, 2023
@drawks
Contributor Author

drawks commented Jun 1, 2023

This is still an issue and causes startup ordering problems on systemd-based distros that use the official rpm/deb packages. I've got an open PR to solve this, but no reviewer has commented, labelled, or otherwise interacted with it.

drawks added a commit to drawks/consul that referenced this issue Jun 2, 2023
* updates `consul.service` systemd service unit to use `Type=notify` to
  resolve issue hashicorp#16844
* add changelog update to match
loshz pushed a commit that referenced this issue Jun 2, 2023
* updates `consul.service` systemd service unit to use `Type=notify` to
  resolve issue #16844
* add changelog update to match
@loshz
Contributor

loshz commented Jun 2, 2023

Fixed with #16845

@JSurf

JSurf commented Jul 4, 2023

This change is causing all kinds of startup issues with single-node instances and nodes that have not (yet) joined a cluster.
Not sure it was the right choice to make this behavior the default... at least backporting it to all previous versions as well was a terrible choice...

We tried to re-initialize a 3-node cluster from scratch, but did not manage to do so because of systemd timeout failures.

Some related items:
#4380
#7137
https://support.hashicorp.com/hc/en-us/articles/4417724350867-Using-Type-notify-or-Type-simple-for-consul-service

We think consul should either implement #4380, revert this change or document a reliable way to startup and join new nodes and bootstrap single node clusters. We did not find a reliable way to start a new node with systemd and join it to a cluster without getting timeouts from systemd.

For everyone searching for a quick solution to fix their broken Consul instances, we used a systemd override to change the startup type from "notify" to "simple":
Create a systemd override for consul:
Run: systemctl edit consul

In the text editor write the following content:

[Service]
Type=simple

Save the change, then restart the consul service
systemctl restart consul

and consul will finally start normally
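
For anyone scripting this, systemctl edit normally writes the drop-in to /etc/systemd/system/consul.service.d/override.conf, so a non-interactive equivalent would be roughly (a sketch, using standard systemd drop-in paths):

sudo mkdir -p /etc/systemd/system/consul.service.d
printf '[Service]\nType=simple\n' | sudo tee /etc/systemd/system/consul.service.d/override.conf
sudo systemctl daemon-reload
sudo systemctl restart consul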

@loshz
Contributor

loshz commented Jul 4, 2023

Hi @JSurf - thanks for bringing this to our attention.

Could you provide more details on your environment, specifically OS and systemd version as well as the config you're using to start the cluster (if possible)?

I can try and replicate and debug in a simulated environment.

@JSurf

JSurf commented Jul 5, 2023

We are on Red Hat 8.8.
The systemd rpm package version is currently systemd-239-74.el8_8.2.x86_64.
We also tried with one of our previous images based on RHEL 8.7, which shows the same behaviour.

With the old systemd default of "Type=simple" we could just start the node and then use "consul join" to join it to the cluster.

The simplest way to get the systemd startup to hang with the new Type=notify is to just install the consul rpm and then run:
systemctl start consul

The command will hang for a minute and then display:
Job for consul.service failed because the control process exited with error code.
See "systemctl status consul.service" and "journalctl -xe" for details.

But systemd never completes the startup process and stays in the "activating" state, restarting/timing out the process every minute:

● consul.service - "HashiCorp Consul - A service mesh solution"
   Loaded: loaded (/usr/lib/systemd/system/consul.service; disabled; vendor pre>
   Active: activating (start) since Wed 2023-07-05 05:08:52 EDT; 28s ago
     Docs: https://www.consul.io/
 Main PID: 114590 (consul)
    Tasks: 9 (limit: 101708)
   Memory: 26.9M
   CGroup: /system.slice/consul.service
           └─114590 /usr/bin/consul agent -config-dir=/etc/consul.d/

Consul itself seems to start up just fine but gets killed and restarted by systemd every minute:

Jul  5 05:14:42 someserver systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul  5 05:14:42 someserver consul[115132]: ==> Starting Consul agent...
Jul  5 05:14:42 someserver consul[115132]:               Version: '1.16.0'
Jul  5 05:14:42 someserver consul[115132]:            Build Date: '2023-06-26 20:07:11 +0000 UTC'
Jul  5 05:14:42 someserver consul[115132]:               Node ID: '0f16565d-d897-2e7d-5975-ba4be6770d16'
Jul  5 05:14:42 someserver consul[115132]:             Node name: 'someserver'
Jul  5 05:14:42 someserver consul[115132]:            Datacenter: 'dc1' (Segment: '')
Jul  5 05:14:42 someserver consul[115132]:                Server: false (Bootstrap: false)
Jul  5 05:14:42 someserver consul[115132]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: -1, DNS: 8600)
Jul  5 05:14:42 someserver consul[115132]:          Cluster Addr: 10.1.1.71 (LAN: 8301, WAN: 8302)
Jul  5 05:14:42 someserver consul[115132]:     Gossip Encryption: false
Jul  5 05:14:42 someserver consul[115132]:      Auto-Encrypt-TLS: false
Jul  5 05:14:42 someserver consul[115132]:           ACL Enabled: false
Jul  5 05:14:42 someserver consul[115132]:    ACL Default Policy: allow
Jul  5 05:14:42 someserver consul[115132]:             HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
Jul  5 05:14:42 someserver consul[115132]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Jul  5 05:14:42 someserver consul[115132]:      Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
Jul  5 05:14:42 someserver consul[115132]: ==> Log data will now stream in as it occurs:
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.340-0400 [WARN]  agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.350-0400 [WARN]  agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.350-0400 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: someserver 10.1.1.71
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent.router: Initializing LAN area manager
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [WARN]  agent.client.serf.lan: serf: Failed to re-join any previously known node
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent: started state syncer
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [INFO]  agent: Consul agent running!
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:14:42 someserver consul[115132]: 2023-07-05T05:14:42.351-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:14:57 someserver consul[115132]: 2023-07-05T05:14:57.791-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:14:57 someserver consul[115132]: 2023-07-05T05:14:57.791-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:15:13 someserver consul[115132]: 2023-07-05T05:15:13.890-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:15:13 someserver consul[115132]: 2023-07-05T05:15:13.890-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:15:38 someserver consul[115132]: 2023-07-05T05:15:38.096-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:15:38 someserver consul[115132]: 2023-07-05T05:15:38.096-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:15:55 someserver consul[115132]: 2023-07-05T05:15:55.579-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:15:55 someserver consul[115132]: 2023-07-05T05:15:55.579-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:16:12 someserver systemd[1]: consul.service: start operation timed out. Terminating.
Jul  5 05:16:12 someserver consul[115132]: 2023-07-05T05:16:12.344-0400 [INFO]  agent: Caught: signal=terminated
Jul  5 05:16:12 someserver consul[115132]: 2023-07-05T05:16:12.344-0400 [INFO]  agent: Gracefully shutting down agent...
Jul  5 05:16:12 someserver consul[115132]: 2023-07-05T05:16:12.344-0400 [INFO]  agent.client: client starting leave
Jul  5 05:16:12 someserver consul[115132]: 2023-07-05T05:16:12.344-0400 [INFO]  agent.client.serf.lan: serf: EventMemberLeave: someserver 10.1.1.71
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.345-0400 [INFO]  agent: Graceful exit completed
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.345-0400 [INFO]  agent: Requesting shutdown
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.346-0400 [INFO]  agent.client: shutting down client
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.346-0400 [INFO]  agent: consul client down
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.346-0400 [INFO]  agent: shutdown complete
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.346-0400 [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=tcp
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.346-0400 [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=udp
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.347-0400 [INFO]  agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.347-0400 [INFO]  agent: Waiting for endpoints to shut down
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.347-0400 [INFO]  agent: Endpoints down
Jul  5 05:16:15 someserver consul[115132]: 2023-07-05T05:16:15.347-0400 [INFO]  agent: Exit code: code=0
Jul  5 05:16:15 someserver systemd[1]: consul.service: Failed with result 'timeout'.
Jul  5 05:16:15 someserver systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
Jul  5 05:16:15 someserver systemd[1]: consul.service: Service RestartSec=100ms expired, scheduling restart.
Jul  5 05:16:15 someserver systemd[1]: consul.service: Scheduled restart job, restart counter is at 1.
Jul  5 05:16:15 someserver systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Jul  5 05:16:15 someserver systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul  5 05:16:15 someserver consul[115285]: ==> Starting Consul agent...
Jul  5 05:16:15 someserver consul[115285]:               Version: '1.16.0'
Jul  5 05:16:15 someserver consul[115285]:            Build Date: '2023-06-26 20:07:11 +0000 UTC'
Jul  5 05:16:15 someserver consul[115285]:               Node ID: '0f16565d-d897-2e7d-5975-ba4be6770d16'
Jul  5 05:16:15 someserver consul[115285]:             Node name: 'someserver'
Jul  5 05:16:15 someserver consul[115285]:            Datacenter: 'dc1' (Segment: '')
Jul  5 05:16:15 someserver consul[115285]:                Server: false (Bootstrap: false)
Jul  5 05:16:15 someserver consul[115285]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: -1, DNS: 8600)
Jul  5 05:16:15 someserver consul[115285]:          Cluster Addr: 10.1.1.71 (LAN: 8301, WAN: 8302)
Jul  5 05:16:15 someserver consul[115285]:     Gossip Encryption: false
Jul  5 05:16:15 someserver consul[115285]:      Auto-Encrypt-TLS: false
Jul  5 05:16:15 someserver consul[115285]:           ACL Enabled: false
Jul  5 05:16:15 someserver consul[115285]:    ACL Default Policy: allow
Jul  5 05:16:15 someserver consul[115285]:             HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
Jul  5 05:16:15 someserver consul[115285]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Jul  5 05:16:15 someserver consul[115285]:      Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
Jul  5 05:16:15 someserver consul[115285]: ==> Log data will now stream in as it occurs:
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.697-0400 [WARN]  agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.703-0400 [WARN]  agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent.client.serf.lan: serf: EventMemberJoin: someserver 10.1.1.71
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent.router: Initializing LAN area manager
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent: started state syncer
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [INFO]  agent: Consul agent running!
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:16:15 someserver consul[115285]: 2023-07-05T05:16:15.704-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:16:32 someserver consul[115285]: 2023-07-05T05:16:32.570-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:16:32 someserver consul[115285]: 2023-07-05T05:16:32.570-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:17:01 someserver consul[115285]: 2023-07-05T05:17:01.714-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:17:01 someserver consul[115285]: 2023-07-05T05:17:01.714-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:17:23 someserver consul[115285]: 2023-07-05T05:17:23.201-0400 [WARN]  agent.router.manager: No servers available
Jul  5 05:17:23 someserver consul[115285]: 2023-07-05T05:17:23.201-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
Jul  5 05:17:45 someserver systemd[1]: consul.service: start operation timed out. Terminating.
Jul  5 05:17:45 someserver consul[115285]: 2023-07-05T05:17:45.844-0400 [INFO]  agent: Caught: signal=terminated
Jul  5 05:17:45 someserver consul[115285]: 2023-07-05T05:17:45.844-0400 [INFO]  agent: Gracefully shutting down agent...

Obviously it warns about the missing server in this scenario, but "consul join" is supposed to be called after the service has started successfully via systemd...

We also tried to configure a single-node server by adding a file /etc/consul.d/single-node.hcl:

server = true
bootstrap_expect = 1

which we think should be enough to get a single-node system running, but this also hangs with systemd.
It shows the same behavior as a simple agent, with slightly different output:

Jul  5 05:19:42 someserver systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul  5 05:19:42 someserver consul[115599]: ==> Starting Consul agent...
Jul  5 05:19:42 someserver consul[115599]:               Version: '1.16.0'
Jul  5 05:19:42 someserver consul[115599]:            Build Date: '2023-06-26 20:07:11 +0000 UTC'
Jul  5 05:19:42 someserver consul[115599]:               Node ID: '0f16565d-d897-2e7d-5975-ba4be6770d16'
Jul  5 05:19:42 someserver consul[115599]:             Node name: 'someserver'
Jul  5 05:19:42 someserver consul[115599]:            Datacenter: 'dc1' (Segment: '<all>')
Jul  5 05:19:42 someserver consul[115599]:                Server: true (Bootstrap: true)
Jul  5 05:19:42 someserver consul[115599]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: 8503, DNS: 8600)
Jul  5 05:19:42 someserver consul[115599]:          Cluster Addr: 10.1.1.71 (LAN: 8301, WAN: 8302)
Jul  5 05:19:42 someserver consul[115599]:     Gossip Encryption: false
Jul  5 05:19:42 someserver consul[115599]:      Auto-Encrypt-TLS: false
Jul  5 05:19:42 someserver consul[115599]:           ACL Enabled: false
Jul  5 05:19:42 someserver consul[115599]:     Reporting Enabled: false
Jul  5 05:19:42 someserver consul[115599]:    ACL Default Policy: allow
Jul  5 05:19:42 someserver consul[115599]:             HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
Jul  5 05:19:42 someserver consul[115599]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Jul  5 05:19:42 someserver consul[115599]:      Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
Jul  5 05:19:42 someserver consul[115599]: ==> Log data will now stream in as it occurs:
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.957-0400 [WARN]  agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.957-0400 [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.957-0400 [WARN]  agent: bootstrap = true: do not enable unless necessary
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.970-0400 [WARN]  agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.970-0400 [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.970-0400 [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.974-0400 [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0f16565d-d897-2e7d-5975-ba4be6770d16 Address:10.1.1.71:8300}]"
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.974-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at 10.1.1.71:8300 [Follower]" leader-address= leader-id=
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.974-0400 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: someserver.dc1 10.1.1.71
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.974-0400 [WARN]  agent.server.serf.wan: serf: Failed to re-join any previously known node
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.975-0400 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: someserver 10.1.1.71
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.975-0400 [INFO]  agent.router: Initializing LAN area manager
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.975-0400 [INFO]  agent.server: Handled event for server in area: event=member-join server=someserver.dc1 area=wan
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.975-0400 [INFO]  agent.server: Adding LAN server: server="someserver (Addr: tcp/10.1.1.71:8300) (DC: dc1)"
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.975-0400 [INFO]  agent.server.autopilot: reconciliation now disabled
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.976-0400 [INFO]  agent.server.cert-manager: initialized server certificate management
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.976-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.976-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.976-0400 [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.976-0400 [INFO]  agent: Started gRPC listeners: port_name=grpc_tls address=127.0.0.1:8503 network=tcp
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.977-0400 [INFO]  agent: started state syncer
Jul  5 05:19:42 someserver consul[115599]: 2023-07-05T05:19:42.977-0400 [INFO]  agent: Consul agent running!
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.097-0400 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.097-0400 [INFO]  agent.server.raft: entering candidate state: node="Node at 10.1.1.71:8300 [Candidate]" term=10
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.097-0400 [INFO]  agent.server.raft: election won: term=10 tally=1
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.097-0400 [INFO]  agent.server.raft: entering leader state: leader="Node at 10.1.1.71:8300 [Leader]"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.098-0400 [INFO]  agent.server: cluster leadership acquired
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.098-0400 [INFO]  agent.server: New leader elected: payload=someserver
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.server.autopilot: reconciliation now enabled
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="federation state pruning"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="streaming peering resources"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="metrics for streaming peering resources"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="peering deferred deletion"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="intermediate cert renew watch"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="CA root pruning"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="CA root expiration metric"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="CA signing expiration metric"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="virtual IP version check"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: started routine: routine="config entry controllers"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: stopping routine: routine="virtual IP version check"
Jul  5 05:19:49 someserver consul[115599]: 2023-07-05T05:19:49.101-0400 [INFO]  agent.leader: stopped routine: routine="virtual IP version check"
Jul  5 05:19:51 someserver consul[115599]: 2023-07-05T05:19:51.077-0400 [INFO]  agent: Synced node info
Jul  5 05:21:13 someserver systemd[1]: consul.service: start operation timed out. Terminating.
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent: Caught: signal=terminated
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent: Graceful shutdown disabled. Exiting
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent: Requesting shutdown
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.server: shutting down server
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="CA root pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="CA root expiration metric"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="CA signing expiration metric"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="config entry controllers"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="metrics for streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="intermediate cert renew watch"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="federation state anti-entropy"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="federation state pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [INFO]  agent.leader: stopping routine: routine="peering deferred deletion"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [WARN]  agent.server.serf.lan: serf: Shutdown without a Leave
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.094-0400 [WARN]  agent.controller-runtime: error received from watch: managed_type=internal.v1.tombstone error="rpc error: code = Unavailable desc = error reading from server: io: read/write on closed pipe"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="peering deferred deletion"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="federation state anti-entropy"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="federation state pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="metrics for streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="intermediate cert renew watch"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="CA root pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="CA signing expiration metric"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="CA root expiration metric"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="CA root expiration metric"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopping routine: routine="config entry controllers"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.server.autopilot: reconciliation now disabled
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="intermediate cert renew watch"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="config entry controllers"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.server.peering_metrics: stopping peering metrics
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="metrics for streaming peering resources"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="CA root pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="peering deferred deletion"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="federation state pruning"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [ERROR] agent.server: error performing anti-entropy sync of federation state: error="context canceled"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.leader: stopped routine: routine="federation state anti-entropy"
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [WARN]  agent.server.serf.wan: serf: Shutdown without a Leave
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent.router.manager: shutting down
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: consul server down
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: shutdown complete
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=tcp
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Stopping server: protocol=DNS address=127.0.0.1:8600 network=udp
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Stopping server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Waiting for endpoints to shut down
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Endpoints down
Jul  5 05:21:13 someserver consul[115599]: 2023-07-05T05:21:13.095-0400 [INFO]  agent: Exit code: code=1
Jul  5 05:21:13 someserver systemd[1]: consul.service: Main process exited, code=exited, status=1/FAILURE
Jul  5 05:21:13 someserver systemd[1]: consul.service: Failed with result 'timeout'.
Jul  5 05:21:13 someserver systemd[1]: Failed to start "HashiCorp Consul - A service mesh solution".
Jul  5 05:21:13 someserver systemd[1]: consul.service: Service RestartSec=100ms expired, scheduling restart.
Jul  5 05:21:13 someserver systemd[1]: consul.service: Scheduled restart job, restart counter is at 1.
Jul  5 05:21:13 someserver systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".
Jul  5 05:21:13 someserver systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jul  5 05:21:13 someserver consul[115752]: ==> Starting Consul agent...
Jul  5 05:21:13 someserver consul[115752]:               Version: '1.16.0'
Jul  5 05:21:13 someserver consul[115752]:            Build Date: '2023-06-26 20:07:11 +0000 UTC'
Jul  5 05:21:13 someserver consul[115752]:               Node ID: '0f16565d-d897-2e7d-5975-ba4be6770d16'
Jul  5 05:21:13 someserver consul[115752]:             Node name: 'someserver'
Jul  5 05:21:13 someserver consul[115752]:            Datacenter: 'dc1' (Segment: '<all>')
Jul  5 05:21:13 someserver consul[115752]:                Server: true (Bootstrap: true)
Jul  5 05:21:13 someserver consul[115752]:           Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: 8503, DNS: 8600)
Jul  5 05:21:13 someserver consul[115752]:          Cluster Addr: 10.1.1.71 (LAN: 8301, WAN: 8302)
Jul  5 05:21:13 someserver consul[115752]:     Gossip Encryption: false
Jul  5 05:21:13 someserver consul[115752]:      Auto-Encrypt-TLS: false
Jul  5 05:21:13 someserver consul[115752]:           ACL Enabled: false
Jul  5 05:21:13 someserver consul[115752]:     Reporting Enabled: false
Jul  5 05:21:13 someserver consul[115752]:    ACL Default Policy: allow
Jul  5 05:21:13 someserver consul[115752]:             HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
Jul  5 05:21:13 someserver consul[115752]:              gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Jul  5 05:21:13 someserver consul[115752]:      Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2
Jul  5 05:21:13 someserver consul[115752]: ==> Log data will now stream in as it occurs:
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.443-0400 [WARN]  agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.443-0400 [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.443-0400 [WARN]  agent: bootstrap = true: do not enable unless necessary
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.451-0400 [WARN]  agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.451-0400 [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.451-0400 [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.454-0400 [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:0f16565d-d897-2e7d-5975-ba4be6770d16 Address:10.1.1.71:8300}]"
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.454-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at 10.1.1.71:8300 [Follower]" leader-address= leader-id=
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.455-0400 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: someserver.dc1 10.1.1.71
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.455-0400 [WARN]  agent.server.serf.wan: serf: Failed to re-join any previously known node
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.455-0400 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: someserver 10.1.1.71
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.455-0400 [INFO]  agent.router: Initializing LAN area manager
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.456-0400 [WARN]  agent.server.serf.lan: serf: Failed to re-join any previously known node
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.456-0400 [INFO]  agent.server: Adding LAN server: server="someserver (Addr: tcp/10.1.1.71:8300) (DC: dc1)"
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.456-0400 [INFO]  agent.server: Handled event for server in area: event=member-join server=someserver.dc1 area=wan
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.456-0400 [INFO]  agent.server.autopilot: reconciliation now disabled
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.457-0400 [INFO]  agent.server.cert-manager: initialized server certificate management
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.457-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.457-0400 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.457-0400 [INFO]  agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.457-0400 [INFO]  agent: Started gRPC listeners: port_name=grpc_tls address=127.0.0.1:8503 network=tcp
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.458-0400 [INFO]  agent: started state syncer
Jul  5 05:21:13 someserver consul[115752]: 2023-07-05T05:21:13.458-0400 [INFO]  agent: Consul agent running!
Jul  5 05:21:22 someserver consul[115752]: 2023-07-05T05:21:22.190-0400 [WARN]  agent.leaf-certs: handling error in Manager.Notify: error="No cluster leader" index=1
Jul  5 05:21:22 someserver consul[115752]: 2023-07-05T05:21:22.190-0400 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.309-0400 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.309-0400 [INFO]  agent.server.raft: entering candidate state: node="Node at 10.1.1.71:8300 [Candidate]" term=11
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.310-0400 [INFO]  agent.server.raft: election won: term=11 tally=1
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.310-0400 [INFO]  agent.server.raft: entering leader state: leader="Node at 10.1.1.71:8300 [Leader]"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.310-0400 [INFO]  agent.server: cluster leadership acquired
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.310-0400 [INFO]  agent.server: New leader elected: payload=someserver
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.server.autopilot: reconciliation now enabled
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="federation state pruning"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="streaming peering resources"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="metrics for streaming peering resources"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="peering deferred deletion"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="intermediate cert renew watch"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="CA root pruning"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="CA root expiration metric"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="CA signing expiration metric"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="virtual IP version check"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: started routine: routine="config entry controllers"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: stopping routine: routine="virtual IP version check"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.314-0400 [INFO]  agent.leader: stopped routine: routine="virtual IP version check"
Jul  5 05:21:23 someserver consul[115752]: 2023-07-05T05:21:23.458-0400 [ERROR] agent.server.autopilot: Failed to reconcile current state with the desired state
Jul  5 05:21:33 someserver consul[115752]: 2023-07-05T05:21:33.268-0400 [INFO]  agent: Synced node info
Jul  5 05:22:43 someserver systemd[1]: consul.service: start operation timed out. Terminating.
Jul  5 05:22:43 someserver consul[115752]: 2023-07-05T05:22:43.594-0400 [INFO]  agent: Caught: signal=terminated
Jul  5 05:22:43 someserver consul[115752]: 2023-07-05T05:22:43.594-0400 [INFO]  agent: Graceful shutdown disabled. Exiting
Jul  5 05:22:43 someserver consul[115752]: 2023-07-05T05:22:43.594-0400 [INFO]  agent: Requesting shutdown

@loshz
Contributor

loshz commented Jul 6, 2023

@JSurf thanks for the detailed explanation. I'll schedule some time internally next week to look into this further to determine how we want to deal with the potential regression, and I'll report back.

@steinbrueckri

@loshz Any update on this?

@loshz
Contributor

loshz commented Aug 31, 2023

Apologies for the delay here. We're scoping out a small piece of work to make the systemd notify mechanism more robust and hope to have this included in the next set of patch releases due in a couple of weeks.

For now, manually changing the systemd service type to Type=simple should solve the single-node cluster startup issues. As per the docs, you can also try the following:

The retry_join parameter is required for the systemd process to complete successfully and send its notify signal on LAN join.
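
In the agent configuration under /etc/consul.d/, that would look roughly like this (the server addresses below are placeholders, not taken from this thread):

retry_join = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]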

I'll report back shortly.

@deepankarsharma

@loshz - any updates on this? Ran into this with Consul 1.17.1 on Ubuntu.

@agoddard

+1. On Ubuntu with 1.17.1, notify is the unit file default, but I'm not seeing a notification reach systemd before it times out and restarts cluster members every 90 seconds. The cluster is healthy, and changing the type to exec works fine.

@loshz
Contributor

loshz commented Jan 29, 2024

@deepankarsharma @agoddard - are you able to share your cluster setup? Single/multi node, etc.?

@agoddard

@loshz yep, pretty vanilla:

data_dir = "/opt/consul"
datacenter = "dc1"

server = true
bootstrap_expect = 3
bind_addr = "10.0.0.10"
advertise_addr = "10.0.0.10"
client_addr = "127.0.0.1 10.0.0.1"

ui_config {
  enabled = true
}

and the unit file was stock from the apt package for 1.17.1-1 amd64 (I later changed notify to exec though)

[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service]
Type=notify
EnvironmentFile=-/etc/consul.d/consul.env
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
