systemd unit does not use systemd notify; documentation doesn't provide an example of using systemd notify #16844
Comments
* updates `consul.service` systemd service unit to use `Type=notify` to resolve issue hashicorp#16844
This is still an issue and causes startup ordering problems on systemd-based distros that use the official rpm/deb packages. I have an open PR to solve this, but no reviewer has commented on, labelled, or otherwise interacted with it.
* updates `consul.service` systemd service unit to use `Type=notify` to resolve issue hashicorp#16844
* add changelog update to match
* updates `consul.service` systemd service unit to use `Type=notify` to resolve issue #16844
* add changelog update to match
Fixed with #16845.
This change is causing all kinds of startup issues with single-node instances and with nodes that have not (yet) joined a cluster. We tried to re-initialize a 3-node cluster from scratch but did not manage to, because of systemd timeout failures.
Some related items:
We think Consul should either implement #4380, revert this change, or document a reliable way to start up and join new nodes and to bootstrap single-node clusters. We did not find a reliable way to start a new node with systemd and join it to a cluster without getting timeouts from systemd.
For anyone searching for a quick way to fix their broken Consul instances, we used a systemd override to change the startup type from `notify` to `simple`:
In the text editor, write the following content:
Save the change, then restart the consul service, and Consul will finally start normally.
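A minimal sketch of such an override, assuming it is created with `sudo systemctl edit consul`, which by default writes a drop-in file to `/etc/systemd/system/consul.service.d/override.conf`. Only the `Type=` change comes from the comment above; the path is an assumption about a default setup.

```ini
# Drop-in override for consul.service (assumed path:
# /etc/systemd/system/consul.service.d/override.conf).
[Service]
# Consider the unit started as soon as the main process has been forked,
# instead of waiting for Consul to send READY=1 via sd_notify.
Type=simple
```

If the file is written by hand rather than through `systemctl edit`, run `sudo systemctl daemon-reload` before restarting the service. `Type=exec`, which a later comment reports using instead, differs only in that systemd waits until the Consul binary has actually been executed before marking the unit as started.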
Hi @JSurf, thanks for bringing this to our attention. Could you provide more details on your environment, specifically the OS and systemd version, as well as the config you're using to start the cluster (if possible)? I can try to replicate and debug this in a simulated environment.
We are on Red Hat 8.8. With the old default of `Type=simple`, we could just start the node and then run `consul join` to join it to the cluster.
The simplest way to make the systemd startup hang with the new `Type=notify` is to install the consul rpm and then run:
The command hangs for a minute and then displays:
But systemd never completes the startup process and stays in the "activating" state, restarting/timing out the process every minute:
Consul itself seems to start up just fine, but it gets killed and restarted by systemd every minute:
Obviously it warns about a missing server in this scenario, but `consul join` is supposed to be called after the service has been started successfully via systemd. We also tried to configure a single-node server by adding a file, `/etc/consul.d/single-node.hcl`:
We think this should be enough to get a single-node system running, but it also hangs with systemd.
@JSurf thanks for the detailed explanation. I'll schedule some time internally next week to look into this further and determine how we want to deal with the potential regression, and I'll report back.
@loshz Any update on this?
Apologies for the delay here. We're scoping out a small piece of work to make the systemd notify mechanism more robust and hope to have this included in the next set of patch releases due in a couple of weeks. For now, manually changing the systemd config to
I'll report back shortly.
@loshz, any updates on this? I ran into this with Consul 1.17.1 on Ubuntu.
+1, on Ubuntu with 1.17.1.
@deepankarsharma @agoddard, are you able to share your cluster setup? Single-node or multi-node, etc.?
@loshz yep, pretty vanilla:
The unit file was the stock one from the apt package for 1.17.1-1 amd64 (though I later changed `notify` to `exec`).
Overview of the Issue
While this is briefly noted in the online documentation, at some point the deployment guide's example systemd service unit dropped `Type=notify`, even though it had been present before. Likewise, the default systemd service unit shipped in the Linux packages does not specify `Type=notify`. The end result is that what is arguably the most appropriate way to run Consul, under the most common init system on the most common operating system, is neither the default nor advertised as the standard/best practice. The documentation examples and the packaged systemd unit should be updated to use `Type=notify`.
Reproduction Steps
1. `systemctl start consul` returns before the Consul agent has joined the cluster and synced state.
2. Edit `consul.service` to include a `Type=notify` line in its `[Service]` section.
3. Run `sudo systemctl daemon-reload`.
4. Run `sudo systemctl restart consul`.
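A sketch of the relevant part of `consul.service` after step 2, assuming a unit that starts the agent with `consul agent -config-dir=/etc/consul.d/`. Only the `Type=notify` line comes from this issue; the `ExecStart` path and flags are illustrative and may not match the unit your package ships.

```ini
[Service]
# Wait for the agent to report READY=1 over the systemd notify socket before
# `systemctl start consul` returns and before units ordered After=consul.service
# are started.
Type=notify
# Assumed command line; check the ExecStart in your packaged unit.
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/
```

As the comments above describe, with this setting a node that has not yet joined a cluster may never be reported as ready, leaving the unit in the `activating` state until systemd's start timeout stops and restarts it.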
Operating system and Environment details
Any major Linux distro using the rpm or deb packages, with any version of Consul released since June 2017.