Stale lock file prevents dehydrated from running #813

Closed
jomat opened this issue Apr 8, 2021 · 5 comments

Comments

@jomat

jomat commented Apr 8, 2021

dehydrated/dehydrated

Lines 539 to 541 in 5c1551e

( set -C; date > "${LOCKFILE}" ) 2>/dev/null || _exiterr "Lock file '${LOCKFILE}' present, aborting."
remove_lock() { rm -f "${LOCKFILE}"; }
trap 'remove_lock' EXIT

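dehydrated sometimes fails to start because of a stale lock file. I haven't investigated further, but I assume it happens when a server is restarted while dehydrated is running. It can be reproduced by killing a running dehydrated with SIGKILL, which cannot be trapped, so the EXIT trap above never removes the lock.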
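
For anyone who wants to see the effect in isolation, here is a minimal reproduction sketch. It does not run dehydrated itself: a throwaway child shell takes a lock the same way as the quoted lines and is then killed with SIGKILL. The temporary directory, the `sleep` duration, and the `result` variable are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical reproduction; paths and timings are illustrative only.
workdir="$(mktemp -d)"
LOCKFILE="${workdir}/lock"
export LOCKFILE

# Child: takes the lock like the quoted lines do, installs the EXIT trap,
# then pretends to work. Output is discarded so nothing holds our stdout open.
bash -c '
  ( set -C; date > "${LOCKFILE}" ) 2>/dev/null || exit 1
  remove_lock() { rm -f "${LOCKFILE}"; }
  trap "remove_lock" EXIT
  sleep 10
' >/dev/null 2>&1 &
child=$!

# Wait (at most about 5 seconds) until the child has created the lock.
tries=0
while [ ! -e "${LOCKFILE}" ] && [ "${tries}" -lt 50 ]; do
  sleep 0.1
  tries=$((tries + 1))
done

# SIGKILL cannot be caught, so the EXIT trap never gets a chance to run.
kill -KILL "${child}"
wait "${child}" 2>/dev/null

if [ -e "${LOCKFILE}" ]; then
  result="stale lock left behind"
else
  result="lock removed"
fi
echo "${result}"
rm -rf "${workdir}"
```

On a typical Linux system this should report that the lock was left behind, which matches the behaviour described above.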

@kousu
Contributor

kousu commented Apr 24, 2021

Locking is really crufty, and lockfiles are perhaps the best option, unfortunately: https://apenwarr.ca/log/20101213

I haven't run into this yet, but can you adjust your server's update/reboot/whatever cycles to be opposite dehydrated's? And/or could you add a boot script that deletes stale lock files?
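
A rough sketch of what such a boot-time cleanup could look like: remove the lock only if it was last modified before the current boot. This is not part of dehydrated. The function name `clear_stale_lock` is made up, and the sketch assumes Linux (`/proc/uptime`) and GNU `stat`.

```shell
#!/bin/sh
# Hypothetical helper, not part of dehydrated.
# Assumes Linux (/proc/uptime) and GNU coreutils (stat -c %Y).
clear_stale_lock() {
  lockfile="$1"
  # Nothing to do if there is no lock file.
  [ -e "${lockfile}" ] || return 0

  now="$(date +%s)"
  uptime_s="$(cut -d. -f1 /proc/uptime)"   # whole seconds since boot
  boot_time=$((now - uptime_s))
  lock_mtime="$(stat -c %Y "${lockfile}")" # last modification, epoch seconds

  # A lock written before the current boot cannot belong to a live process.
  if [ "${lock_mtime}" -lt "${boot_time}" ]; then
    rm -f -- "${lockfile}"
  fi
}

# Example call (path taken from the log later in this thread):
# clear_stale_lock /opt/dehydrated/lock
```

This only covers locks left over from before a reboot. A lock left behind by a SIGKILL without a reboot would still be newer than the boot time and would stay in place.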

@jomat
Author

jomat commented Apr 25, 2021

> but can you adjust your server's update/reboot/whatever cycles to be opposite dehydrated's

There are several hundred servers with a four-digit number of domains, so it takes some time for dehydrated to finish. The cron jobs are spread across the servers throughout the day, and there are no planned reboots (keyword: ksplice), so no, I can't adjust that.
The problem isn't that big, as I'm also monitoring certificate expiry and we get a notification 29 days in advance.

A reboot script would be a workaround I don't want to use. Currently I've deployed the mentioned fork/PR, as our servers are quite homogeneous and the lock file isn't on NFS.

@kousu
Contributor

kousu commented Apr 25, 2021

That's cool, I hope it works out for you.

I don't have that many servers under dehydrated yet, so maybe I'll have to keep my eye out for this as I expand.

@lukas2511
Member

If this is a problem you only ever have on reboots, you might want to configure dehydrated to put the lockfile into a directory that's mounted in memory (e.g. /dev/shm or /run); that way it can't persist over a reboot. Alternatively, you could try running dehydrated using systemd services and timers; that way systemd should be able to wait for dehydrated to finish, or at least stop it in a way that would trigger the exit trap.
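
As a sketch of the first suggestion: the dehydrated config file is sourced as shell, so the lock location could be picked along these lines. The file name `dehydrated.lock` and the `lock_dir` variable are illustrative assumptions, and whether `/run` or `/dev/shm` is actually memory-backed depends on the system.

```shell
# Hypothetical addition to the dehydrated config file (a sourced shell fragment).
# Prefer a directory that is typically memory-backed on Linux, so the lock
# cannot survive a reboot. The lock file name is an arbitrary choice.
for lock_dir in /run /dev/shm; do
  if [ -d "${lock_dir}" ] && [ -w "${lock_dir}" ]; then
    LOCKFILE="${lock_dir}/dehydrated.lock"
    break
  fi
done
```

If neither directory is writable, `LOCKFILE` is left untouched here. Note that this only helps with locks that go stale across a reboot, not with a process that is killed while the machine stays up.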

I'm leaving your pull-request #814 open for now. This is something I really really need to test on lots of platforms before I can merge or implement something similar to it. Having a simple lockfile is just one of the easiest solutions that I'm quite sure will work on older and embedded Linux systems, weird WSL things, BSD systems, etc.

@jomat
Author

jomat commented Jun 29, 2021

It also happens in low memory situations:

 + Checking domain name(s) of existing cert... unchanged.
 + Checking expire date of existing cert...
 + Valid till Jul  6 14:00:42 2021 GMT (Less than 30 days). Renewing!
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
/opt/dehydrated/dehydrated: line 964: /usr/bin/tr: Cannot allocate memory
/opt/dehydrated/dehydrated: fork: Cannot allocate memory
/opt/dehydrated/dehydrated: fork: Cannot allocate memory
/opt/dehydrated/dehydrated: fork: Cannot allocate memory
/opt/dehydrated/dehydrated: fork: Cannot allocate memory
/opt/dehydrated/dehydrated -c -g  23.13s user 3.86s system 9% cpu 4:39.88 total
254 root@server ~ # /opt/dehydrated/dehydrated -c -g
# INFO: Using main config file /opt/dehydrated/config
ERROR: Lock file '/opt/dehydrated/lock' present, aborting.

IMHO it would be better to close my PR if it's not suitable and leave this issue open?
