Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dhcpd6 prefixes script can fail to restart #8109

Merged
merged 5 commits into from
Jan 4, 2025

Conversation

bensmithurst
Copy link
Contributor

While investigating a problem with delegated prefixes not getting a route set to the client which received the prefix I discovered that the problem was caused by the dhcp/prefixes.sh script not running.

When I checked the system log I saw messages like this:

[meta sequenceId="54"] /usr/local/sbin/pluginctl: The command '/usr/sbin/daemon -f -p '/var/run/dhcpleases6.pid' '/usr/local/opnsense/scripts/dhcp/prefixes.sh'' returned exit code '3', the output was 'daemon: process already running, pid: 67377'

After a bit of digging I discovered it will often not cleanly restart due to the script itself being killed, but the daemon wrapper will wait for EOF on the pipe to the child process, and the sleep process called by the script also holds this pipe open. By redirecting sleep's stdout/stderr to /dev/null, it is no longer connected the pipe and will reliably restart.

I can reproduce the problem by doing sudo pluginctl -s dhcpd6 restart and checking if prefixes.sh appears in the process list. It will often fail to restart without this patch, and reliably restarts with the patch.

@fichtner fichtner self-assigned this Dec 5, 2024
@fichtner
Copy link
Member

fichtner commented Dec 5, 2024

Uh, thanks for the report/proposed fix. I'm wondering of adding set -e to the script instead of this might also do the trick?

@bensmithurst
Copy link
Contributor Author

I will check later but off the top of my head I don't think that will work as that would deal with a command in the script exiting with a non-zero status, which isn't what's happening here - the problem is the script being killed while a child process is still alive.

@fichtner
Copy link
Member

fichtner commented Dec 5, 2024

In theory the sleep should be interrupted causing a non-zero return code causing the script to stop.. if not we can also take your fix, although it will need a comment attached

@fichtner
Copy link
Member

fichtner commented Dec 5, 2024

For the record, -f is supposed to close file descriptors to avoid such things. All a bit fishy.

@bensmithurst
Copy link
Contributor Author

Added a comment, I'll do some more digging later - thanks for the pointers.

@bensmithurst
Copy link
Contributor Author

daemon -m 0 would appear to prevent the script's stdout/stderr being piped to the parent process so that could be a better fix.

@fichtner
Copy link
Member

fichtner commented Dec 5, 2024

@bensmithurst ok sounds good

This was changed from a simple redirect of the sleep command after
discussion with Franco on opnsense#8109
@bensmithurst
Copy link
Contributor Author

@fichtner I have updated the PR now with the daemon -m0 approach, which also works for me, and feels a bit cleaner than the sleep redirect.

@fichtner fichtner merged commit b962ccd into opnsense:master Jan 4, 2025
@fichtner
Copy link
Member

fichtner commented Jan 4, 2025

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants