Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cmd) clean dangling unix sockets on startup #9254

Merged
merged 4 commits into from
Aug 25, 2022

Conversation

flrgh
Copy link
Contributor

@flrgh flrgh commented Aug 16, 2022

Summary

This change set makes Kong more likely to recover from an unclean shutdown.

It also includes a bugfix for kong start wherein the .kong_env file could be rewritten (among other things) if Kong is already running.

Background

When Nginx suffers an unclean shutdown (HW issue, OS issue, SIGKILL, etc), it may fail to clean up any listening unix sockets from the filesystem. The presence of these will cause Kong to fail to start back up again:

[emerg] 2127#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)

Solution

Kong will search for any existing unix sockets within the prefix directory and remove them before attempting to start nginx. If Kong/nginx is already running, no action is taken.

Limits

The search for dangling sockets is performed only in the top-level prefix directory (no recursion), so this will become less effective if at some point we add a unix socket in a nested directory. The integration tests I've added would likely catch this condition, unless said socket is not created by default.

@dndx
Copy link
Member

dndx commented Aug 16, 2022

Any reason we do not turn this feature on by default?

@flrgh
Copy link
Contributor Author

flrgh commented Aug 17, 2022

Any reason we do not turn this feature on by default?

The primary risk I can think of is that if the master pid file is lost at runtime, kong start won't detect that nginx is already running. Without --cleanup-sockets, kong start will just fail. With --cleanup-sockets, kong start will actively break the current instance that is running by deleting all of the unix sockets before failing.

That's all edge case territory, but losing a pid file to logrotate or somebody's errant filesystem cleanup cron is definitely not unheard of. Docker/systemd will run with daemon=off, so losing track of the nginx master pid is not really much of a worry there.

@dndx
Copy link
Member

dndx commented Aug 19, 2022

@flrgh Hmm, I have been thinking about it, if the master PID file is lost, Nginx will fail to bind anyway right? So we can not start new instance in that case. Also a new .kong_env could be written and a lot of things might break in the meantime (not just this socket file).

@flrgh
Copy link
Contributor Author

flrgh commented Aug 19, 2022

Weirdly enough, there's nothing to prevent us from overwriting .kong_env currently. Seems like these two lines should probably be swapped:

kong/kong/cmd/start.lua

Lines 24 to 27 in d6307f2

assert(prefix_handler.prepare_prefix(conf, args.nginx_conf))
assert(not kill.is_running(conf.nginx_pid),
"Kong is already running in " .. conf.prefix)

@mayocream mayocream added this to the 3.0 milestone Aug 24, 2022
@dndx
Copy link
Member

dndx commented Aug 24, 2022

@flrgh Discussed with the team, let's enable this by default and make it into 3.0 as a feature (with a bit more elaborate warning when dangling sockets are detected and removed).

@flrgh
Copy link
Contributor Author

flrgh commented Aug 24, 2022

@flrgh Discussed with the team, let's enable this by default and make it into 3.0 as a feature (with a bit more elaborate warning when dangling sockets are detected and removed).

@dndx Okie doke, just to make sure I understand: no --cleanup-sockets flag then?

@tyler-ball
Copy link
Contributor

No --cleanup-sockets flag, this should be the default behavior

If Kong is already running, `kong start` should not "prepare" or
otherwise alter the prefix directory.
@flrgh flrgh force-pushed the feat/cli-cleanup-orphaned-sockets branch from 7ed31b8 to 5da2a09 Compare August 24, 2022 18:40
@flrgh flrgh changed the title feat(cmd) add --cleanup-sockets flag feat(cmd) clean dangling unix sockets on startup Aug 24, 2022
@flrgh flrgh marked this pull request as ready for review August 24, 2022 18:52
@flrgh flrgh requested a review from a team as a code owner August 24, 2022 18:52
@flrgh
Copy link
Contributor Author

flrgh commented Aug 24, 2022

@dndx okay, this is ready for review now.

I have also added a fix for the problem I mentioned here and added a test case for it. We should not be altering the prefix directory in kong start if Kong is already running. While I feel it's an important fix that is not very intrusive, it's not explicitly tied to the unix socket issue, so let me know if I should take it out and/or put it in a separate PR.

locao pushed a commit that referenced this pull request Jun 21, 2024
Cherry-picked from #13079

Co-authored-by: Chrono <chrono_cpp@me.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants