Fix child process termination in containerized environments #3
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
// reparented to the nearest still living ancestor subreaper.
// This is useful in containerized environments to ensure proper cleanup of
// orphaned child processes.
func SetSubreaper() error {
@copilot it seems you forgot to actually set the process as a subreaper and to wait for its children
This has been addressed in commit c1a9ec4. The process is now properly set as a subreaper by calling SetSubreaper() in Run(), and reapChildren() continuously waits for and reaps orphaned child processes using Wait4(-1, ...) with WNOHANG.
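The mechanism described in this reply can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the code from commit c1a9ec4: the helper names `setSubreaper` and `reapOnce` are hypothetical, the prctl call is issued through the standard `syscall` package, and `36` is the value of `PR_SET_CHILD_SUBREAPER` from `<linux/prctl.h>`.

```go
package main

import (
	"fmt"
	"syscall"
)

// prSetChildSubreaper mirrors PR_SET_CHILD_SUBREAPER from <linux/prctl.h>.
const prSetChildSubreaper = 36

// setSubreaper (hypothetical name) marks the calling process as a child
// subreaper, so orphaned descendants are reparented to it instead of PID 1.
func setSubreaper() error {
	_, _, errno := syscall.RawSyscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0)
	if errno != 0 {
		return fmt.Errorf("prctl(PR_SET_CHILD_SUBREAPER): %w", errno)
	}
	return nil
}

// reapOnce (hypothetical name) collects every child that has already
// exited, without blocking, and returns how many were reaped.
func reapOnce() int {
	reaped := 0
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil || pid <= 0 {
			// ECHILD (no children) or pid == 0 (children exist, none exited yet).
			return reaped
		}
		reaped++
	}
}

func main() {
	if err := setSubreaper(); err != nil {
		fmt.Println("subreaper not enabled:", err)
		return
	}
	fmt.Println("reaped:", reapOnce())
}
```

A long-running reaper would call something like `reapOnce` periodically or on `SIGCHLD`. One caveat worth keeping in mind: `Wait4(-1, ...)` also collects children started through `os/exec`, so a later `cmd.Wait()` on such a child may fail because its exit status has already been consumed.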
- // Stop stops the running process by senging KillSignal to the PID annotated in the pidfile
+ // Stop stops the running process by sending KillSignal to the PID annotated in the pidfile
  func (p *Process) Stop() error {
@copilot since we are already touching Stop(), let's improve the returned errors so that we can better identify where they come from. Don't shadow errors; wrap them where necessary.
Improved in 2b22273. The Stop() method now:
- Wraps errors with fmt.Errorf and %w to preserve error chains
- Includes contextual information (PID values, signal types) in error messages
- Checks and reports errors from killProcess() calls instead of ignoring them
- Provides specific error messages for each failure scenario (read PID, parse PID, send signal)
Summary
This PR fixes child process termination issues in containerized environments, addressing resource leaks (especially GPU memory) when stopping processes. Fixes mudler/LocalAI#7958.
Changes Implemented
- GracefulTimeout (default 15s) and KillProcessGroup (default true) config options
- WithGracefulTimeout() and WithKillProcessGroup() option functions
- Setpgid: true in Run() (Unix/Linux)
- Stop() (Unix/Linux)
- SetSubreaper() function for Linux (using prctl)
- Call SetSubreaper() in Run() and implement child reaping
- reapChildren() to wait for and reap orphaned child processes
- Error handling in Stop() with proper context and wrapping
Root Causes Addressed
✅ No process group handling → Added Setpgid: true for Unix/Linux
✅ Signals only sent to direct child → Sending to entire process group via negative PID
✅ Immediate SIGKILL after SIGTERM → Added 15-second graceful timeout
✅ No subreaper in containers → Linux-specific subreaper support via prctl + child reaping
Subreaper Implementation
The subreaper functionality now properly:
- Calls SetSubreaper() when starting a process to mark the current process as a subreaper
- Spawns a reapChildren() goroutine that continuously waits for and reaps orphaned child processes
Error Handling Improvements
The Stop() method now provides better error context:
- Wraps errors with fmt.Errorf and %w to preserve error chains
- Checks and reports errors from killProcess() calls
Testing
Files Changed
- config.go: Added new config fields with defaults
- options.go: Added option functions
- process.go: Updated Run() to call SetSubreaper(), monitor() to spawn reaper, improved Stop() error handling
- process_unix.go: Unix-specific process group handling + reapChildren() implementation
- process_windows.go: Windows-specific process handling + no-op reapChildren()
- subreaper_linux.go: Linux subreaper support
- subreaper_other.go: No-op for non-Linux platforms
- process_test.go: Added process group termination test