Contributor

Copilot AI commented Jan 10, 2026

Summary

This PR fixes child process termination issues in containerized environments, addressing resource leaks (especially GPU memory) when stopping processes. Fixes mudler/LocalAI#7958.

Changes Implemented

  • 1. Add GracefulTimeout (default 15s) and KillProcessGroup (default true) config options
  • 2. Add WithGracefulTimeout() and WithKillProcessGroup() option functions
  • 3. Create process groups with Setpgid: true in Run() (Unix/Linux)
  • 4. Kill entire process group using negative PID in Stop() (Unix/Linux)
  • 5. Add graceful timeout: wait between SIGTERM and SIGKILL
  • 6. Add SetSubreaper() function for Linux (using prctl)
  • 7. Add no-op stub for non-Linux platforms
  • 8. Add test verifying child processes are killed when parent stops
  • 9. Create platform-specific implementations (Unix vs Windows)
  • 10. Actually call SetSubreaper() in Run() and implement child reaping
  • 11. Add reapChildren() to wait for and reap orphaned child processes
  • 12. Improve error handling in Stop() with proper context and wrapping

Root Causes Addressed

No process group handling → Added Setpgid: true for Unix/Linux
Signals only sent to direct child → Sending to entire process group via negative PID
Immediate SIGKILL after SIGTERM → Added 15-second graceful timeout
No subreaper in containers → Linux-specific subreaper support via prctl + child reaping

Subreaper Implementation

The subreaper functionality now properly:

  • Calls SetSubreaper() when starting a process to mark the current process as a subreaper
  • Spawns a reapChildren() goroutine that continuously waits for and reaps orphaned child processes
  • Prevents zombie processes from accumulating when child processes terminate
  • Handles platform differences (Unix vs Windows)

Error Handling Improvements

The Stop() method now provides better error context:

  • Wraps errors with fmt.Errorf and %w to preserve error chains
  • Includes contextual information (PID values, signal types) in error messages
  • Checks and reports errors from killProcess() calls
  • Provides specific error messages for different failure scenarios

Testing

  • All 6 tests passing (including new process group termination test)
  • Builds successfully on Linux, macOS, and Windows
  • No race conditions detected
  • No security vulnerabilities (CodeQL clean)

Files Changed

  • config.go: Added new config fields with defaults
  • options.go: Added option functions
  • process.go: Updated Run() to call SetSubreaper(), monitor() to spawn reaper, improved Stop() error handling
  • process_unix.go: Unix-specific process group handling + reapChildren() implementation
  • process_windows.go: Windows-specific process handling + no-op reapChildren()
  • subreaper_linux.go: Linux subreaper support
  • subreaper_other.go: No-op for non-Linux platforms
  • process_test.go: Added process group termination test


Copilot AI and others added 2 commits January 10, 2026 18:05
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
mudler and others added 2 commits January 10, 2026 19:08
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix child process termination in containerized environments Fix child process termination in containerized environments Jan 10, 2026
Copilot AI requested a review from mudler January 10, 2026 18:14
// reparented to the nearest still living ancestor subreaper.
// This is useful in containerized environments to ensure proper cleanup of
// orphaned child processes.
func SetSubreaper() error {
Owner

@copilot it seems you forgot to actually set the process as a subreaper and wait for its children

Contributor Author

This has been addressed in commit c1a9ec4. The process is now properly set as a subreaper by calling SetSubreaper() in Run(), and reapChildren() continuously waits for and reaps orphaned child processes using Wait4(-1, ...) with WNOHANG.

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

// Stop stops the running process by senging KillSignal to the PID annotated in the pidfile
// Stop stops the running process by sending KillSignal to the PID annotated in the pidfile
func (p *Process) Stop() error {
Owner

@copilot since we are touching Stop() here, let's improve the errors it returns so that we can better identify where they come from. Don't shadow errors; wrap them where necessary.

Contributor Author

Improved in 2b22273. The Stop() method now:

  • Wraps errors with fmt.Errorf and %w to preserve error chains
  • Includes contextual information (PID values, signal types) in error messages
  • Checks and reports errors from killProcess() calls instead of ignoring them
  • Provides specific error messages for each failure scenario (read PID, parse PID, send signal)

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Copilot AI requested a review from mudler January 10, 2026 18:32
@mudler mudler marked this pull request as ready for review January 10, 2026 18:32
@mudler mudler merged commit 6a2eaee into master Jan 10, 2026
4 checks passed
Development

Successfully merging this pull request may close these issues.

VRAM is not freed when stopping models

2 participants