-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers: update ordering of events in StartTask to fix executor leak #24495
Conversation
If an error occurs before a task is started, but after this executor process is created, the executor must be explicity stopped. This change updates the logic in StartTask so launching the task happens immediately after creating the executor.
I wonder if using named return values would be a better way to prevent a regression. With named return values, we can defer a function that checks We could also write a single test that forces any error after the executor is created, and that should suffice to catch this bug in most other error scenarios. |
If you think it's a solution worth exploring, then I would encourage you to look into it. The testing aspect in particular would be very useful in my opinion. The current code in this PR looks like it would fix the problem, so we can always fallback here if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and thanks @mismithhisler!
I tested this on a Linux workstation with the details supplied and it was easy to witness the fix.
I noticed the added test takes 12s to run which seems a little long. It might be a follow up to see if this can be lowered. Interestingly, the NoOrphanedTasks
test is also long at 27s, so it might be something these tests need.
Description
If an error occurs before a task is started, but after this executor process is created, the executor must be explicity stopped. This change updates the logic in StartTask so launching the task happens immediately after creating the executor.
Testing & Reproduction steps
Run a job with a configuration that would fail after the executor process has started, but before the task is launched. For example, an exec task with
cap_add = ["net_raw"]
. After the job fails, the executor process will still be running.Links
Fixes GH #11958
Contributor Checklist
changelog entry using the
make cl
command.ensure regressions will be caught.
and job configuration, please update the Nomad website documentation to reflect this. Refer to
the website README for docs guidelines. Please also consider whether the
change requires notes within the upgrade guide.
Reviewer Checklist
backporting document.
in the majority of situations. The main exceptions are long-lived feature branches or merges where
history should be preserved.
within the public repository.